c**********e 发帖数: 2007 | 1 如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要
一次就行了。 |
g***l 发帖数: 18555 | 2 这个问题我面试的时候问过无数次了
DISTINCT去除DUPLICATE RECORD
GROUP BY 去除DUPLICATE KEY
你要这个都不会的话,基本的SQL不懂 |
y****9 发帖数: 144 | 3 I guess you imply that duplicate means every columns in two rows are same. I
have a need to delete duplicate rows before, where duplicate means only
unique key columns are same. (see: http://oracle-study-notes.blogspot.com/2008/06/resolve-issue-of-duplicated-rows-in.html )
Here is a good post for Oracle, but should be applicable for SQL Server too
http://viralpatel.net/blogs/2010/06/deleting-duplicate-rows-in-
【在 c**********e 的大作中提到】 : 如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要 : 一次就行了。
|
p*********t 发帖数: 2690 | 4 lz的问题是怎么找到,不是怎么去掉.
【在 g***l 的大作中提到】 : 这个问题我面试的时候问过无数次了 : DISTINCT去除DUPLICATE RECORD : GROUP BY 去除DUPLICATE KEY : 你要这个都不会的话,基本的SQL不懂
|
c**********e 发帖数: 2007 | 5 My input data set:
AAA 4 5
AAA 4 5
AAA 4 4
AAA 5 5
BBB 4 5
Need to output:
AAA 4 5
This is the only duplicate record!
【在 g***l 的大作中提到】 : 这个问题我面试的时候问过无数次了 : DISTINCT去除DUPLICATE RECORD : GROUP BY 去除DUPLICATE KEY : 你要这个都不会的话,基本的SQL不懂
|
g***l 发帖数: 18555 | 6 "重复记录只要一次就行了",哪里说找到了?
【在 p*********t 的大作中提到】 : lz的问题是怎么找到,不是怎么去掉.
|
p*********t 发帖数: 2690 | 7 "How to find all duplicate record in SQL?"
【在 g***l 的大作中提到】 : "重复记录只要一次就行了",哪里说找到了?
|
l******8 发帖数: 9475 | |
w**********n 发帖数: 3624 | 9 group by having count(sth)>1 |
m*********y 发帖数: 389 | 10 Let's say you have first name, last name, and address in one table.
Here duplicate means same firstname&lastname&address combination, for
instance:
John|Doe|123 Main st|...|...
John|Doe|123 Main st|...|...
Mike|Smith|567 Springfield ave|...|...
to find duplicate, do this:
select * from dbo.person where firstname+lastname+address
in (select firstname+lastname+address from dbo.person
group by firstname+lastname+address having count(*)>1)
Does this make sense? |
|
|
r*****l 发帖数: 2859 | 11 What is the meaning or "+" in "firstname+lastname+address". I am not a SQL
expert so forgive my simple question :)
【在 m*********y 的大作中提到】 : Let's say you have first name, last name, and address in one table. : Here duplicate means same firstname&lastname&address combination, for : instance: : John|Doe|123 Main st|...|... : John|Doe|123 Main st|...|... : Mike|Smith|567 Springfield ave|...|... : to find duplicate, do this: : select * from dbo.person where firstname+lastname+address : in (select firstname+lastname+address from dbo.person : group by firstname+lastname+address having count(*)>1)
|
m*********y 发帖数: 389 | 12 + is concatenation ; we use that a lot to find out and furthuer more, to
remove duplicate record. But, first thing first, how do you define your
duplicate is up to you.
you can use the Plus sign to concatenate string together...
Usually a table contains many, if not thousands of same first name, so, only
the same firstname&lastname&address combination together truly counts as a
duplicate.. it depends on how you define *duplicate* in your own table.. |
a9 发帖数: 21638 | 13 这个执行起来估计很慢。
记得前阵子有人发了个hash的函数还是啥来着,有谁还记得?
【在 m*********y 的大作中提到】 : Let's say you have first name, last name, and address in one table. : Here duplicate means same firstname&lastname&address combination, for : instance: : John|Doe|123 Main st|...|... : John|Doe|123 Main st|...|... : Mike|Smith|567 Springfield ave|...|... : to find duplicate, do this: : select * from dbo.person where firstname+lastname+address : in (select firstname+lastname+address from dbo.person : group by firstname+lastname+address having count(*)>1)
|
r*****l 发帖数: 2859 | 14 What if there are two records:
John|Doe|123 Main st|...|...
John|Doe1|23 Main st|...|...
I believe Oracle has something like "group by col1, col2, etc". Why don't
you use that instead of concat?
only
a
【在 m*********y 的大作中提到】 : + is concatenation ; we use that a lot to find out and furthuer more, to : remove duplicate record. But, first thing first, how do you define your : duplicate is up to you. : you can use the Plus sign to concatenate string together... : Usually a table contains many, if not thousands of same first name, so, only : the same firstname&lastname&address combination together truly counts as a : duplicate.. it depends on how you define *duplicate* in your own table..
|
r*****l 发帖数: 2859 | 15 En. If finding duplicate is a business need, then the application side can calculate the hash.
Otherwise, I bet Oracle has some hash functions.
【在 a9 的大作中提到】 : 这个执行起来估计很慢。 : 记得前阵子有人发了个hash的函数还是啥来着,有谁还记得?
|
m*********y 发帖数: 389 | 16 How to add a hash? I used the Checksum function, but I heard that HashType
is more powerful than Checksum. Anyway, let's say you want hash on the the
combination of first name, last name and address, you can do this:
Alter table person
Add hash as checksum(firstname,lastname,address)
After you created hash, searching for duplicate record is much easier, you
just need to run below query:
select hash,count(*) from person group by hash having count(*)>1 |
B*****g 发帖数: 34098 | 17 比如你有100cloumn,而且有些column size很大。如果你每一个新的record都和旧的
record比较所有的cloumn,速度会很慢,而且一般没有有效的index可用。
我认为hash的意思是,每一个record生成一个hash value存在一个column(with index
),然后每一新的record都先比较一下hash,这个应该很快。hash不一样的record肯定
不一样。hash一样的在比较一下每个column看一下是否真的一样(这部分的很小了)
具体问题请问zenny,不过她可不便宜。
【在 m*********y 的大作中提到】 : How to add a hash? I used the Checksum function, but I heard that HashType : is more powerful than Checksum. Anyway, let's say you want hash on the the : combination of first name, last name and address, you can do this: : Alter table person : Add hash as checksum(firstname,lastname,address) : After you created hash, searching for duplicate record is much easier, you : just need to run below query: : select hash,count(*) from person group by hash having count(*)>1
|
s*****o 发帖数: 303 | 18 记得以前在stackoverflow上答过,在sql servers 上用 row number 和 partiton by |
i*****9 发帖数: 293 | 19 modify the sql to suit your need.
delete from $table_name where rowid in
(
select "rowid" from
(select "rowid", rank_n from
(select rank() over (partition by $primary_key order by rowid) rank
_n, rowid as "rowid"
from $table_name
where $primary_key in
(select $primary_key from $table_name
group by $all_columns
having count(*) > 1
)
)
)
where rank_n > 1
)
【在 c**********e 的大作中提到】 : 如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要 : 一次就行了。
|
L******4 发帖数: 1778 | 20 这个可以
但是会影响效率吧
如果是程式调用
是不是该尽量避免这样的语句呢
【在 m*********y 的大作中提到】 : Let's say you have first name, last name, and address in one table. : Here duplicate means same firstname&lastname&address combination, for : instance: : John|Doe|123 Main st|...|... : John|Doe|123 Main st|...|... : Mike|Smith|567 Springfield ave|...|... : to find duplicate, do this: : select * from dbo.person where firstname+lastname+address : in (select firstname+lastname+address from dbo.person : group by firstname+lastname+address having count(*)>1)
|