How to find all duplicate record in SQL? - Database版

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Database版 - How to find all duplicate record in SQL?

相关主题
● 请教一下高手，包子答谢！	● 问一个小问题！
● list duplicators in one table	● 急问：导入XML文件到MySQL数据库
● 求教高手：odbc更新数据出错	● Load .cvs to database
● SQL question(在线等，谢谢！！）	● 请教关于exists在sql server中的用法
● SQL的一个愚蠢小问题	● db2 default password
● 急问：怎么去掉MS SQL数据库中的重复记录？	● 一个SQL问题
● 怎么去除duplicates	● 请教一下这个report的query应该怎么样写?
● A question about recursive query	● 我经常问的几道SQL SERVER DBA的面试题，图省事不问编程

相关话题的讨论汇总
话题: duplicate话题: sql话题: record话题: rowid话题: hash

进入Database版参与讨论

(共1页)

c**********e
发帖数: 2007

如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要
一次就行了。

g***l
发帖数: 18555

这个问题我面试的时候问过无数次了
DISTINCT去除DUPLICATE RECORD
GROUP BY 去除DUPLICATE KEY
你要这个都不会的话,基本的SQL不懂

y****9
发帖数: 144

I guess you imply that duplicate means every columns in two rows are same. I
have a need to delete duplicate rows before, where duplicate means only
unique key columns are same. (see: http://oracle-study-notes.blogspot.com/2008/06/resolve-issue-of-duplicated-rows-in.html )
Here is a good post for Oracle, but should be applicable for SQL Server too
http://viralpatel.net/blogs/2010/06/deleting-duplicate-rows-in-

【在 c**********e 的大作中提到】

: 如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要
: 一次就行了。

p*********t
发帖数: 2690

lz的问题是怎么找到,不是怎么去掉.

【在 g***l 的大作中提到】

: 这个问题我面试的时候问过无数次了
: DISTINCT去除DUPLICATE RECORD
: GROUP BY 去除DUPLICATE KEY
: 你要这个都不会的话,基本的SQL不懂

c**********e
发帖数: 2007

My input data set:
AAA 4 5
AAA 4 5
AAA 4 4
AAA 5 5
BBB 4 5
Need to output:
AAA 4 5
This is the only duplicate record!

【在 g***l 的大作中提到】

: 这个问题我面试的时候问过无数次了
: DISTINCT去除DUPLICATE RECORD
: GROUP BY 去除DUPLICATE KEY
: 你要这个都不会的话,基本的SQL不懂

g***l
发帖数: 18555

"重复记录只要一次就行了",哪里说找到了？

【在 p*********t 的大作中提到】

: lz的问题是怎么找到,不是怎么去掉.

p*********t
发帖数: 2690

"How to find all duplicate record in SQL?"

【在 g***l 的大作中提到】

: "重复记录只要一次就行了",哪里说找到了？

l******8
发帖数: 9475

....
have xxxx > 1

w**********n
发帖数: 3624

group by having count(sth)>1

m*********y
发帖数: 389

相关主题
● 急问：怎么去掉MS SQL数据库中的重复记录？	● 问一个小问题！
● 怎么去除duplicates	● 急问：导入XML文件到MySQL数据库
● A question about recursive query	● Load .cvs to database
进入Database版参与讨论

r*****l
发帖数: 2859

What is the meaning or "+" in "firstname+lastname+address". I am not a SQL
expert so forgive my simple question :)

【在 m*********y 的大作中提到】

m*********y
发帖数: 389

+ is concatenation ; we use that a lot to find out and furthuer more, to
remove duplicate record. But, first thing first, how do you define your
duplicate is up to you.
you can use the Plus sign to concatenate string together...
Usually a table contains many, if not thousands of same first name, so, only
the same firstname&lastname&address combination together truly counts as a
duplicate.. it depends on how you define *duplicate* in your own table..

a9
发帖数: 21638

这个执行起来估计很慢。
记得前阵子有人发了个hash的函数还是啥来着，有谁还记得？

【在 m*********y 的大作中提到】

r*****l
发帖数: 2859

What if there are two records:
John|Doe|123 Main st|...|...
John|Doe1|23 Main st|...|...
I believe Oracle has something like "group by col1, col2, etc". Why don't
you use that instead of concat?

only
a

【在 m*********y 的大作中提到】

: + is concatenation ; we use that a lot to find out and furthuer more, to
: remove duplicate record. But, first thing first, how do you define your
: duplicate is up to you.
: you can use the Plus sign to concatenate string together...
: Usually a table contains many, if not thousands of same first name, so, only
: the same firstname&lastname&address combination together truly counts as a
: duplicate.. it depends on how you define *duplicate* in your own table..

r*****l
发帖数: 2859

En. If finding duplicate is a business need, then the application side can calculate the hash.
Otherwise, I bet Oracle has some hash functions.

【在 a9 的大作中提到】

: 这个执行起来估计很慢。
: 记得前阵子有人发了个hash的函数还是啥来着，有谁还记得？

m*********y
发帖数: 389

How to add a hash? I used the Checksum function, but I heard that HashType
is more powerful than Checksum. Anyway, let's say you want hash on the the
combination of first name, last name and address, you can do this:
Alter table person
Add hash as checksum(firstname,lastname,address)
After you created hash, searching for duplicate record is much easier, you
just need to run below query:
select hash,count(*) from person group by hash having count(*)>1

B*****g
发帖数: 34098

比如你有100cloumn，而且有些column size很大。如果你每一个新的record都和旧的
record比较所有的cloumn，速度会很慢，而且一般没有有效的index可用。
我认为hash的意思是，每一个record生成一个hash value存在一个column（with index
），然后每一新的record都先比较一下hash，这个应该很快。hash不一样的record肯定
不一样。hash一样的在比较一下每个column看一下是否真的一样（这部分的很小了）
具体问题请问zenny，不过她可不便宜。

【在 m*********y 的大作中提到】

: How to add a hash? I used the Checksum function, but I heard that HashType
: is more powerful than Checksum. Anyway, let's say you want hash on the the
: combination of first name, last name and address, you can do this:
: Alter table person
: Add hash as checksum(firstname,lastname,address)
: After you created hash, searching for duplicate record is much easier, you
: just need to run below query:
: select hash,count(*) from person group by hash having count(*)>1

s*****o
发帖数: 303

记得以前在stackoverflow上答过，在sql servers 上用 row number 和 partiton by

i*****9
发帖数: 293

modify the sql to suit your need.
delete from $table_name where rowid in
(
select "rowid" from
(select "rowid", rank_n from
(select rank() over (partition by $primary_key order by rowid) rank
_n, rowid as "rowid"
from $table_name
where $primary_key in
(select $primary_key from $table_name
group by $all_columns
having count(*) > 1
)
)
)
where rank_n > 1
)

【在 c**********e 的大作中提到】

: 如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要
: 一次就行了。

L******4
发帖数: 1778

这个可以
但是会影响效率吧
如果是程式调用
是不是该尽量避免这样的语句呢

【在 m*********y 的大作中提到】

(共1页)

进入Database版参与讨论

相关主题
● 我经常问的几道SQL SERVER DBA的面试题，图省事不问编程	● SQL的一个愚蠢小问题
● 这个 Oracle SQL 语句该这么写啊？	● 急问：怎么去掉MS SQL数据库中的重复记录？
● 求问 Oracle Materialized View	● 怎么去除duplicates
● 请教比较两个table，找出相同和不同的records	● A question about recursive query
● 请教一下高手，包子答谢！	● 问一个小问题！
● list duplicators in one table	● 急问：导入XML文件到MySQL数据库
● 求教高手：odbc更新数据出错	● Load .cvs to database
● SQL question(在线等，谢谢！！）	● 请教关于exists在sql server中的用法

相关话题的讨论汇总
话题: duplicate话题: sql话题: record话题: rowid话题: hash

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天