由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Programming版 - 一道c++ 题, 找出duplicate numbers
相关主题
An algorithm question[合集] how to remove duplicates from linked list?
问个构造函数的问题验证一个问题的答案
又一道面试题,我是不是想多了?read a file and check if there's identical entry
算法问题求教:字符串比较c++问题
some interview questions i met and rememberedC++构造函数的问题
C# HtmlElement.InvokeMember at Amazon.comreturn value of a python function...
请问这道题怎么解决?一FG家常见题 (转载)
Re: amazon onsite interview question (转载)一个 C++ STL base type 的问题
相关话题的讨论汇总
话题: array话题: numbers话题: each话题: hash话题: duplicate
进入Programming版参与讨论
1 (共1页)
s*********e
发帖数: 17
1
You are given an array of 16 bit numbers.
The array may contain few duplicates not to exceed 0.1%
of array size. Each duplicated number can occur multiple tiimes.
The goal is to write high performance function to find out those
duplicated numbers, and how many times each number occurs.
m***t
发帖数: 254
2
hash table?

【在 s*********e 的大作中提到】
: You are given an array of 16 bit numbers.
: The array may contain few duplicates not to exceed 0.1%
: of array size. Each duplicated number can occur multiple tiimes.
: The goal is to write high performance function to find out those
: duplicated numbers, and how many times each number occurs.

c********e
发帖数: 383
3
16 bits so there are 2^16 possible numbers, pure hash will be too space
ineffective.
hash on the first 8 bits 256 basks, then do binary search on the lower
8 bits.

【在 m***t 的大作中提到】
: hash table?
c********e
发帖数: 383
4
well, i guess this really depends on the actual array size. if the size of
the
array is close to 2^16 then a pure hash will be it since there is very few
dups.

【在 c********e 的大作中提到】
: 16 bits so there are 2^16 possible numbers, pure hash will be too space
: ineffective.
: hash on the first 8 bits 256 basks, then do binary search on the lower
: 8 bits.

j*****h
发帖数: 62
5
you can also use a bitmap (8K bytes) to detect duplicates,
and use a hashtable to store the counters for each
duplicate. Since only there're only 0.1% duplicates, the hash
table should be very small.

【在 c********e 的大作中提到】
: well, i guess this really depends on the actual array size. if the size of
: the
: array is close to 2^16 then a pure hash will be it since there is very few
: dups.

m******k
发帖数: 43
6
why not directly use bitmap. Since duplicate rate <0.1%. So the maximium
duplicate occurrences will be less than 64K * 0.001 <=66. So for each number
in bitmap, I would use 7 bit to represent. So total memory consuption is
7bits * 64K =56KB. Then I can directly count the number of duplicates in
this bitmaps
In the other side, I just wonder if you use an extra Hashtable, How many
slots are you gonna use? I think it's O(n) where n is the size of the array.

you can also use a bitmap (8K bytes) to

【在 j*****h 的大作中提到】
: you can also use a bitmap (8K bytes) to detect duplicates,
: and use a hashtable to store the counters for each
: duplicate. Since only there're only 0.1% duplicates, the hash
: table should be very small.

m******k
发帖数: 43
7
for the binary search method, are you chaining an binary tree at each slot?

16 bits so there are 2^16 possible numbers, pure hash will be too space
ineffective.
hash on the first 8 bits 256 basks, then do binary search on the lower
8 bits.

【在 c********e 的大作中提到】
: 16 bits so there are 2^16 possible numbers, pure hash will be too space
: ineffective.
: hash on the first 8 bits 256 basks, then do binary search on the lower
: 8 bits.

c********e
发帖数: 383
8
when u insert the number into each basket, contruct a b-tree

?

【在 m******k 的大作中提到】
: for the binary search method, are you chaining an binary tree at each slot?
:
: 16 bits so there are 2^16 possible numbers, pure hash will be too space
: ineffective.
: hash on the first 8 bits 256 basks, then do binary search on the lower
: 8 bits.

t****t
发帖数: 6806
9
7bit还不如8bit呢.直接变成hash了.

number
array.

【在 m******k 的大作中提到】
: why not directly use bitmap. Since duplicate rate <0.1%. So the maximium
: duplicate occurrences will be less than 64K * 0.001 <=66. So for each number
: in bitmap, I would use 7 bit to represent. So total memory consuption is
: 7bits * 64K =56KB. Then I can directly count the number of duplicates in
: this bitmaps
: In the other side, I just wonder if you use an extra Hashtable, How many
: slots are you gonna use? I think it's O(n) where n is the size of the array.
:
: you can also use a bitmap (8K bytes) to

t****t
发帖数: 6806
10
搞那么复杂...弄个64K的表格有啥不好的?也没浪费多少空间吧.

【在 c********e 的大作中提到】
: when u insert the number into each basket, contruct a b-tree
:
: ?

c********e
发帖数: 383
11
en, bt neh

【在 t****t 的大作中提到】
: 搞那么复杂...弄个64K的表格有啥不好的?也没浪费多少空间吧.
1 (共1页)
进入Programming版参与讨论
相关主题
一个 C++ STL base type 的问题some interview questions i met and remembered
Java 问题,请教如何找出一个array里的duplicate segments?C# HtmlElement.InvokeMember at Amazon.com
有没工具或framework可以对大数据库运行中去重复?请问这道题怎么解决?
怎么防止duplicate vote 比较好?Re: amazon onsite interview question (转载)
An algorithm question[合集] how to remove duplicates from linked list?
问个构造函数的问题验证一个问题的答案
又一道面试题,我是不是想多了?read a file and check if there's identical entry
算法问题求教:字符串比较c++问题
相关话题的讨论汇总
话题: array话题: numbers话题: each话题: hash话题: duplicate