s*********e 发帖数: 17 | 1 You are given an array of 16 bit numbers.
The array may contain few duplicates not to exceed 0.1%
of array size. Each duplicated number can occur multiple tiimes.
The goal is to write high performance function to find out those
duplicated numbers, and how many times each number occurs. |
m***t 发帖数: 254 | 2 hash table?
【在 s*********e 的大作中提到】 : You are given an array of 16 bit numbers. : The array may contain few duplicates not to exceed 0.1% : of array size. Each duplicated number can occur multiple tiimes. : The goal is to write high performance function to find out those : duplicated numbers, and how many times each number occurs.
|
c********e 发帖数: 383 | 3 16 bits so there are 2^16 possible numbers, pure hash will be too space
ineffective.
hash on the first 8 bits 256 basks, then do binary search on the lower
8 bits.
【在 m***t 的大作中提到】 : hash table?
|
c********e 发帖数: 383 | 4 well, i guess this really depends on the actual array size. if the size of
the
array is close to 2^16 then a pure hash will be it since there is very few
dups.
【在 c********e 的大作中提到】 : 16 bits so there are 2^16 possible numbers, pure hash will be too space : ineffective. : hash on the first 8 bits 256 basks, then do binary search on the lower : 8 bits.
|
j*****h 发帖数: 62 | 5 you can also use a bitmap (8K bytes) to detect duplicates,
and use a hashtable to store the counters for each
duplicate. Since only there're only 0.1% duplicates, the hash
table should be very small.
【在 c********e 的大作中提到】 : well, i guess this really depends on the actual array size. if the size of : the : array is close to 2^16 then a pure hash will be it since there is very few : dups.
|
m******k 发帖数: 43 | 6 why not directly use bitmap. Since duplicate rate <0.1%. So the maximium
duplicate occurrences will be less than 64K * 0.001 <=66. So for each number
in bitmap, I would use 7 bit to represent. So total memory consuption is
7bits * 64K =56KB. Then I can directly count the number of duplicates in
this bitmaps
In the other side, I just wonder if you use an extra Hashtable, How many
slots are you gonna use? I think it's O(n) where n is the size of the array.
you can also use a bitmap (8K bytes) to
【在 j*****h 的大作中提到】 : you can also use a bitmap (8K bytes) to detect duplicates, : and use a hashtable to store the counters for each : duplicate. Since only there're only 0.1% duplicates, the hash : table should be very small.
|
m******k 发帖数: 43 | 7 for the binary search method, are you chaining an binary tree at each slot?
16 bits so there are 2^16 possible numbers, pure hash will be too space
ineffective.
hash on the first 8 bits 256 basks, then do binary search on the lower
8 bits.
【在 c********e 的大作中提到】 : 16 bits so there are 2^16 possible numbers, pure hash will be too space : ineffective. : hash on the first 8 bits 256 basks, then do binary search on the lower : 8 bits.
|
c********e 发帖数: 383 | 8 when u insert the number into each basket, contruct a b-tree
?
【在 m******k 的大作中提到】 : for the binary search method, are you chaining an binary tree at each slot? : : 16 bits so there are 2^16 possible numbers, pure hash will be too space : ineffective. : hash on the first 8 bits 256 basks, then do binary search on the lower : 8 bits.
|
t****t 发帖数: 6806 | 9 7bit还不如8bit呢.直接变成hash了.
number
array.
【在 m******k 的大作中提到】 : why not directly use bitmap. Since duplicate rate <0.1%. So the maximium : duplicate occurrences will be less than 64K * 0.001 <=66. So for each number : in bitmap, I would use 7 bit to represent. So total memory consuption is : 7bits * 64K =56KB. Then I can directly count the number of duplicates in : this bitmaps : In the other side, I just wonder if you use an extra Hashtable, How many : slots are you gonna use? I think it's O(n) where n is the size of the array. : : you can also use a bitmap (8K bytes) to
|
t****t 发帖数: 6806 | 10 搞那么复杂...弄个64K的表格有啥不好的?也没浪费多少空间吧.
【在 c********e 的大作中提到】 : when u insert the number into each basket, contruct a b-tree : : ?
|
c********e 发帖数: 383 | 11 en, bt neh
【在 t****t 的大作中提到】 : 搞那么复杂...弄个64K的表格有啥不好的?也没浪费多少空间吧.
|