c******n 发帖数: 4965 | 1 常见的。 为严谨起见我逐字抄下来
given a sequential file that contains ____ at most four billion 32-bit
integers -_____ in random order, find a 32-bit integer that isn't in the
file
简单的办法,如果有4G内存, 就做一个bitmap 可以找到。 书里讨论如果只有几百
byte内存, 但可以写文件, 怎么弄。
下面剧透。
书上说,把要找的range 分两半 , scan 过整个文件,看每一个range 里有多少, 肯
定有一个range 不够满,那就继续找那个range, 这样range 指数变小,最后为size 1,
就是要找的missing number.
我开始理解错,以为是有N numbers,N>2^32, but only 2^32-m of them are
distinct, where 1<=m < 2^32. 这样的话对每个小range 计数就不能认定是否有
missing number. 比如
input is
[0, 1 , 3 ]
(assume input range is 2^2 instead of 2^32 )
最开始range 是0--3
先在 0--1 range 找, 有两个数 (0,1), 再去2--3 找, 有一个数, range 就变
为2--3
然后找到2 missing.
如果是说条件只是限制unique numbers, 总数不管, 那sample input 变成
[0,1,3,3]
这样两个sub range 都有足够的数,就不行了。
用前面bitmap , 新的条件也可以解, 但binary search 的思路办法似乎就不成了。 | h**********c 发帖数: 4120 | | h**********c 发帖数: 4120 | 3 n! mod \bigpi a_i equal to the missed number.
n! needs nlogn bits | c******n 发帖数: 4965 | 4 first of all mul and div are too slow
even if u agree to use mul and div, this method is still only as good as the
one given in the book, and can't handle duplicate numbers , which is the
further case I raised
【在 h**********c 的大作中提到】 : n! mod \bigpi a_i equal to the missed number. : n! needs nlogn bits
| c******n 发帖数: 4965 | 5 more details please ?
【在 h**********c 的大作中提到】 : 算连续数的sha1,n^2
| h**********c 发帖数: 4120 | 6 还没想清楚,先站个坑
不过假设你只有4k内寸,可以先捞1~k
没有missed,再捞k+1~8K
【在 c******n 的大作中提到】 : more details please ?
| c******n 发帖数: 4965 | 7
【在 h**********c 的大作中提到】 : 还没想清楚,先站个坑 : 不过假设你只有4k内寸,可以先捞1~k : 没有missed,再捞k+1~8K
| c******n 发帖数: 4965 | 8 就思路来说,这个办法肯定可以在面试上加分, 基本上就是“combine/hybrid
existing 2 approaches", 但实际不work , 4B-->4K you have to chop the input
file into 1mil small files and do in-memory search for 1mil times.
【在 h**********c 的大作中提到】 : 还没想清楚,先站个坑 : 不过假设你只有4k内寸,可以先捞1~k : 没有missed,再捞k+1~8K
| c******n 发帖数: 4965 | 9 常见的。 为严谨起见我逐字抄下来
given a sequential file that contains ____ at most four billion 32-bit
integers -_____ in random order, find a 32-bit integer that isn't in the
file
简单的办法,如果有4G内存, 就做一个bitmap 可以找到。 书里讨论如果只有几百
byte内存, 但可以写文件, 怎么弄。
下面剧透。
书上说,把要找的range 分两半 , scan 过整个文件,看每一个range 里有多少, 肯
定有一个range 不够满,那就继续找那个range, 这样range 指数变小,最后为size 1,
就是要找的missing number.
我开始理解错,以为是有N numbers,N>2^32, but only 2^32-m of them are
distinct, where 1<=m < 2^32. 这样的话对每个小range 计数就不能认定是否有
missing number. 比如
input is
[0, 1 , 3 ]
(assume input range is 2^2 instead of 2^32 )
最开始range 是0--3
先在 0--1 range 找, 有两个数 (0,1), 再去2--3 找, 有一个数, range 就变
为2--3
然后找到2 missing.
如果是说条件只是限制unique numbers, 总数不管, 那sample input 变成
[0,1,3,3]
这样两个sub range 都有足够的数,就不行了。
用前面bitmap , 新的条件也可以解, 但binary search 的思路办法似乎就不成了。 | h**********c 发帖数: 4120 | | | | h**********c 发帖数: 4120 | 11 n! mod \bigpi a_i equal to the missed number.
n! needs nlogn bits | c******n 发帖数: 4965 | 12 first of all mul and div are too slow
even if u agree to use mul and div, this method is still only as good as the
one given in the book, and can't handle duplicate numbers , which is the
further case I raised
【在 h**********c 的大作中提到】 : n! mod \bigpi a_i equal to the missed number. : n! needs nlogn bits
| c******n 发帖数: 4965 | 13 more details please ?
【在 h**********c 的大作中提到】 : 算连续数的sha1,n^2
| h**********c 发帖数: 4120 | 14 还没想清楚,先站个坑
不过假设你只有4k内寸,可以先捞1~k
没有missed,再捞k+1~8K
【在 c******n 的大作中提到】 : more details please ?
| c******n 发帖数: 4965 | 15
【在 h**********c 的大作中提到】 : 还没想清楚,先站个坑 : 不过假设你只有4k内寸,可以先捞1~k : 没有missed,再捞k+1~8K
| c******n 发帖数: 4965 | 16 就思路来说,这个办法肯定可以在面试上加分, 基本上就是“combine/hybrid
existing 2 approaches", 但实际不work , 4B-->4K you have to chop the input
file into 1mil small files and do in-memory search for 1mil times.
【在 h**********c 的大作中提到】 : 还没想清楚,先站个坑 : 不过假设你只有4k内寸,可以先捞1~k : 没有missed,再捞k+1~8K
| h**********c 发帖数: 4120 | 17 Took some time think over this and I made an assumption, suppose there is no
repeating numbers. So we want to find the single missing number ranging
from 0 to 2^32-1 unsinged integers in a random sequence.
First, need background of unique factorial theorem.
Second, search prime numbers 32 bit in Internet, you will find we have total
about 200,xxx,xxx prime numbers in this scope.
Here is the deal.
suppose we have a_1 ... a_m prime numbers. a_1*...*a_m is the ceiling of 2^
32 -1 with any extra prime number needed. I think the count of a_1 to a_m
will be far less than 200,xxx,xxx. Think over it?
So put a_1 and a_m in a hashmap. The value will be the count of given prime
number.
From 0 to 2^32-1 - 1, the hashmap above will have a unique sequence.
Then we process the input, factor each number in the input data. Count the
presence of each prime number (add to the according value in the hashmap).
We will detect the prime numbers fail to present, compare to an total
sequence of 0 to 2^32-1. Their multiple is the single missed numbers.
If there are multiple missed numbers, we can use a second pass, use the
different combinations of the missed primes, screening the input data again.
So it will be good the OS has a local prime table.
Just for discussion for potential improvement of the situation. Please feel
free to comment. Thank off irrelevants!
MIT License.
1,
【在 c******n 的大作中提到】 : 常见的。 为严谨起见我逐字抄下来 : given a sequential file that contains ____ at most four billion 32-bit : integers -_____ in random order, find a 32-bit integer that isn't in the : file : 简单的办法,如果有4G内存, 就做一个bitmap 可以找到。 书里讨论如果只有几百 : byte内存, 但可以写文件, 怎么弄。 : 下面剧透。 : 书上说,把要找的range 分两半 , scan 过整个文件,看每一个range 里有多少, 肯 : 定有一个range 不够满,那就继续找那个range, 这样range 指数变小,最后为size 1, : 就是要找的missing number.
| j**********3 发帖数: 3211 | | C****t 发帖数: 53 | 19 Probably we can do in this way.
Build an array rec of length 32 which save the digit sum of all numbers.
for num in nums:
for i in range(32):
rec[i] += num>>i &1
if there is no missing number at all, then rec[i] == 2^31 for all i
if there is some missing numbers, check the first index i such that rec[i] <
2^31, then the missing number if of the form
XXXXXXXXXXXX 1 00000000
where digit 1 is located at the i-th place from the right.
We may repeat this process for 32 times. |
|