h********r Posts: 30 | 1 I have a large file with hundreds of millions of lines, and I need to
select a random line from it. How can I do this while reading through the file only once?
I have no clue how to do it...
x*****p Posts: 1707 | 2 First, read the file line by line.
When you read the first line, keep it in memory.
When you read the second line, replace the line in memory with it with
probability 1/2.
In general, when you read the n-th line, replace the line in memory with it
with probability 1/n.
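The steps above could be sketched in Python roughly as follows. This is only a minimal illustration, not code from the thread: the function name `pick_random_line` and the optional `rng` parameter are made up for the example, and it assumes the lines arrive as any iterable, such as an open file object.

```python
import random

def pick_random_line(lines, rng=random):
    """Return one item chosen uniformly at random, reading the iterable once.

    `pick_random_line` and `rng` are hypothetical names used for illustration.
    Returns None if the iterable is empty.
    """
    chosen = None
    for n, line in enumerate(lines, start=1):
        # randrange(n) is 0 with probability 1/n, so the n-th line replaces
        # the kept line with probability 1/n (the first line is always kept).
        if rng.randrange(n) == 0:
            chosen = line
    return chosen
```

With a real file this might be called as `with open(path) as f: line = pick_random_line(f)`, where `path` is whatever file you are sampling from. Only one line is held in memory at a time, so the file size does not matter.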
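If the file has m lines in total, then any given line, say the k-th line,
ends up in memory only if it is selected when it is read (probability 1/k)
and is not replaced by any later line. The probability of that is
1/k * (1 - 1/(k+1)) * ... * (1 - 1/m) = 1/k * k/(k+1) * ... * (m-1)/m = 1/m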
So the choice is truly random and uniformly distributed: every line is equally likely to be picked.
g*******s Posts: 490 | |
h********r Posts: 30 | 4 Great, thanks a lot!