g*******y 发帖数: 2114 | 1 一个小公司的电面题,大意如下,稍作修改
A document has 10 billion lines and about 50000 unique words.
Find top 10 frequent words efficiently.
编程珠玑里貌似有 但是手头找不到这本书了 |
i***1 发帖数: 95 | 2 use hashtable to store world->count pair.
Then sort it.
Or even better, use a max heap to save the top ten. (which is not really
necessary in this case.) |
g*******y 发帖数: 2114 | 3 en
我答的是第一个方法
也提到可以用heap去save top ten
不过第一次用文档共享写代码
有点不适应
呵呵
练习少了
用惯ide的坏处
最后还是用数组存top ten 了
假设有n行,m个不同单词
top k
算法复杂度是 O(n+km)
空间是O(m+k)
对吧
really
【在 i***1 的大作中提到】 : use hashtable to store world->count pair. : Then sort it. : Or even better, use a max heap to save the top ten. (which is not really : necessary in this case.)
|
f*********5 发帖数: 576 | 4 sort the 50000 occurrence times?
I think it is better to use heap
【在 i***1 的大作中提到】 : use hashtable to store world->count pair. : Then sort it. : Or even better, use a max heap to save the top ten. (which is not really : necessary in this case.)
|
f*********5 发帖数: 576 | 5 复杂度似乎跟行数没有关系
【在 g*******y 的大作中提到】 : en : 我答的是第一个方法 : 也提到可以用heap去save top ten : 不过第一次用文档共享写代码 : 有点不适应 : 呵呵 : 练习少了 : 用惯ide的坏处 : 最后还是用数组存top ten 了 : 假设有n行,m个不同单词
|
i***1 发帖数: 95 | 6 if we use an O(n log n) method to sort 50000 elements, it is around 50000*16
= 800000 which is easily handled by current PC.
【在 f*********5 的大作中提到】 : sort the 50000 occurrence times? : I think it is better to use heap
|
f*********5 发帖数: 576 | 7 if u think so,that is fine.
but when we talk about time complexity,we will definitely choose
nlogk ,while not nlogn.
sure,we will use O(k) for extra space while u don't need.
at least we should mention this to the interviewer
50000*16
【在 i***1 的大作中提到】 : if we use an O(n log n) method to sort 50000 elements, it is around 50000*16 : = 800000 which is easily handled by current PC.
|
g*******y 发帖数: 2114 | 8 建hash表的时候得一行一行扫吧
【在 f*********5 的大作中提到】 : 复杂度似乎跟行数没有关系
|
g*******y 发帖数: 2114 | 9 Thanks!
【在 f*********5 的大作中提到】 : if u think so,that is fine. : but when we talk about time complexity,we will definitely choose : nlogk ,while not nlogn. : sure,we will use O(k) for extra space while u don't need. : at least we should mention this to the interviewer : : 50000*16
|
f*********5 发帖数: 576 | 10 u use the word to create hashtable
【在 g*******y 的大作中提到】 : 建hash表的时候得一行一行扫吧
|