一个算法问题 - JobHunting版 - 未名存档

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

JobHunting版 - 一个算法问题

相关主题
● external sorting的一个问题	● external sorting的最后一部　merge的时候，是不是要消耗很多I/O
● 再问道题	● 再问一个算法题。
● bloomberg 面经	● G家电面经
● Median of Two Sorted Arrays	● 问个CareerCup上external sort的题
● 终于弄明白median of two sorted arrays了，发帖庆祝一下	● 问个题
● 刷题刷到没自信了	● 一个小公司面经
● 求代码，median of K sorted list	● BB NON CS onsite面经
● external sorting 的问题	● find median for k sorted arrays

相关话题的讨论汇总
话题: median话题: 算法话题: user话题: timing话题: external

进入JobHunting版参与讨论

1

(共1页)

w****f 发帖数: 684	1 有一个文件含1000000000个（user, login-timing,...）要求登陆时间前1000名，以及median。请问那种算法最好？
s******n 发帖数: 3946	2 根据userid hash分配到k台机器。每台机器上统计，排序。然后k-way合并(用heap)，得到总排名的前1000和median
w****f 发帖数: 684	3 谢谢，swan。这种算法好像对占memory较多，有无更好的算法？
z*********8 发帖数: 2070	4 求前1000个用 maxheap就行了，求median就只能那么做【在 w****f 的大作中提到】 : 谢谢，swan。 : 这种算法好像对占memory较多，有无更好的算法？
s******n 发帖数: 3946	5 求median必须要排序，可以用external sort
d****n 发帖数: 1637	6 +1 好像1 billion 的数据不是都load 到memory. 是不是要把user 放进bst 或者hash里面，只保留最早时间？ 1 billion user, 多少unique names? 【在 z*********8 的大作中提到】 : 求前1000个用 maxheap就行了，求median就只能那么做
w****f 发帖数: 684	7 It's 1 billion user, and only 1 million unique userid. "求median必须要排序，可以用external sort". What's external sort? Does the below work? 1.) take first 1001, use 501th as the initial median values of login- timing. 2.) read next one and shift the median to fit the new one. 3.) repeat step 2 till the end. (But this one only give the values of timing, not the associated useid)
w****x 发帖数: 2483	8 1. 把文件切成2^n个大小相同的小文件, 每两个可以装入内存 2. 两两载入内存, 对每个pair做median merge, 2 个 file merge成一个大小相同的 file 3. 如此merge下去, 直到所有的file浓缩成一个file, 取这个file的median, 不过这题因该允许取得非精确median

1

(共1页)

进入JobHunting版参与讨论

相关主题
● find median for k sorted arrays	● 终于弄明白median of two sorted arrays了，发帖庆祝一下
● careercup书上那个maintain median value的题	● 刷题刷到没自信了
● An interview question of finding the median in a moving window.	● 求代码，median of K sorted list
● 找一个stream of double 的exact median怎么做？	● external sorting 的问题
● external sorting的一个问题	● external sorting的最后一部　merge的时候，是不是要消耗很多I/O
● 再问道题	● 再问一个算法题。
● bloomberg 面经	● G家电面经
● Median of Two Sorted Arrays	● 问个CareerCup上external sort的题

相关话题的讨论汇总
话题: median话题: 算法话题: user话题: timing话题: external

未名新帖统计// 7月16日

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

* 这里只显示发帖超过25的版面，努力灌水吧:-)