maintain top frequencies for stream data - JobHunting board - 未名 archive



m*****P Posts: 1331	1 This looks like a classic problem. What is the general approach? You have been asked to design some software to continuously display the top 10 search terms on Google. You are given access to a feed that provides an endless real-time stream of search terms currently being searched on Google. Describe what algorithm and data structures you would use to implement this. You are to design two variations:
(i) Display the top 10 search terms of all time (i.e. since you started reading the feed).
(ii) Display only the top 10 search terms for the past month, updated hourly.
You can use an approximation to obtain the top 10 list, but you must justify your choices.
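c*****r Posts: 108	2 I happened to be thinking about this problem today as well; here is my idea. Use MapReduce to route each search term to one of several peer servers, so that each server is responsible for accumulating counts for its own set of search terms. Each server also maintains a max-heap that tracks the most frequent terms on that server; the heap size equals the number of search terms the server is responsible for. Each server sends the 10 terms at the top of its heap to a master node in real time. The master node ranks the terms received from all machines by frequency and selects the final 10 to display.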
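
A minimal single-process sketch of the sharding idea in the post above, assuming a stable hash is used to route terms. The names `NUM_SHARDS`, `shard_for`, `Shard`, and `master_top` are hypothetical, and the sketch swaps the per-server max-heap for a `Counter` plus `heapq.nlargest`, which is one possible way to get each server's local top 10, not necessarily what the poster had in mind.

```python
import heapq
import zlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical number of peer servers
TOP_K = 10


def shard_for(term: str) -> int:
    """Route a term with a stable hash so all of its counts land on one shard."""
    return zlib.crc32(term.encode("utf-8")) % NUM_SHARDS


class Shard:
    """One peer server: accumulates counts for the terms routed to it."""

    def __init__(self) -> None:
        self.counts = Counter()

    def add(self, term: str) -> None:
        self.counts[term] += 1

    def local_top(self, k: int = TOP_K):
        # The k most frequent (term, count) pairs seen by this shard.
        return heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])


def master_top(shards, k: int = TOP_K):
    """Master node: merge every shard's local top-k and keep the global top-k."""
    candidates = [pair for shard in shards for pair in shard.local_top(k)]
    return heapq.nlargest(k, candidates, key=lambda kv: kv[1])


if __name__ == "__main__":
    shards = [Shard() for _ in range(NUM_SHARDS)]
    for term in ["cats", "dogs", "cats", "news", "cats", "dogs"]:
        shards[shard_for(term)].add(term)
    print(master_top(shards, k=2))
```

Because each term is counted on exactly one shard, any term in the global top 10 must also be in its own shard's top 10, so merging the local lists is exact as long as every shard can afford to keep a counter for each of its terms.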
m****i Posts: 650	3 Is there a good approach for the second part?
f*****e Posts: 2992	4 1,2,2,4,8,8,16,16,.... 50% error.
【Quoting m****i】
: Is there a good approach for the second part?
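
The reply does not name a technique. One possible reading is that the sequence lists bucket sizes in an exponential histogram (as in the DGIM sliding-window counter), where bucket sizes are powers of two, only a bounded number of buckets of each size is kept, and the oldest bucket is counted at half its size. The sketch below follows that assumption; the class `ExpHistogram` and its parameters are hypothetical names, and it counts occurrences of a single term, so one instance per candidate term would be needed.

```python
class ExpHistogram:
    """Approximate count of events in the last `window` time units (DGIM-style)."""

    def __init__(self, window: int, max_per_size: int = 2) -> None:
        self.window = window
        self.max_per_size = max_per_size
        # Each bucket is [timestamp_of_its_newest_event, size]; newest bucket first.
        self.buckets = []

    def _expire(self, now: int) -> None:
        # Drop buckets whose newest event has fallen out of the window.
        while self.buckets and self.buckets[-1][0] <= now - self.window:
            self.buckets.pop()

    def add(self, now: int) -> None:
        self._expire(now)
        self.buckets.insert(0, [now, 1])
        i = 0
        while i < len(self.buckets):
            size = self.buckets[i][1]
            j = i
            while j < len(self.buckets) and self.buckets[j][1] == size:
                j += 1
            if j - i <= self.max_per_size:
                break
            # Too many buckets of this size: merge the two oldest into one
            # bucket of double size, keeping the newer of the two timestamps.
            newer_ts = self.buckets[j - 2][0]
            del self.buckets[j - 1]
            self.buckets[j - 2] = [newer_ts, size * 2]
            i = j - 2  # the merge may overflow the next size up

    def estimate(self, now: int) -> int:
        self._expire(now)
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only part of the oldest bucket may still be inside the window,
        # so count half of it.
        return total - self.buckets[-1][1] // 2


if __name__ == "__main__":
    hist = ExpHistogram(window=720)  # e.g. 720 hours, roughly one month
    for hour in range(1, 101):
        hist.add(hour)
    print(hist.estimate(now=100), [size for _, size in hist.buckets])
```

Under this reading, memory per term grows with the logarithm of the window count rather than with the count itself, and the "50% error" in the reply would correspond to the uncertainty about how much of the oldest bucket is still inside the window.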
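m*****P Posts: 1331	5 Your answer mainly relies on the idea of distribution. I think the problem is also meant to test handling of an infinite stream: no matter how much you distribute, resources are finite, so perhaps some kind of approximation method is needed?
【Quoting c*****r】
: I happened to be thinking about this problem today as well; here is my idea.
: Use MapReduce to route each search term to one of several peer servers, so that each server is responsible for accumulating counts for
: its own set of search terms. Each server also maintains a max-heap that tracks the most
: frequent terms on that server; the heap size equals the number of search terms the server is responsible for.
: Each server sends the 10 terms at the top of its heap to a master node in real time. The master node ranks the terms received from all
: machines by frequency and selects the final 10 to display.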
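
The thread does not settle on an approximation method. As one illustration of what a bounded-memory approach to an endless stream can look like, here is a minimal sketch of the Misra-Gries frequent-items summary; it is a swapped-in technique, not something proposed by the posters, and the class name `MisraGries` and the parameter `capacity` are hypothetical.

```python
import heapq


class MisraGries:
    """Frequent-items summary that keeps at most `capacity` counters."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counters = {}

    def add(self, term: str) -> None:
        if term in self.counters:
            self.counters[term] += 1
        elif len(self.counters) < self.capacity:
            self.counters[term] = 1
        else:
            # No free counter: decrement every counter and drop the ones
            # that reach zero. The new term itself is not stored.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def top(self, k: int = 10):
        # Stored counts are lower bounds on the true frequencies.
        return heapq.nlargest(k, self.counters.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    summary = MisraGries(capacity=3)
    stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(10)]
    for term in stream:
        summary.add(term)
    print(summary.top(2))
```

With `capacity` counters and a stream of n terms, each stored count underestimates the true frequency by at most n / (capacity + 1), which is the kind of justification the problem statement asks for when an approximation is used. The capacity should be well above 10 so that the true top terms survive the decrements.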
