第7页 - 关于lookup的讨论汇总 - 话题女王

s********u
发帖数: 1109

是的，这个我以前也是这么理解的，map就是个红黑树。
这一段参考careercup书：
Alternatively, we can implement the hash table with a binary tree.We can
then guarantee an O(log n) lookup time, since we can keep the tree balanced.
Additionally, we may use less space, since a large array no longer needs to
be allocated in the very beginning.
个人理解是hash table只是个数据结构，bst是一种implementation，array+list也是
一种implementation。
对于大O我准备加一句说明，就是不特别注明都是指average case。

lo
tree

t**********t
发帖数: 364

来自主题: JobHunting版 - T家店面

设计一字典数据结构, 支持快速lookup, for both exact and wildcard (only '.',
no ‘×’)
要求写code实现，估计已跪

A***o
发帖数: 358

来自主题: JobHunting版 - T家店面

Fractional cascading

设计一字典数据结构, 支持快速lookup, for both exact and wildcard (only '.',
no ‘×’)要求写code实现，估计已跪

x**********4
发帖数: 70

来自主题: JobHunting版 - F, A, MS, QM, RF的OFFER和经历 -- PART 1

昨天收到FB的电话，我的OFFER已经批下来了，这也意味着我的JOB HUNTING结束了，下
面是我这两个月来申请结果汇总：
Applications (7): Facebook, Google, Microsoft, Square, Twitter, Rocket Fuel,
Amazon
Offers (5): Facebook (accepted), Microsoft, Amazon, Rocket Fuel, Qualcomm (
return offer)
Rejections (3): Square, Twitter, Google
OFFER细节就不报了，上次看有人报MS的OFFER细节，结果引发口争，有人将其定性为
SHOW OFF。。。
在版上受益良多，我会陆续呈上各家公司的面试经历和面试题（FB的面试题除外），当
务之急是给LEETCODE捐点钱。
非大牛，版上互赞大牛的风气不可取。有二爷，半海和一帮真牛在这镇着，谁敢放肆！
============
Facebook
============
下面更新FB的面试经历吧，因为已经从了，所以不想说具体题目，只说我这... 阅读全帖

n*******s
发帖数: 482

来自主题: JobHunting版 - 那个streaming data 求出现次数topk的题目到底怎么做的？

总是要hash
为了有效地求出topk, 则需要对value作手脚，比较直接的想法是维护一个minimum堆+
零散node, 每个node可以通过对data O(1)lookup到，并且node包括, 每
次有stream来了data, 就对node的count作update,需要topk时候根据发生过的变化调
整heap。
不知道有没有比较漂亮的做法。

d**********x
发帖数: 4083

来自主题: JobHunting版 - Fibonacci数计算要求constant time

put aside these math tricks...
we can actually make a lookup table... because Fn is same order as 2^n, so..
..

c*******7
发帖数: 438

来自主题: JobHunting版 - 一个问题, 好难好难?

这类问题就是N个set求找到两个set使其中的共同token数最多。共同点就是某个token
是否在某个set中的lookup时间为O（1）。假设每个set的size为k
两两比较的方法：对于每个set的每个token，在剩余set中查找是否存在。时间复杂度
为O(k*n^2)
逆向index的方法：对于每个token，建立一个index set，放入到一个map里，这一步需
要遍历所有token，O(n*k)。再建立一个n*n的matrix, 遍历一遍前面建立好的index
map，update matrix里面对应的common token count。worst case 也是O(k*n^2),但是
average 应该会比前面的方法好一些。

A*********c
发帖数: 430

来自主题: JobHunting版 - MS Azure SDET 新鲜面经

我之前描述比较少，没说清楚，我再写详细点。
举例来说，就是对于一个Question，现在已经显示了最popular的20答案，现在要下20
个，怎么生成。
给的函数只能随机从数据库里取，那么必然有重复，也会取到不popular的答案。
所以应对重复，用的就是lookup table ＋ resampling。
然后排序就是确保找到最popular的答案。
要保证加入朋友的答案，就是把朋友的答案先加入结果集里边。和上述的sampling不是
一个过程。
这方面我经验也不多，基本上就是根据自己有限的经验答的。你要是碰上类似题目可以
再改进。
不客气。

am
always

l******6
发帖数: 340

来自主题: JobHunting版 - G家onsite面经

3.
my 2 cents:
set f(n):
set ret
if n = 1
ret = {9}
else // assume set remove duplicate automatically
for int i = 1 : n - 1
for int fstNum in f(i)
for int sndNum in f(n - i)
ret.insert(fstNum + sndNum)
ret.insert(fstNum - sndNum)
ret.insert(fstNum * sndNum)
if sndNum!= 0
ret.insert fstNum/sndNum
cache {n , ret} for lookup
return ret
Finally find smallest positive not in f(9)
still can... 阅读全帖

s**x
发帖数: 7506

来自主题: JobHunting版 - Leetcode Word Break I 有o(n^2)的算法吗？

有了trie，checking a substring is a prefix of any words become very easy,
just traverse the path in the trie.
for example， if no word begins with abc, you do not need to check if abcd，
abcde is in the dictionary or not.
when trie is not used, you have to lookup the dictionary for all substrings
starting a given position.
可以看看boggle puzzle 那题，那题不用trie 好像没法弄。

f******h
发帖数: 45

来自主题: JobHunting版 - G家面经总结，顺便求个bless，和一起找工作的同学们共勉

也找工作了一段时间了，从版上学了很多，上周G家面完了，求个bless。
之前的一些都挂了，还在继续找其他的。等定下来之后一定发面经回报本版。
谢谢大家啦！！
1. http://www.mitbbs.com/article_t/JobHunting/32005597.html
1) Implement a simple calculator (+,-,*,/);
2) Implement "+1" for a large integer;
3) How to match Ads to users;
4) How to extract useful information from a forum webpage (list all
kinds of useful signal you can think of)
5) How to detect the duplicate HTML pages (large scale);
6) Find all the paths between two places on Google map;
7)... 阅读全帖

u*****n
发帖数: 126

来自主题: JobHunting版 - 请教个面经里的设计题

上面思路都错了。
提示：把clear和lookup放在一起看。不是每个pointer都是valid的，怎么判断？

u*****n
发帖数: 126

来自主题: JobHunting版 - 请教个面经里的设计题

你不需要erase整个map里所有的内容，也就是说你既不用删除旧的元素，也不必建立新
的空间。你只需要在lookup的时候知道它们是不是valid的就可以了。怎么判断是不是
valid 的呢？

u*****n
发帖数: 126

来自主题: JobHunting版 - 请教个面经里的设计题

上面思路都错了。
提示：把clear和lookup放在一起看。不是每个pointer都是valid的，怎么判断？

u*****n
发帖数: 126

来自主题: JobHunting版 - 请教个面经里的设计题

y***n
发帖数: 1594

来自主题: JobHunting版 - FLAG面试总结

一般map只有one-way lookup 把,从key 到value...

l***i
发帖数: 1309

来自主题: JobHunting版 - 两个面试题

答一下试试。
arraylist is like c++ vector, so amortized O(1) for insertion, assuming
underlying array will double size when full.
hashtable lookup is O(1), assuming linkedlist is O(1) length, if use
chaining.
multithreading makes it more difficult, the bruteforce solution is to lock
the whole data structure when insert/delete, and this could be slow.

l*****a
发帖数: 14598

来自主题: JobHunting版 - 两个面试题

lookup姑且算O(1)
但是计算hashcode的复杂性呢？

l*****f
发帖数: 259

来自主题: JobHunting版 - 类似LRU Cache的题应该怎么练习？

Array + Hash
Hash的key是array里面的值，value是index
getrandom就直接生成index random access array
insert就insert在array最后，同时update hash
delete就先用hash找到对应的index，假设为i。然后swap第i个和array最后一个，然后
删掉array最后一个，同时更新hash
lookup直接hash搞定

l*****f
发帖数: 259

来自主题: JobHunting版 - 类似LRU Cache的题应该怎么练习？

L*******e
发帖数: 114

来自主题: JobHunting版 - google 电面

processing the dictionary seems unnecessary. The dictionary could be huge. I
think get the permutation of target and lookup the dict would be sufficient.

w***n
发帖数: 58

来自主题: JobHunting版 - 回馈本版众小公司面经

骑驴找马告一段落从了某preipo的在此献上面经
这次感觉coding题都不难主要是design 甚至flag preipo的coding题比很多小的公司
都简单但是design更难
flag 和 preipo的面经不能放上来主要是不敢得罪这些签了NDA的
很多题目没有正确答案在于想法以及tradeoff
sift science (offer):
1 given a nxn chessboard, there are only pawns (white and black) on it. Say
one side is the start and its opposite side is the end. There is one white
pawn on the start row, what are the cells that could be reached on the end
side.
2 given a binary tree, implement sibling (each nodes sibling is the next
node of th... 阅读全帖

g********r
发帖数: 89

来自主题: JobHunting版 - 最新L家面经

请问第二题，用hashmap的话，不是要事先建立好hashmap么？也就是说在constructor
里面建立好，然后每次调用计算diat的函数的时候，直接lookup hashmap，然后用
index table算。LZ为什么要把建立hashmap的code放到计算dist的函数里呢？

A*****e
发帖数: 26

来自主题: JobHunting版 - 发bloomberg面经 [电面，目测已挂，赞人品]

two c++ positions
1. interviewer: one Korean guy
C++ basics: public, private, struct, etc. 秒掉
C++ : the member functions of map,list, vector, how to resize vector, how
to delete the middle element in a given list, time complexity。 Nothing
difficult for me. list in stl is actually doubly linked list, so remove one
member would take O(1) time, but the reviewer seemed to disagree. Ask is
there a size member available for list in STL. A: i think so. I: er..ok (
seems he did not know that)
Programm... 阅读全帖

l********e
发帖数: 46

来自主题: JobHunting版 - 湾区不支持身份的初级会计职位

公司在foster city, 现在有一个初级会计的职位，主要做AP, AR, month-end closing
。要求本科学位，一年以上相关经验，会Excel (sort, v-lookup, pivot table) 和
Quickbooks。抱歉不能支持h1b。
有兴趣的话请发简历到 [email protected]
(function(){try{var s,a,i,j,r,c,l,b=document.getElementsByTagName("script");l=b[b.length-1].previousSibling;a=l.getAttribute('data-cfemail');if(a){s='';r=parseInt(a.substr(0,2),16);for(j=2;a.length-j;j+=2){c=parseInt(a.substr(j,2),16)^r;s+=String.fromCharCode(c);}s=document.createTextNode(s);l.parentNode.replaceChild(s,l);}}catch(e){... 阅读全帖

h****t
发帖数: 69

来自主题: JobHunting版 - 脑瓜笨有的算法题怎么想也想不出来怎么办

Lookup loop invariant

j******i
发帖数: 244

来自主题: JobHunting版 - 你们面试时候忽略了一个重要点

Collision是无法避免的。amortized constant lookup就行了。

[发表自未名空间手机版 - m.mitbbs.com]

h*****a
发帖数: 1718

来自主题: JobHunting版 - 设计Tiny URL

这道题不是只考hashmap的，呵呵。其实里面考点可以很多。
1）基本数据结构，hashmap
2) 怎么persist data
3）怎么handle非常大的scale
4) 怎么生成hash符合各种use case，比如不能被人逆推去browse。
5) 需不需要reverse lookup，即给定一个已经入库的长url，找到它对应的短url。
...
面试官从哪里dig deep都有可能，被问到要尽量随机应变就好。

h*****a
发帖数: 1718

来自主题: JobHunting版 - 设计Tiny URL

k******i
发帖数: 11

来自主题: JobHunting版 - L家的高频题merge k sorted arrays giving iterators求讨论！

也可以，自己拿hashmap或者建立一个 int->Iterator的mapping能够提供O(1)lookup
就可以了。然后注意处理一下duplicate value的情况就OK了。

b**********5
发帖数: 7881

来自主题: JobHunting版 - 问个L家设计题分布式 inverted index设计

怎么存，就是存在cassandra或者hbase里啊。 hbase、cassandra都是帮你partition
好了， scale好了。你可以谈谈hbase， cassandra的architecture。 real time
更新就是lookup， overwrite， insert到你这个nosql table里。。。

b********r
发帖数: 12

来自主题: JobHunting版 - 报一个G家offer

background:
美国非CS Ph.D. 有programming 经验
现在postdoc ~3 years
拿到一个G家Research position 的offer
~120k+15%bonus+600GSU (after negotiation)
工作地点在欧洲
跟老板报告，他很震惊。 (my field is bit far from CS)
postdoc老板觉得我再留下来做2~3年,
应该可以在美国找到AP position.
我也是觉得蛮可惜的。
想听听大家的意见。
有前辈也犹䂊过Academia vs. Industry 吗?
------------------------------------------------
面经的话，我申请的G家position不太考刷题 (more focus on basic math, CS, and
my domain knowledge)。
我分享另一个Silicon valley startup 的面经好了 for backend, machine learning
SDE.
(1) Give a s... 阅读全帖

b*****n
发帖数: 618

来自主题: JobHunting版 - 再来继续比较，芒果和redis各什么时候用比较好？

Kafka不是王道，这个是为了解释partition的时候出现master election就会出现data
loss，我不知道你怎么看的这篇文章，Jay要解释的是这种情况无法避免，唯一解决的
方法就是要么放弃A，要么就忍这个data loss。
我也不知道你对这些系统有没有真的用过体验过，你的意思是你凭空就能搞一个牛逼的
系统出来，还是说你比Jay这些人还牛逼。
最后再教给你怎么看文章，这个文章里面第一行就说明了，这篇文章是基于另外一片文
章的一篇follow up：
https://aphyr.com/posts/293-call-me-maybe-kafka
这个文章里面有详细的什么时候会出现data loss的解释。
刚才就想说混淆CAP概念的人是你，没好意思说而已。
你要是真牛逼就给按照你的想法做个比现在市面上都牛逼的kv store出来，然后给个
benchmark证明不管从latency，scalability发面都能达到要求，不用高，就用spanner
的标准，read／write lookup就web app level，50ms latency就行，然后什么复杂
q... 阅读全帖

t*********r
发帖数: 387

来自主题: JobHunting版 - 再来继续比较，芒果和redis各什么时候用比较好？

> Kafka不是王道，这个是为了解释partition的时候出现master election就会出现
data loss
> 我不知道你怎么看的这篇文章，Jay要解释的是这种情况无法避免，唯一解决的
方法就是要么放弃A，要么就忍这个data loss。
> 你的意思是你凭空就能搞一个牛逼的系统出来，还是说你比Jay这些人还牛逼。
我可没说我如何，呵呵
我发帖只是说CAP是个伪概念拿来忽悠人说C和A不可兼得的。我也不认为kafka design
导致的tradeoff是必要的。
来个具体例子吧：
> The issue Kyle demonstrates makes for a good illustration. In this
scenario Kyle kills off all but one node in the ISR, then writes to the
remaining node (which is now the leader), then kills this node and brings
back the other nodes. I actually ... 阅读全帖

x*****n
发帖数: 195

来自主题: JobHunting版 - Facebook，Linkedin, Google的面经

这样的设计会损失灵活性，比如引入新的属性。同时还得维护一个lookup table，编码
和具体属性之间的对应关系。
题主后来补充说了是不同的field，我觉得对于存储到不同的column中加sql index应该
就可以了。。。如果field类型特别多（导致得维护好多indices），或者变化特别多（
添加删减），那就把这些field都normalize，变成
table 1
user id - field id
table 2
user id user details
table 3
field id field details
onsite时我碰到过类似题目，面试官大概是这个意思（最后没说他心目中的答案）。这
样设计的话query的话最后要做好多个intersect的操作，我觉得还是很time expensive
的。。。但是因为对id都做了index，而且intersect也是针对id做的，效率似乎可以接
受。

1-

H******7
发帖数: 1728

来自主题: JobHunting版 - shorten url 单机解法抛砖引玉

Theoretical background
You need a Bijective Function f. This is necessary so that you can find a
inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL
Think of an alphabet we want to use. In your case that's [a-zA-Z0-9]. It
contains 62 letters.
Take an auto-generated, unique numerical ... 阅读全帖

T****U
发帖数: 3344

来自主题: JobHunting版 - 求教一道昂赛特设计题

不考虑存储空间的话，不就是一个lookup table, 最多加个hash,
如果是数据库就是就是一个select where语句吧
关键看有什么具体要求，
1.要搜索符合某几个条件的所有版本吗？
2.访问量多大
3.需要经常修改吗，修改后是实时更新吗？

k****r
发帖数: 807

来自主题: JobHunting版 - SQL 面试问题

不是牛人，瞎答哈，不对的地方还请真牛指教：
1.How to optimize the large table if calling is very slow. Many joining
inside.
1.1. use distributed system. The biggest two table can be partitioned
according to the join key to those nodes. Others can be replicated to each
node. first join the biggest two table.
1.2 Or, re-order the join operations to put the ones whose selectivity is
low. Then, the size of intermediate data is reduced.
1.3 Other optimization strategies can be used. Like to push selection as far
as possib... 阅读全帖

x******6
发帖数: 46

来自主题: JobHunting版 - online test一不小心写了个typo还有救吗

code里建了一个number lookup table，但时间紧张手滑打错了一个，交了才发现。。。
.
.
.
singleWordNumber.put("eight", 8);
singleWordNumber.put("ten", 9);
.
.
.
从zero打到billion真是忙中出错。HackerRank上面默认的test都pass了，自己写的几
个test因为没用到9和10也没看出来。。。这情况还有救么？

O****A
发帖数: 5

来自主题: JobHunting版 - LinkedIn onsite一道题

本题应该不是dp。Otherwise， what's the sub-optimal structure and overlapping?
blaze 的解法很巧妙。关键是建立两个不同方向的lookup table,每个entry分别是字符
以及一个数组，数组中的 i 元素是从 i 位置往左（右）看能看到的该字符的最近位
置。然后从两边向中间扫描。
关于leetcode 266， 267 参考：
http://buttercola.blogspot.com/2015/08/leetcode-palindrome-perm

O****A
发帖数: 5

来自主题: JobHunting版 - LinkedIn onsite一道题

g*****y
发帖数: 94

来自主题: JobHunting版 - 问个Google的面经问题

大家好，问个Google问题，List> lists，Pair有两个属性，id和String型
的value，不同lists相同id的Pair可能value不同，最后要求在所有lists都出现的id的
List，相当于求intersection。所以返回类型是List String>>>，每个Pair对应id和a list of values。然后follow up是如果这个
intersection function如果call多次的话怎么优化？然后这个follow up加了额外的条
件是，有billions of lists，每次intersection求的是其中的subset的intersection
。我想到的方法是cache住已经计算过的subset的intersection，但是怎么设计key，然
后怎么通过key来lookup cache。但是感觉这个key也太难设计了，不知道大家有什么方
法？多谢啦

l*********i
发帖数: 28

来自主题: JobHunting版 - 问一道刚面试完的algorithm 的题，面试官非要最优解，我想不出来啊

谢谢你的回复，面试官一直说hash有collision，不一定是O（1）的lookup. 是非flg的
一个软件公司，现在还不太方便透露，等最后上完整面经的时候再说吧，希望理解。

S********t
发帖数: 3431

来自主题: JobHunting版 - 问个经典题目，感觉没错啊，可是过不了OJ，超时了，请大神帮着

更好的方法是，把每个词映射到该词遮住某个位置的变种上, eg.:
"hat"-> "?at", "h?t", "ha?"
"had"-> "?ad", "h?d", "ha?"
用一些preprocess的时间空间(hashmultimap)，就可以很快的lookup neighbor。
当然你尝试所有26个字母也是O(1),复杂度其实是一样的，不过这个常数上的加速在实
际中也是很有用的。

发帖数: 1

来自主题: JobHunting版 - Uber IPO之前被fire

也就是说那些纸面上70W--90W股票的，如果在lookup release之前被开掉，就一美分都
拿不到？当然除了base外

，

G*******l
发帖数: 281

来自主题: JobHunting版 - Uber IPO之前被fire

不是啊 lockup release之前被开掉连之前几年的base都会被没收的哟
[在 who009 (who010) 的大作中提到：]
:也就是说那些纸面上70W--90W股票的，如果在lookup release之前被开掉，就一美分
都拿不到？当然除了base外
:，

b***e
发帖数: 383

来自主题: JobHunting版 - 问一个hashtable/hashmap capacity的问题

Where did you get that statement?
Below is the original text from the Java API Document.
"As a general rule, the default load factor (.75) offers a good tradeoff
between time and space costs. Higher values decrease the space overhead but
increase the lookup cost (reflected in most of the operations of the HashMap
class, including get and put)."
So, higher values refer to load factor, not capacity.

g*******y
发帖数: 2783

来自主题: JobHunting版 - 问一个hashtable/hashmap capacity的问题

As a general rule, the default load factor (.75) offers a good tradeoff
between time and space costs. Higher values decrease the space overhead but
increase the lookup cost (reflected in most of the operations of the HashMap
class, including get and put).
这儿的higher value是load factor吧... collision会变多，每个bucket里面的东西多
。Java8之后，超过了n后，从linked list变成了balanced tree, 就是为了提高worst
case performance.

s**********g
发帖数: 14942

来自主题: JobHunting版 - google 面经

set里为啥还能有重复出现的。。
估计是另一个数据结构吧
一种办法是统计一共有多少个
然后按照顺序group排列
例如Ax5 Bx3 Cx6
那就是14个
A是1-5 B是6-8 C是9-14
然后每次生成一个1-14的随机数
直接lookup
这样是O(1)的时间（或者O(lgn)，看你用多少memory存储了；时间换空间）
虽然初始的setup要O(n)时间

m*******e
发帖数: 1598

来自主题: JobHunting版 - 刷题选手分为两种，一种是lookup table

另一种是做题选手

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

topics

未名新帖统计// 7月16日

历史上的今天