第8页 - 关于hash的讨论汇总 - 话题女王

P***P
发帖数: 1387

来自主题: JobHunting版 - amazon intern一共几面，加面经

上周4背靠背了两个，到现在没回复，是不是挂了？
贴贴面经：
一面（老印）
0. 聊聊家常，问问简历
1. 对oop理解：
我答abstract data type
2. 聊聊继承吧
我说了subtype跟interface
3. 多态理解
我说我不是搞programming language的，不太懂，就那回事，爱咋咋滴。他问多态是不
是跟generic差不多，我说差不多吧. (后来想想不对啊，他丫坑我)
4. 设计车库
我都想骂人了，最讨厌这种oo题目，答有车库有车位，车库有入口，告诉你车子有没有空位
子，提示下说不同车型可以return不同相应的空车位
5. circular single linked list, 怎么反序打印
我答先把list翻转了，然后打印，程序都写出来后。他说你不能把input改了啊，
我真想骂他怎么不早说，我说上个stack不久玩了。重新写个stack版的
6. 问我知道hash吧， hash怎么判断hash function好不好，什么时候用bst, 什么时候用
hash, time complexity多少.
他让我讲讲h... 阅读全帖

m*****a
发帖数: 636

来自主题: JobHunting版 - amazon intern一共几面，加面经

心理素质好，google使用得当。
祝马上有好消息

上周4背靠背了两个，到现在没回复，是不是挂了？
贴贴面经：
一面（老印）
0. 聊聊家常，问问简历
1. 对oop理解：
我答abstract data type
2. 聊聊继承吧
我说了subtype跟interface
3. 多态理解
我说我不是搞programming language的，不太懂，就那回事，爱咋咋滴。他问多态是
不是跟
generic差不多，我说差不多吧. (后来想想不对啊，他丫坑我)
4. 设计车库
我都想骂人了，最讨厌这种oo题目，答有车库有车位，车库有入口，告诉你车子有
没有空位子，
提示下说不同车型可以return不同相应的空车位
5. circular single linked list, 怎么反序打印
我答先把list翻转了，然后打印，程序都写出来后。他说你不能把input改了啊，
我真想骂他
怎么不早说，我说上个stack不久玩了。重新写个stack版的
6. 问我知道hash吧， hash怎么判断hash function好不好，什么时候用bst, 什么时
候用
... 阅读全帖

l*******b
发帖数: 2586

来自主题: JobHunting版 - 拓扑排序

不知道,看大牛指导下
面的时候也是说搞个linked list就够了,回来想了下觉得那个需要多一些code,删除,插
入都不如hash table的写起来好写.
最早一直觉得搞一个数组存每个node就好了,发现删除节点会很麻烦,删除了数组就得动
,编号就要变.
要删除简单用链表,每个edge都会成为一个真正的link,可是这样又没有random access
一个node,而且创建一个graph貌似会比较繁琐,这个题倒无所谓,因为总是要遍历一遍,
而且编号都有,不关心具体连接结构,存个整数做标记就好了.
hash table貌似两个都能, 感觉适合需要动态变化的graph. hash table的iterator本
身就是一个linked list实现的吧,插入hash table的时候加到一个linked list里, 这
样. 还是遍历hash数组那样, 根据load factor动态更新hash table的size.

hashMap

J****3
发帖数: 427

来自主题: JobHunting版 - 发个amazon online assessment

我想可不可以在每个Node里加一hash值 hash(x) = hash(x.left) + hash(x.right)
当然这个hash函数要保证相同形状的子树有相同的hash值。然后依据这个DP?

h****y
发帖数: 137

来自主题: JobHunting版 - T家电面面经并且不解为何被秒拒

leetcode原题, permutation II和permutation sequence, 就是把int换成了char, 45
分钟的面试老中面试官大哥迟到5分钟, 却要按时结束, 加上闲扯几句, 这两题一共就
32分钟时间, 我自己感觉除了一点小typo之外没有问题啊, 那几个typo还是因为第二题
完全没时间检查了, 有他迟到的5分钟肯定能检查出来. 6个小时后就收到拒信, 而且我
用的C++, 感觉面试官对C++一点都不熟, 基本不说话, 我说好了后他也不review, 不作
评价,第二题还冒了一句std::set的元素是无序的, 你这样遍历得到的结果是随机的,
还浪费我时间跟他解释, 我汗...
下面贴上我的代码, 想请大家评评, 真心不懂为啥被拒, 能不能跟recruiter complain
一下?
1. input : string
output : print all the permutation of the input
there are duplicates in the input, avoid print out the same string in ... 阅读全帖

l*n
发帖数: 529

来自主题: JobHunting版 - 贡献一个VMWARE的online test题目

你已经设计好了啊，就因为要用hash就“爱咋咋地”？
ps.文件大小基本可以不用考虑，反而增加麻烦，有了hash直接比较就是了。每个文件
每行搞个hash，然后综合起来弄个hash。先比较文件的hash，如果一样再回头去比较每
一行的hash。
edit:
搜了一下，文件全文读取太浪费，这个方案不错。
http://stackoverflow.com/a/1761623/2073130

l*n
发帖数: 529

来自主题: JobHunting版 - L一个电面题

他说的的overflow是因为rolling hash是个多项式，每次roll的时候是lastHash*base+
newChar-firstChar*power(base, n)。overflow其实也无所谓的。rolling hash没有
rehash这一说。
这一题用rolling hash就只是在计算hash的时候节省一点，但还是需要像普通的hash
table那样处理collision，需要存所有unique字串。这么一来，感觉就需要自己搞一个
hashmap了，更麻烦。也许是我不知道有什么好办法能让rolling hash跟hashmap直接无
缝连接。

l*****f
发帖数: 259

来自主题: JobHunting版 - 类似LRU Cache的题应该怎么练习？

Array + Hash
Hash的key是array里面的值，value是index
getrandom就直接生成index random access array
insert就insert在array最后，同时update hash
delete就先用hash找到对应的index，假设为i。然后swap第i个和array最后一个，然后
删掉array最后一个，同时更新hash
lookup直接hash搞定

l*****f
发帖数: 259

来自主题: JobHunting版 - 类似LRU Cache的题应该怎么练习？

w**********4
发帖数: 157

来自主题: JobHunting版 - 亚麻面筋－－已挂

树是一般的二叉树或者多叉树，用hash 的意思是，如果两个系统都用同样的hash函数
，不管多大的树，它的hash值是固定长度，而且是一个很小的值，然后从根节点开始比
较两颗树的hash值，如何hash值相同说明这个节点为根节点的子树是相同的，如果不相
同，迭代的对子节点求hash值继续比较。

W****5
发帖数: 12

来自主题: JobHunting版 - 请教一个面试题

假设有两个warehouse，各自有自己的inventory system, 用如下图所示的树结构表示
。问，如果在A的inventory system中添加了一些node，如何将这些node同步更新到B的
inventory system，怎么进行优化（读取A的node的操作会是remote call，所以开销会
很大）
root
/
clothes electronics books ... ...
/
men's women's
/
... ... ...
/
Nike T-Shirt #[XXXX]
让我实现的方法的signature是 int updateInventory(ITreeNode a, ITreeNode b)，
返回值是新增节点的数目。 TreeNode的结构让我自己定义，大概包括一个id，和包含
所有子节点的list（他给的是list，但是我实现的时候改成了HashMap）。
我的... 阅读全帖

c******e
发帖数: 40

来自主题: JobHunting版 - 水果电面问题 hashmap 用 sperate chaining 时, array size不够怎么办

今天电面被问到一个问题：
如果hash出来的结果，hash code（一个int）超过array的size怎么办。因为32bit系
统上array最大就2^32个，hash code如果超过这个size，怎么处理?
我立马就晕了。。。
请各位大牛指点一下，难道这个不是在设计hash function的时候就要处理吗？hash
function不是根据处理的element的type，设计出来吗？int, float, string 类型的
hash function都不一样吧

t***q
发帖数: 418

来自主题: LosAngeles版 - 如何用python web.py web service 做 multiple parameters 的 (转载)

【以下文字转载自 Programming 讨论区】
发信人: treeq (treeq), 信区: Programming
标题: 如何用python web.py web service 做 multiple parameters 的
发信站: BBS 未名空间站 (Sun Mar 22 23:14:33 2015, 美东)
大家好。我的那个web service 做成了。用了python 的 web.py.
install web.py
cd webpy
编辑python web service.
#!/usr/bin/env python
urls = ('/title_matching2','title_matching2')
app = web.application(urls,globals())
class title_matching2:
def __init__(self):
self.hello = "hello world"
def GET(self):
getInput = web.input(name="W... 阅读全帖

t***q
发帖数: 418

来自主题: SanFrancisco版 - 如何用python web.py web service 做 multiple parameters 的 (转载)

t***q
发帖数: 418

来自主题: Seattle版 - 如何用python web.py web service 做 multiple parameters 的que (转载)

【以下文字转载自 Programming 讨论区】
发信人: treeq (treeq), 信区: Programming
标题: 如何用python web.py web service 做 multiple parameters 的query
发信站: BBS 未名空间站 (Sun Mar 22 23:14:33 2015, 美东)
大家好。我的那个web service 做成了。用了python 的 web.py.
install web.py
cd webpy
编辑python web service.
#!/usr/bin/env python
urls = ('/title_matching2','title_matching2')
app = web.application(urls,globals())
class title_matching2:
def __init__(self):
self.hello = "hello world"
def GET(self):
getInput = web.input(na... 阅读全帖

j******f
发帖数: 825

来自主题: CS版 - Re: 扫盲Re: 密码学领域重大发现：山东大学王小云教授成功破解MD5(ZZ)

俺是真心想知道密码学中哪一部分内容，或者哪些paper是基于强碰撞理论的。
请不吝赐教，因为我真是如何想也想不到。

是汗牛冲栋，水平高低都有，所以有的评论才说他们撼动了整个密码学的根基．括号里的
字本来有自嘲的意思，因为咱们也灌过这么一篇在据说是顶尖的会上.但是用ＣＰＵ和攒
机来比造hash和用hash,一: 愣疾豢湔,实际难度的差别就是这么大的.以上引起的误
会都是该道歉的地方.至于扫盲,没得解释了,太伤害读者了(虽然本意也是指这个问题而非
我们大家),需要跪地道歉.咱这块砖头也盼望能引出玉吧,将来能有真正大侠给出一些真正
专业而非这样故弄玄的帖子,繁荣CS版.
接看
假设
resistant);如果任给一个输入,都能找着另一个输入和它有相同的hash值,那这个hash
被攻
这是
里给
了,
HA1了,而且大牛们已经担心SHA1也保不了多少天了,必须去造新的hash函数.造个新hash函
,而
别打
的)相

I******e
发帖数: 101

来自主题: Database版 - Oracle 看来还有很长的路

Your understanding about hash join's memory usage might not be correct.
Hash join only requires around sqrt(data size) to do in memory join. Even
though you are joining terabytes data, hash join can still do in-memory join
with around 300MB memory, which is the power of hash join.
I wonder how much memory hash join is using in your case. This depends
whether you are using automatic memory management in 10g. Hash join's
memory usage is not related to memory buffer or SGA which can be influenc

m*********y
发帖数: 389

来自主题: Database版 - How to find all duplicate record in SQL?

How to add a hash? I used the Checksum function, but I heard that HashType
is more powerful than Checksum. Anyway, let's say you want hash on the the
combination of first name, last name and address, you can do this:
Alter table person
Add hash as checksum(firstname,lastname,address)
After you created hash, searching for duplicate record is much easier, you
just need to run below query:
select hash,count(*) from person group by hash having count(*)>1

h*******c
发帖数: 248

来自主题: Programming版 - 一个找电话号码题

这个问题我工作中常碰到。我的做法是，对电话号码hash，按hash把数据分块。每块儿
小于内存，然后一块一块的数。
顺便给大家贡献一个hash的perl code，是我知道的最简单，最快的玩儿法了：
use B;
$hash=B::hash("abscd");
$int_hash=hex($hash);
这个结果是个大整数。分块儿求mod就是了。

y**b
发帖数: 10166

来自主题: Programming版 - boost::unordered一问

http://www.boost.org/doc/libs/1_51_0/doc/html/unordered.html
a hash table only needs an equality function and a hash function for the key.
简言之，哈希表只需等式函数和哈希函数。
http://www.boost.org/doc/libs/1_51_0/doc/html/hash/custom.html
When writing a hash function, first look at how the equality function works.
Objects that are equal must generate the same hash value.
When objects are not equal they should generate different hash values.
简言之，写哈希函数，先参考等式函数。
相等的对象必须产生相等的哈希值；
不等的对象应该产生不同的哈希值。
我有一个unordered_set, ... 阅读全帖

t********8
发帖数: 30

来自主题: Software版 - [求助]在学校用bt被通知了 ...

good point
不过hash应该是在自己的计算机上产生，然后和文件一起提供给下载者的吧，服务器或
者是查盗版的机构要获取文件的hash好像要下载完全部文件才行。
所以你还是可以argue说这个hash和下载文件不是配套的~，为了隐私起见，我强制bt软
件用了一个假hash行不？
不知道有没有软件可以根据hash倒推产生文件的。应该有不止一个文件可以对应同一
hash吧。

S******3
发帖数: 66

来自主题: Statistics版 - 在SAS中不用proc sql的情况下实现inequality join

actually, hash can do it with some trick. it's pretty fast and does not
require much temporary space.
some pseduo code for your reference (you need rename date variables first).
I think it worth some dollars, at least WB!
/* check to see if the current record has a match in the hash object */
rc = lookup.find(key: id);
/* if a match is found... */
if rc = 0 then do;
/* check the dates - if comparison is true, output the row */
if 1<=intck('month', date_b, date_a)<=60 then output;... 阅读全帖

L******f
发帖数: 5368

来自主题: Military版 - 破解密码高手王小云教授简介

破解密码高手王小云教授简介
王小云教授，1966年生于山东诸城，1983年至1993年就读于山东大学数学系，
先后获得学士、硕士和博士学位，导师潘承洞。1993年毕业后留校任教。
2005年获国家自然科学基金杰出青年基金资助，同年入选清华大学“百名人才计划”，
2005年6月受聘为清华大学高等研究中心“杨振宁讲座教授”，现为清华大学“长江学
者特聘教授”。
王小云教授带领的研究小组于2004年、2005年先后破解了被广泛应用于计算机
安全系统的MD5和SHA-1两大密码算法，对于这项十几年来国际上首次成功破解
全球广泛使用的密码算法与标准的工作，整个国际密码学界为之震惊，密码学领域
最权威的两大刊物Eurocrypto与Crypto将2005年度最佳论文奖授予了这位中国女性，
其研究成果引起了国际同行的广泛关注。
她获得由全国妇联、中国联合国教科文组织全国委员会、中国科协和欧莱雅
(中国)有限公司创立的，被誉为女性诺贝尔奖的中国青年女科学家奖。
MD5、SHA-1大厦轰然倒塌
在2004年8月之前，国际密码学界对王小云这个名字并不熟悉。2004年8月，
在美国加州圣芭芭拉召开的国际密码... 阅读全帖

x***y
发帖数: 633

来自主题: JobHunting版 - amazon question

How about using open hashing as follows:
1) Hash all the elements in A, element's value as key and element's index
as data in hash table (O(n), ingoring the hashing calculation overhead)
2) In the hash table, each list corresponding to the elements in B, has a
pointer pointing to the first data ( the index in A) initially,
store the m indices in a min-heap(O(mlogm)), find the max (O(m)), min (O(1)) and the window length is max-min.
3) move the list, where min is in, one step ahead( the indice

g********d
发帖数: 43

来自主题: JobHunting版 - 2轮Amazon电面

忘提了，空间代价必须是constant.
我当时比较了一下，如果用hash table,空间需求会比较大，如果大小固定，当数组很
大时，性能会degrade。
排序的空间代价小，但查询慢。
我说了hash之后，他就追问hash table的细节了,开散列、闭散列等等
后来他又问如果这个数组实在太大，放不到内存里，我就有比较了一下hash table, B/
B+ tree,不过这里有点心虚，以前没考虑过hash table很大的情况下，磁盘I/O,key
clustering会造成的性能影响。

c*****y
发帖数: 90

来自主题: JobHunting版 - 微软onsite

SDET Position
包括recruiter一共见了5个人，最后一个是大经理，我拿到的schedule上没安排的。我
因为已经有一个offer了，要recruiter尽快告诉我结果，最后没拿到offer。所有的人都
很nice，题目也很简单。我说我答得不好的吧，反正我也没签什么协议。
一个array，没有sorted，找出所有pairs，sum是100。我说出了不同方法。问我如果用
hash table，准备用什么hash function？这是我一点都不知道怎么答的。我非CS科班
毕业，只知道用hash table，从来没考虑过用什么hash function。接下来就讨论hash
table实现的问题，也答得不好。

x******3
发帖数: 245

来自主题: JobHunting版 - Facebook 2 轮电面面经 + 为第三轮求福

我觉得这道题有点buggy
万一新插入数据的hash值和least recently used的元素的hash值不一样怎么办
比如我hash table里顺序存了 [0, 1, 2, 3, 4], hash table的size是5，hash
function是mod 5
当我要插入8的时候，我应该插入倒3的位置，但是least recently used的元素是0，
难道我把5放在0的位置？

m****y
发帖数: 28

来自主题: JobHunting版 - 一个google面试题

我来说个比较可行的思路吧
首先人家给了disk space就是让你放些中间结果的
我的想法是用hash来先把data record粗略归类到很小的子集，然后在每个子集里面找
重复就可以了。
1. 对每个data record，我们生成一个形如的ID，假设用md5作
为hash函数，再假设record number占用4个字节，那么每个ID的大小是16+4=20字节。
2. 生成ID的时候，把它归类到一个较小的子集，写到磁盘上去。对于2^30个record，
我们根据(hash mod 2^13)的结果，把它分成2^13个子集，每个子集存成一个文件，每
个文件包含大约2^17个record的ID，这样子集文件的平均大小大概是2^17*20=2MB左右
，总共需要大约2^30*20=4GB的磁盘空间。
3. 最后就是把这么些子集文件读到内存里，然后找重复。显然重复的data record的ID
肯定属于同一个子集，简单的办法是把这2MB的数据排个序，对于重复的hash再根据
record number去读实际的data record来作比较。这样基

i***1
发帖数: 95

来自主题: JobHunting版 - 面试题palindrome

很有意思的题目。。。
我有个idea，如果一个串是palindrome，那么对它的hash，从头到尾，和从尾到头是一
样的。
譬如用这样的hash方法。
int hash(char a[], int n)
{
int res = 0;
for(int i=0;i res = ((res*37)+(int)char[i])%3137;
}
在扫描的同时，保存另一个hash值。（和上面的hash方法要有点变化）
在此基础上，可以用bloom filter的方法，减少false positive的可能性。

g*******y
发帖数: 1930

来自主题: JobHunting版 - 做题了，看看有没有比我更好的解法（20个包子）

好久没做题了，为了20个包子，我来试试吧。
从左边开始，看第一个3-tuple,{a0, a1, a2}
maintain the following info:
current best solution length: 0
hash each new number's first occurrence: a0~0, a1~1
case1: current 3-tuple is valid
find a2 in hash table:
if (found) {
current_best = MAX(current_best, |hash[a2] - current index of a2|;
} else {
hash[a2] = current index of a2; // insert a2 into hash table.
}
case2: current 3-tuple is invalid:
possible_best = solution of (a1 ... an);
return MAX(current_best, possible_best);

s*******e
发帖数: 93

来自主题: JobHunting版 - 刚面了amazon

我猜是
先扫一遍，把每个数存入hash, key=sum-a[i], value=i
再扫第二遍，查找 key=a[i]的是否存在,有的话这个key对应的value就是和i pair的
index.
比如数组是 1 3 4 7 2, 和是5
那 hash(5-1)=0, hash(5-3)=1, hash(5-4)=2 ...
扫第二遍是看到a[0]=1， hash(1)=2, 所以a[0]=1和a[2]=4是一个pair

l*****a
发帖数: 14598

来自主题: JobHunting版 - Google的面经

my solution:
1.create hash for each word
2.sort the dictionary by word length.
assume the dictionary become
a[0],a[1]......
find the largest k so that hash(a[0])&hash(a[k])==0,
get max=len(a[0])*len(a[k])
find max i in [2,k-1] so that hash(a[1])&hash(a[i])==0,
compare max and len(a[1])*len(a[i])
next step find max j in [3,i-1]....

l*****a
发帖数: 14598

来自主题: JobHunting版 - Google的面经

m**q
发帖数: 189

来自主题: JobHunting版 - 问两道amazon的面试题

两个hash，一个user hash，每一项存userid和对应的三连击中的前两个值；一个三连
击hash，存三连击string和count。
对于logfile中的每一行，在这两个hash中查找并更新。如果认为每次hash的复杂度为O
(1),则总的时间复杂度为O(n)。空间复杂度为O(m+k)，m为userid的个数，k为不同的三
连击的个数。

t**********n
发帖数: 145

来自主题: JobHunting版 - 刚拿到A公司的offer，呈上面经

==============================================================
这里的anagram不是基于语义的啦，只是基于字母组合，比如tan, nat是ant的
anagram。字典文件的每一行就是一个单词，没有其他的了。
通常的做法是，先load字典文件到一个hash，每读到一个word，按alphabeta
将letter、排序，作为hash-key，然后存进到hash里面，每个hash-key对应
一个list存有相同key的所有word。
需要求解某单词的anagram的时候，就只需要look up一下hash表就可以了。
这个方法的复杂度是O(1)。pre-computation的复杂度O(n)，其中n是字典的
大小。
此题在《Hacking Google Interview》的handout中有。

i**********e
发帖数: 1145

来自主题: JobHunting版 - 问一道google题

lolhaha 的算法的复杂度是 O(N^2).
应该没有比 O(N^2) 更好的算法了，
但是有一个优化可以把算法运行速度大大提升。
根据 lolhaha 的算法，
1.create hash for each word
2.sort the dictionary by word length.
assume the dictionary become
a[0],a[1]......
find the largest k so that hash(a[0])&hash(a[k])==0,
get max=len(a[0])*len(a[k])
find max i in [2,k-1] so that hash(a[1])&hash(a[i])==0,
compare max and len(a[1])*len(a[i])
next step find max j in [3,i-1]....
Using this online word list:
http://www.sil.org/linguistics/wordlists/english/wordlist/words
(co... 阅读全帖

j***y
发帖数: 2074

来自主题: JobHunting版 - 大家windows下面用什么写C程序的？

我用你的coding panel调试remove duplicate的程序：
－－－
#include
//#include
#include
using namespace std;
const int REMOVE = -1000;
int remove_duplicate(int a[],int n){
int i = 0, j = 0;
//unordered_map hash;
map hash;
for(i = 0; i < n; ++i){
if(hash.find(a[i]) != hash.end()) {
a[i] = REMOVE;
} else {
hash[a[i]] = 1;
}
}
j = 0;
i = 0;
while (i < n){
while (a[i] == REMOVE){
i++;
}
a[j] = a[i];
i... 阅读全帖

j***y
发帖数: 2074

来自主题: JobHunting版 - 请教几个电面题

谢谢啊。这两天我也在学习hashtable的方法，刚好看到C++里面有个map可以用。
调试了一下你的程序，稍作调整后的code如下：
－－－
#include
//#include
#include
using namespace std;
const int REMOVE = -1000;
int remove_duplicate(int a[],int n){
int i = 0, j = 0;
//unordered_map hash;
map hash;
for(i = 0; i < n; ++i){
if(hash.find(a[i]) != hash.end()) {
a[i] = REMOVE;
} else {
hash[a[i]] = 1;
}
}
j = 0;
i = 0;
while (i < n){
while (a[i] == REMOVE){
i++;... 阅读全帖

m********l
发帖数: 4394

来自主题: JobHunting版 - Given an int array and an int value. Find all pairs in arr

重看了下，你的问题好像没说清楚
不过试试用Chained Hash了吗？
1) create the hash table.
2) create a link-list node that stores the position of the value
3) put the values inside the hash table. Preferably, during insertion,
you want LIFO
4) loop thru the array again
4.1) get hash[sum-value]
4.2) print if the position of the hash value is greater than current
value's position.
4.3) if position > node->position, continue;
=============
Yes, it's a hack.
why do they test you on procedural stuff when all they teach you in... 阅读全帖

c**y
发帖数: 172

来自主题: JobHunting版 - 发AMZ电面经，攒 RP

AMZ 两轮店面
第一轮：
1.实现一个hash函数，输入string，输出一个hash值。
考虑复杂了，面试官提示说不用考虑太复杂的hash函数实现。
2.实现一个hash table用来存储strings。
这个后来用STL的map做的。
第二轮：
1.口述array, linked list, binary tree, hash table的区别和特点
随便说了说，这个问题被不同的公司问过好多次了。
2.给定一个binary tree，判断是不是一个binary search tree
google一下，网上有比较好的code，很简单，用recursive方法
3.实现a deck of cards class，具体要求实现两个函数(1) DealACard (2) Shuffle
DealACard就是从当前deck中抽取一张card，怎么抽自己定义，我就说从top抽
Shuffle就是洗排。
两个面试官都很nice。

s***c
发帖数: 50

来自主题: JobHunting版 - 微软onsite面经

刚从微软onsite回来。当天面试了6轮，从11点直到下午6点，最后都快撑不住了。有
一个算法问题，都不太难，但有很多open design的问题有些难度。我记得的问题有几
个：
1. two arrays of strings A and B, find intersections, union, A-B and B-A.
如果array小于mem size，我的思路是: scan A array, 建立hash table; 然后scan
array B, 对每一个元素
检查是否在A 的hash table存在。在白板上code实现。
如果array 大于mem size, 导致hash table也大于mem size, 则对每个array做
external sort:
把array分割成多块，每块都小于mem size, load到memory, sort, 写回disk；然后
merge sort.
之后，顺序扫描另个array，比较每个元素是否相同。
面试官还问，如果用SSD，该如何改进。我的答案是建立index table,但这是把index
table存在... 阅读全帖

g**********y
发帖数: 14569

来自主题: JobHunting版 - 回文数的问题

Rolling hash code ->
public class FirstPalindrome {
private final static int MOD = 1000000007;

public int first(int[] a) {
int pow3 = 1;
int hash = a[0];
int reverse = a[0];

for (int i=1; i pow3 = (pow3 * 3) % MOD;
hash = (hash + a[i] * pow3) % MOD;
reverse = (3*reverse + a[i]) % MOD;

if (hash == reverse && test(a, i)) return i;
}

return -1;
... 阅读全帖

j********x
发帖数: 2330

来自主题: JobHunting版 - HASHTABLE collision 后REHASH 怎么SEARCH

这都搞笑呢吧。。。
hash table，hash（key）是作为index，插入对应bucket；
query的时候，找到对应的bucket，然后查找对应的key。。。
rehash（应该叫double hashing）只是放到另一个bucket
query的时候先用第一个hash，找到bucket，找key；没有，第二个hash，第二个bucket
，找key，有，return，没有，拉倒。。。
什么跟什么，阿三是生物学毕业的码工吧。。。

S**I
发帖数: 15689

来自主题: JobHunting版 - [合集] A onsite被拒,面经,求分析失败原因

☆─────────────────────────────────────☆
fengzhongdi (fzd) 于 (Fri May 27 14:30:18 2011, 美东) 提到:
A的onsite,我尽量客观写,大家帮我分析一下
第一个美国人, 两个string,问第一个是否包含第二个的全部character,扩展是求第一
个string最短的substring包含第二个中所有的character.
我的解法是hashtable, 然后扩展问题是O(n)那个,先找到第一个包含全部的,然后从头
开始删除,接着从尾巴增加那个.面试官表示满意.
OOD是设计card game
交流过程很融洽.
第二个 hiring manage,印度人, 这个人带我去吃饭,然后问了我电梯设计问题,纠缠在
什么时候读楼层,还有如何判断要不要停下来接人.说了快50分钟.我个人感觉还好,反正
他也是都是笑到最后
第三个,白人,一开始问我LRU cache的设计,我直接告诉他double linked list + hash
table,他很吃惊问我是不是准备过,我说是,然后换了一题Bian... 阅读全帖

S**I
发帖数: 15689

来自主题: JobHunting版 - [合集] G家onsite面经

☆─────────────────────────────────────☆
sharc (sharc) 于 (Mon Aug 22 15:15:14 2011, 美东) 提到:
刚从G家onsite归来。新鲜面经奉上。
总共5轮，4轮technical interview, 一个thesis discussion。在technical里，有编
程题，有open design。我记得的问题有：
1. 编程题：一堆字符串。找longest common prefix。
我的方法就是找最短的字符串，对它的每个字符，逐个与其他字符串对应位置比较。（
求更好方法）
2. open question: 一堆文件，size差别极大( from KB to many GB). 找出所有内
容相同的文件。
3. 编程题：有一个observer 类，监视另一个类foo 的成员变量的值，每当那个值被
修改，就要调用该observer.updated() 方法。需要实现 foo.registerObserver(ob)
, foo.unregisterObserver( ob )... 阅读全帖

e****r
发帖数: 581

来自主题: JobHunting版 - an interview algorithm question about finding even occuring freq

如果不连续，我的想法还是用类似的方法（其实就是hash）
稍微麻烦一些。相同的数hash（％）到一起，用一样的方法处理
不同的数hash到一起，得用open addressing
(0) 如果a[i]已经在正确的位置(i==j)或者a[i]<0，直接i++
(1) a[j] == a[i], a[j] <- -a[i]-1, i++
(2) a[j] == -a[i]-1, a[j] <- a[i], i++
(3) otherwise, 如果((a[j] > 0) && (a[j] % n == j)) || ((a[j] < 0) && ((-a[j]
+1) % n == j))就说明是collision了（我们估且认为不同的数hash到一起才是
collision）。没有collision的话，就直接交换a[i]和a[j],i不变
处理collision就用open addressing：一个一个搜索，知道找到一个尚未处理过的位置
，或者找到正确的hash slot
细节蛮多的，基本上就是实现open addressing了……
大概意思就是这样

t******e
发帖数: 98

来自主题: JobHunting版 - 一道fb的题，clone a graph

lz算法没问题。我的程序如下请指教：
// copy graph
struct node
{
int id;
vector adj;
node(int id) : id(id){}
};
map hash;
node* copy(node* x)
{
if(!x) return 0;
if(hash.count(x) != 0) return hash[x];
hash[x] = new node(x->id);
node* y = hash[x];
for(int i = 0; i < x->adj.size(); ++i){
y->adj.push_back(copy(x->adj[i]));
}
return y;
}

n**4
发帖数: 719

来自主题: JobHunting版 - 找二叉树两个最大的相同子树

2楼进化树能展开说一下吗？
我有一个解法记得是版上某大牛做的
basic idea: use a hash function to get a digest of a tree by hashing its
child subtrees' digests:
digest(tree)=0 if tree is empty
digest(tree)=hash(digest(LeftChildTree),digest(RightChildTree),rootVal)
therefore, this problem can be solved by a post-order traversal. On the
traversal, all the subtrees are digested and registered. If a match occurs,
the roots of both trees and their size are compared and recorded.
the hashTable_get/put functions need the tree digest... 阅读全帖

k**********g
发帖数: 989

来自主题: JobHunting版 - 被recruiter问到的2个基础题

In this question, it seems "random access" means O(1) access given the index
of the item (consecutive numbers assigned to each item, from 0 to N-1).
Hash table is an array of pairs between hash values and content. If you know
either (1) the hash value of your item, or (2) a "replica item" from which
you can recalculate the hash value, then you can access that item in O(1).
But a Hash table (a basic one) doesn't assign any item index to items, nor
does it remember it.
Thinking of them as maps wit... 阅读全帖

p*******8
发帖数: 344

来自主题: JobHunting版 - 贡献一次电面题

1）一个数组，除了一个数只出现一次，其他都出现两次，找出这个数。经典题之一，
XOR就行了
2）很大一个文件，内存放不下，里面都是整数，有重复，求只出现一次的整数的个数
。应该是大数据吧，我就说了hash到多个小文件，保证一样的整数到同一个小文件，然
后依次读进内存用hashmap/hashset处理，面试官说如果所有数都一样，hash后还是一
个文件，我想了下想再hash一次，后来想干脆用hadoop搞，用两个job，第一个每个map
读进来，key是integer本身以及map task id，reducer负责输出这个task的unique的整
数，partioner根据integer和map id进行分配，然后第二个job把reducer设置成1个进
行合并。感觉杀鸡用牛刀了，但想不到啥其他方法
3）差不多的题，这次输出所有unique的数。我想了下先把所有一样的数hash到小文件，
如果小文件size还太大，再进行二次hash，根据文件size进行平均分配，然后处理每个
小文件，最后合并结果。
感觉2）3）答得不好，大数据以前就稍微看了下top k之类的，都是hashmap ... 阅读全帖

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

topics

未名新帖统计// 7月16日

历史上的今天