由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
JobHunting版 - 问一道google题
相关主题
一个google面试题一道面试题
贴个G家的电面题目吧问一道关于字符串的面试题
请教个面试题MS intern 电面被拒,附上面试过程
facebook面试题Google的一道面试题
两道简单的面试题算法面试题
几道微软面试题implement hash table
longest word made of other words在线紧急求助一道system design面试题,面经内附
一道C/C++的面试题longest subarray with numbers arranged as a seq
相关话题的讨论汇总
话题: word话题: longest话题: mark话题: hash话题: characters
进入JobHunting版参与讨论
1 (共1页)
s*******t
发帖数: 248
1
一个单词的列表,要找出两个单词,它们没有相同的字母出现,而且长度乘积最大
原帖在这里:
http://mitbbs.com/article_t/JobHunting/31744763.html
感觉没有哪个解法比较好。
f*******4
发帖数: 1401
2
有比O(n^2)更快的办法咩?
c********t
发帖数: 5706
3
我觉得lolhaha的算法挺好,就是不知道怎么算复杂度。

【在 s*******t 的大作中提到】
: 一个单词的列表,要找出两个单词,它们没有相同的字母出现,而且长度乘积最大
: 原帖在这里:
: http://mitbbs.com/article_t/JobHunting/31744763.html
: 感觉没有哪个解法比较好。

i**********e
发帖数: 1145
4
lolhaha 的算法的复杂度是 O(N^2).
应该没有比 O(N^2) 更好的算法了,
但是有一个优化可以把算法运行速度大大提升。
根据 lolhaha 的算法,
1.create hash for each word
2.sort the dictionary by word length.
assume the dictionary become
a[0],a[1]......
find the largest k so that hash(a[0])&hash(a[k])==0,
get max=len(a[0])*len(a[k])
find max i in [2,k-1] so that hash(a[1])&hash(a[i])==0,
compare max and len(a[1])*len(a[i])
next step find max j in [3,i-1]....
Using this online word list:
http://www.sil.org/linguistics/wordlists/english/wordlist/words
(containing total 109562 valid words with characters a-z (deleting 20
invalid words with apostrophe ')),
the run time is 20 seconds using lolhaha's algorithm on my machine.
However, by just adding this condition before the inner loop:
if (a[j].length * a[j].length < max) break; // stop comparing
the run time is reduced to less than 1 second!!!
If you're interested, the answer using the above word list is:
microphotographic defenselessness 255
photomicrographic defenselessness 255
Isn't this interesting?
It is able to reduce the run time by such a large factor because the
dictionary only contains a handful of long words, compared to the majority
words which are shorter. Since the word list is sorted by descending order
in length, the max is likely to be found during the beginning of loop and
could be terminated earlier.
一些常见面试题的答案与总结 -
http://www.ihas1337code.com

【在 c********t 的大作中提到】
: 我觉得lolhaha的算法挺好,就是不知道怎么算复杂度。
s*******t
发帖数: 248
5
then I guess how to design the hash function will be key issue here.
someone suggest use bitset to calculate the hash, however in some word,
there are duplicated characters which maybe an issue.

【在 i**********e 的大作中提到】
: lolhaha 的算法的复杂度是 O(N^2).
: 应该没有比 O(N^2) 更好的算法了,
: 但是有一个优化可以把算法运行速度大大提升。
: 根据 lolhaha 的算法,
: 1.create hash for each word
: 2.sort the dictionary by word length.
: assume the dictionary become
: a[0],a[1]......
: find the largest k so that hash(a[0])&hash(a[k])==0,
: get max=len(a[0])*len(a[k])

i**********e
发帖数: 1145
6
Duplicated characters is not an issue here.
It is only required that characters that appeared in word A do not appear in
word B.
Therefore, assuming a word can contain only characters from 'a' to 'z' all
lowercase, then each character can be mapped to a bit in an integer using
only 26 bits.
If 'a' appeared in a word, bit 1 is set. If 'z' appeared, bit 26 is set.
This integer will form the hash for a word.
To find out if all characters that are in word A are not in word B, just
simply do bit AND operation of their hash. Duplicate characters will not
cause an issue here.
一些常见面试题的答案与总结 -
http://www.ihas1337code.com
s*******t
发帖数: 248
7
Thanks. I understand now. Even two words may have the same hash value, it
doesn't affect the final result.

in

【在 i**********e 的大作中提到】
: Duplicated characters is not an issue here.
: It is only required that characters that appeared in word A do not appear in
: word B.
: Therefore, assuming a word can contain only characters from 'a' to 'z' all
: lowercase, then each character can be mapped to a bit in an integer using
: only 26 bits.
: If 'a' appeared in a word, bit 1 is set. If 'z' appeared, bit 26 is set.
: This integer will form the hash for a word.
: To find out if all characters that are in word A are not in word B, just
: simply do bit AND operation of their hash. Duplicate characters will not

g*********s
发帖数: 1782
8
good observation.
however, how can you ensure this hash is evenly distributed?

in

【在 i**********e 的大作中提到】
: Duplicated characters is not an issue here.
: It is only required that characters that appeared in word A do not appear in
: word B.
: Therefore, assuming a word can contain only characters from 'a' to 'z' all
: lowercase, then each character can be mapped to a bit in an integer using
: only 26 bits.
: If 'a' appeared in a word, bit 1 is set. If 'z' appeared, bit 26 is set.
: This integer will form the hash for a word.
: To find out if all characters that are in word A are not in word B, just
: simply do bit AND operation of their hash. Duplicate characters will not

i**********e
发帖数: 1145
9
Do you mean hash table?
This 'hash' is not a hash table. It is like a signature for a word which
contains just enough information of the set of characters that appeared in
the word.
一些常见面试题的答案与总结 -
http://www.ihas1337code.com

【在 g*********s 的大作中提到】
: good observation.
: however, how can you ensure this hash is evenly distributed?
:
: in

c********t
发帖数: 5706
10
太赞了,我觉得还可以优化的就是,不需要一开始就把所有单词的hashcode算出来储存
。不如用到的时候才算出来存。这样因为a[j].length*a[j].length 用算了。

【在 i**********e 的大作中提到】
: lolhaha 的算法的复杂度是 O(N^2).
: 应该没有比 O(N^2) 更好的算法了,
: 但是有一个优化可以把算法运行速度大大提升。
: 根据 lolhaha 的算法,
: 1.create hash for each word
: 2.sort the dictionary by word length.
: assume the dictionary become
: a[0],a[1]......
: find the largest k so that hash(a[0])&hash(a[k])==0,
: get max=len(a[0])*len(a[k])

z****o
发帖数: 78
11
1. use an integer Mark[i] to represent the character set for word[i] in the
dict; use lower 26 bits, each represent whether a certain character appears
in the word.
(Mark[i] & Mark[j]) == 0 means word[i] and
word[j] are disjoint.
2. at the mean time of calculating the Mark[i], we update the maximum-length
word index in
Longest[(~Mark[i]) & 0x03ffffff];
this means eg.
if word[i] = "abcdefg........uvwabcdefg.....uvw", Longest[(~
Mark[i]) & 0x03ffffff] = i :
the longest word contains and only contains {a,b,c,....,u,v,w} is
word[i] with length 46.
3. we modify the definition of Longest[mark] from
"longest word contains and only contains all unmarked
characters" to
"longest word contains at most all unmarked characters".
To modify the values for Longest[mark], we update Longest[ mark] using
all the Longest[ mark' ] values, where:
mark' = turn one 1 in mark into 0;
if we update the Longest[ mark] using "number of 1s in mark" as order,
increasingly, every Longest[mark] will only be recalculated once.
4. using queue to recalculate every "updated" Longest[mark], instead of
recalculate every Longest[mark] can sometimes dramatically reduce the
calculation. But complexity keeps the same.
5. For complexity: 2^26*26+O(n*m), which is not a good bit-O representation
but more useful.
6. I wrote it just practicing to explain something in English. It sucks to
me.
1 (共1页)
进入JobHunting版参与讨论
相关主题
longest subarray with numbers arranged as a seq两道简单的面试题
被recruiter问到的2个基础题几道微软面试题
(已解决,code错了) online judge 有的时候会有点小bug吗?longest word made of other words
Leetcode- Longest Substring Without Repeating Characters 的 test case一道C/C++的面试题
一个google面试题一道面试题
贴个G家的电面题目吧问一道关于字符串的面试题
请教个面试题MS intern 电面被拒,附上面试过程
facebook面试题Google的一道面试题
相关话题的讨论汇总
话题: word话题: longest话题: mark话题: hash话题: characters