a***e 发帖数: 413 | 1 自己做的总是memory exceeded,
看了讨论,
https://oj.leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c
还是不懂为啥m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1就能达到rolling hash
的效果呢?
i只是一个char啊,这个是哪里推导出来的?多谢!
Neat idea. The additional 1 bit per letter still encode each substring in
10x3 = 30 bits, just under 4 bytes for a 32-bit integer.
Your code could be further simplified. By observing that s[i] & 7 is never 0
, each of the first nine substrings with length < 10 will have unique hash
key and will never collide with other 10-letter long sequences. Therefore
the first loop could be removed and be compacted into a single loop.
vector findRepeatedDnaSequences(string s) {
unordered_map m;
vector r;
for (int t = 0, i = 0; i < s.size(); i++)
if (m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1)
r.push_back(s.substr(i - 9, 10));
return r;
} | s****a 发帖数: 794 | 2 因为他说ACTG在ASC里面后三位不一样吧
其实你就说A001 C010 T011 G100啥都没有是000 一样的
无非赶得巧省几行代码 | p*******e 发帖数: 4 | 3 你不一定非要用这种方法,一个字母用2个bit表示,就不会memory exceed了
m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1
这里的s[i] & 7 是取下一个字母N,
t << 3 & 0x3FFFFFFF 是保留当前10-letter string的后9个letter S,
| 是把 N 合并到S中,这样就更新了key |
|