o******0 发帖数: 105 | 1 Design a data structure that supports kind of full text search but in
numbers.
We are given file with lot of 10-digits numbers, for example:
1234 567 890
4124 123 123
3123 123 322
On a given number X we should return all numbers that contain X.
For example, if the number 123 was given, we should return all numbers (from
the list above) because 123 is in all of them.
If the number 41 was given we should return only the middle number - because
the number 41 is only in it.
能想到的是转成string,再search. |
S**********n 发帖数: 250 | 2 4124 123 123 与 123
4124 123 123 / 10^7 % 1000 = 412 != 123
4124 123 123 / 10^6 % 1000 = 124 != 123
4124 123 123 / 10^5 % 1000 = 241 != 123
4124 123 123 / 10^4 % 1000 = 412 != 123
4124 123 123 / 10^3 % 1000 = 123 == 123 YES
4124 123 123 / 10^2 % 1000 = 231 != 123
4124 123 123 / 10^1 % 1000 = 312 != 123
4124 123 123 / 10^0 % 1000 = 123 == 123 YES |
m*******s 发帖数: 9 | 3 用reverted index
from
because
【在 o******0 的大作中提到】 : Design a data structure that supports kind of full text search but in : numbers. : We are given file with lot of 10-digits numbers, for example: : 1234 567 890 : 4124 123 123 : 3123 123 322 : On a given number X we should return all numbers that contain X. : For example, if the number 123 was given, we should return all numbers (from : the list above) because 123 is in all of them. : If the number 41 was given we should return only the middle number - because
|
b******b 发帖数: 713 | 4 seems standard application of use suffix tree.
from
because
【在 o******0 的大作中提到】 : Design a data structure that supports kind of full text search but in : numbers. : We are given file with lot of 10-digits numbers, for example: : 1234 567 890 : 4124 123 123 : 3123 123 322 : On a given number X we should return all numbers that contain X. : For example, if the number 123 was given, we should return all numbers (from : the list above) because 123 is in all of them. : If the number 41 was given we should return only the middle number - because
|
c*******t 发帖数: 123 | 5 brilliant!
【在 b******b 的大作中提到】 : seems standard application of use suffix tree. : : from : because
|
r*******g 发帖数: 1335 | 6 牛b,面试遇到suffix tree基本上就准备直接挂,看wiki也感觉好难。
【在 b******b 的大作中提到】 : seems standard application of use suffix tree. : : from : because
|
m*******s 发帖数: 9 | 7 看了suffix tree, 但是仍然觉得不是最好的解决方法. 比如找出含有123的数字, 仍然
需要先查找123所在的边, 然后再找数字. 这样和一个一个数字扫描没有太大区别.
【在 b******b 的大作中提到】 : seems standard application of use suffix tree. : : from : because
|
x*******9 发帖数: 138 | 8 想了个土方法
对于needle为一位或两位的情况,我们用个hashmap预处理出来。
然后将一个10位数拆成8个三位数
for example:
1 234 567 890 => (123) (234) (345) ...(890)
然后建倒排索引。
接下来的每一次查找,对于needle为一位或两位的情况,直接查表来做。
对于needle.length() >= 3的情况,我们用同样的方法将needle进行分拆,拆成多个
key。
之后用这些key拉出倒排链,做一次merge操作。在筛选过后,再暴力检查一下是否真含
有这个子串。
如果数据是随机的,这种方法应该不差。 |
v**N 发帖数: 11 | |
r*******g 发帖数: 1335 | 10 这道题应该没有好办法吧,suffix tree很难写的。 |
|
|
h*****u 发帖数: 109 | 11 I feel it is still invert index.
Each string has an ID.
Each sub-string maps to a set of raw string IDs.
For example,
Let 3123 123 322 have ID C.
3 -> C
31 -> C
312 -> C
...
Finally combine them, say
3 -> {A, B, C}
312 -> {B, C}
...
For each string, n^2 combinations, or words. M strings, M n^2 total words, n
as average length. |
y***x 发帖数: 148 | 12 最直接的难道不是正则表达式
[在 ocean100 (ocean100) 的大作中提到:]
:Design a data structure that supports kind of full text search but in
:numbers.
:........... |
y***x 发帖数: 148 | 13 最直接的难道不是正则表达式
[在 ocean100 (ocean100) 的大作中提到:]
:Design a data structure that supports kind of full text search but in
:numbers.
:........... |
h*****u 发帖数: 109 | 14 So, replace Google by "ctrl-F". |
K******r 发帖数: 4052 | 15 用String开销很大吗?
★ 发自iPhone App: ChineseWeb 11
【在 y***x 的大作中提到】 : 最直接的难道不是正则表达式 : [在 ocean100 (ocean100) 的大作中提到:] : :Design a data structure that supports kind of full text search but in : :numbers. : :...........
|
r*******g 发帖数: 1335 | 16 n^2开销还不如直接建一个suffix tree了,也是n*m, n是字符串个数,m是字符串长度。
【在 h*****u 的大作中提到】 : I feel it is still invert index. : Each string has an ID. : Each sub-string maps to a set of raw string IDs. : For example, : Let 3123 123 322 have ID C. : 3 -> C : 31 -> C : 312 -> C : ... : Finally combine them, say
|