s*****n 发帖数: 162 | 1 Suppose you are developing a search engineer. Given a sequence of
numbers as a query string. You need to design algorithm/data structure such
that it can return whether this number sequence belongs to some famous
number sequences. For example, if the query string is “0 1 1 2 3 5”, it
might belong to a Fibonacci sequence. Note that the given sequence may not
always start from the beginning of a known number sequence.
My idea:
For those "famous" number sequences, we may cut them into smaller sequences
(e.g. three numbers each). Then save these sequences into a database (NoSQL
key-value store). When a query comes in, cut it into smaller sequences with
the same length (such 3) and then look up in the database/key-value store.
Any suggestions? Thx! |
g****y 发帖数: 240 | 2 for each "famous" number sequence, construct a suffix tree for it? |
d****o 发帖数: 1055 | 3 对每一个famous sequence, 建立一个suffix tree.
然后用该string 去遍历suffix tree
such
sequences
NoSQL
with
【在 s*****n 的大作中提到】 : Suppose you are developing a search engineer. Given a sequence of : numbers as a query string. You need to design algorithm/data structure such : that it can return whether this number sequence belongs to some famous : number sequences. For example, if the query string is “0 1 1 2 3 5”, it : might belong to a Fibonacci sequence. Note that the given sequence may not : always start from the beginning of a known number sequence. : My idea: : For those "famous" number sequences, we may cut them into smaller sequences : (e.g. three numbers each). Then save these sequences into a database (NoSQL : key-value store). When a query comes in, cut it into smaller sequences with
|
s*****n 发帖数: 162 | 4 用suffix tree的话,对于这种sequence的suffix,它们不重合,比如
0 1 1 2 3 5 8 13 21 34
suffix
34
21 34
13 21 34
8 13 21 34
5 8 13 21 34
3 5 8 13 21 34
2 3 5 8 13 21 34
1 2 3 5 8 13 21 34
1 1 2 3 5 8 13 21 34
0 1 1 2 3 5 8 13 21 34 |
g****y 发帖数: 240 | 5 那用hashtable +linked list. use linked list to store the sequence. use
hashtable to do lookup. key = number in sequence; value= pointer to the
number in the linked list.
【在 s*****n 的大作中提到】 : 用suffix tree的话,对于这种sequence的suffix,它们不重合,比如 : 0 1 1 2 3 5 8 13 21 34 : suffix : 34 : 21 34 : 13 21 34 : 8 13 21 34 : 5 8 13 21 34 : 3 5 8 13 21 34 : 2 3 5 8 13 21 34
|
o****o 发帖数: 1398 | 6 "famous" number sequence 大都是无穷多个的,怎么办?
【在 s*****n 的大作中提到】 : 用suffix tree的话,对于这种sequence的suffix,它们不重合,比如 : 0 1 1 2 3 5 8 13 21 34 : suffix : 34 : 21 34 : 13 21 34 : 8 13 21 34 : 5 8 13 21 34 : 3 5 8 13 21 34 : 2 3 5 8 13 21 34
|
l***i 发帖数: 1309 | 7 treat the sequence as a string and use KMP to match. You have to cut the
famous sequence at some point. Usually it is not a problem as most sequences
are increasing so you know you can stop when the last integer in your
sequence is smaller than the last one in the prefix of a famous sequence.
oeis is a website to do this search quite efficiently, not sure how they
implemented it. |
b*******d 发帖数: 750 | 8 除了上边说的suffix tree之外,
一个比较粗的想法:
很多“famous” sequence都是指数级增长,所以越界是比较快的(assume,相信几百
万个这样数之后就overflow了,那么存下来也就是几个M而已)。对于很大的数,可以
直接标记是否他们是fabonacii数or not。对比较短的数列,标记几个level的inverted
index,如上边说的3连续,4连续,5连续,6连续。6连续的话就已经是shingle了,算
得上是finger print了,查询可以非常快的。 |