C*********o 发帖数: 7 | 1 向大家请教一道题目:
Given a list of one million
score> pairs where names are valid Java variable
names, write two programs and try to optimize their
efficiency:
1. A Construction Program that produces an index
structure D.
2. A Query Server Program that reads in serialized D
and then accepts user queries such that for each
query s, it responds with the top 10 names (ranked
by score) that start with s or contains ‘_s’ (so for
example, both “revenue” and “yearly_revenue”
match the prefix “rev”). Query answering should run
in sub-linear time (in terms of the number of names
in the input).
我自己的想法是可以根据name 构建一个trie (prefix tree) 去存放每一个pair, 之
后取出然后排序取前10个。
问题1: 如何处理包含“_s" 的pair
问题2: 有没有其他方法更加好?
我自知才疏学浅,恳请大家指教,谢谢~ | g*****g 发帖数: 212 | 2 要先分词再trie
【在 C*********o 的大作中提到】 : 向大家请教一道题目: : Given a list of one million : score> pairs where names are valid Java variable : names, write two programs and try to optimize their : efficiency: : 1. A Construction Program that produces an index : structure D. : 2. A Query Server Program that reads in serialized D : and then accepts user queries such that for each : query s, it responds with the top 10 names (ranked
| m*****n 发帖数: 2152 | 3 用分层的trie?
比如s, s_s, s_s_s,可以先构建三层, 0,1,2,分别指代没有underscore,一个
underscore,两个underscore。考虑到扩展性,用linked list来实现比较好。
比如0->1->2->3....,然后每一层用trie,每一层的trie的末尾如果不是underscore就
用NULL,如果是underscore,就给指向下一层的首字母的地址。
如果输入某个或某几个字母,就逐层扫描,加到maxheap里去。
具体实现就是定义两种node,key node和ordinary node,key node 只用于第一个字母
和underscore后面的第一个字母,key node除了有ordinary node的所有atributes以外
还要
一个额外的pointer指向下一个key node。 | l*****a 发帖数: 14598 | 4 when u create the trie, add its reverse string into Trie as well.
then you can get those end with _s
【在 C*********o 的大作中提到】 : 向大家请教一道题目: : Given a list of one million : score> pairs where names are valid Java variable : names, write two programs and try to optimize their : efficiency: : 1. A Construction Program that produces an index : structure D. : 2. A Query Server Program that reads in serialized D : and then accepts user queries such that for each : query s, it responds with the top 10 names (ranked
|
|