z**********g 发帖数: 209 | 1 讨论一下如何design google suggest. 抛砖引玉。
我想基本的structure应该是trie + minheap.
First, we need to assume that popular search phases' search frequencies are
known. I think this can be achieved by map reduce easily, and the result is
stored at a database called DB.
Then, build a trie according to those popular search phases. Each node in
the trie has a pointer to a minHeap. The minHeap has a fixed size of 10.
Each node in minHeap stores a word and its frequency.
Suppose we want to insert a word craigslist into the trie, we need to update
every node's minHeap along the path, namely c, cr, cra, crai, craig, craigs
, craigsl, cragisli, cragislis and craigslist.
Please note this approach builds the trie according to DB statically,
and I think we need to reconstruct a new trie according to new DB every
month. |
|