g*****u 发帖数: 298 | 1 In our indexes, we have millions of URLs each of which has a link to some
page contents, that is, URL->contents. Now, suppose a user types a query
with wild cards *, which represent 0 or multiple occurrences of any
characters, how do you build the indexes such that such a type of query can
be executed efficiently by finding all corresponding URLs->contents
efficiently. For example, given a query http://www.*o*ve*ou.com. You need to find iloveyou.com, itveabcu.com, etc. I got stuck for this probl | c*****t 发帖数: 1879 | 2 What kind of job is it for?
The key would certainly be URL, and value is content. There are
plenty of key-value pair databases. So it is not a problem.
As for the indexing, I don't think that there exists one for regex
or such. So, essentially it is a full table scan of the keys.
There could be some optimizations in accessing the B-Tree of keys
(like if the prefix is known, then skip a range of B-Tree nodes),
but don't think it is that critical.
It can be speed up by parition the table across
【在 g*****u 的大作中提到】 : In our indexes, we have millions of URLs each of which has a link to some : page contents, that is, URL->contents. Now, suppose a user types a query : with wild cards *, which represent 0 or multiple occurrences of any : characters, how do you build the indexes such that such a type of query can : be executed efficiently by finding all corresponding URLs->contents : efficiently. For example, given a query http://www.*o*ve*ou.com. You need to find iloveyou.com, itveabcu.com, etc. I got stuck for this probl
| w***g 发帖数: 5958 | 3 这个问题没什么好的解法。给你一个www.*.com,结果就有那么多,什么算法都快不了。
can
【在 g*****u 的大作中提到】 : In our indexes, we have millions of URLs each of which has a link to some : page contents, that is, URL->contents. Now, suppose a user types a query : with wild cards *, which represent 0 or multiple occurrences of any : characters, how do you build the indexes such that such a type of query can : be executed efficiently by finding all corresponding URLs->contents : efficiently. For example, given a query http://www.*o*ve*ou.com. You need to find iloveyou.com, itveabcu.com, etc. I got stuck for this probl
| g*****u 发帖数: 298 | 4 谢谢大家。这是mS SDE的面试题,我从网上抄来的。
看了一下,大概提到的有k-gram,suffix tree什么的。 |
|