n*******s 发帖数: 482 | 1 Given a word stream, get the N words with top N highest occurance frequancy
(dynamically)
Any good approaches? Question is very open, try to explore the best space
and time complexity. | n*******s 发帖数: 482 | 2 Sorry, in office, can'te type chinese , just bare with my broken english
i would say
using a Hashmap to maintain the frequence for all words.
Value is defined as this
struct Node{
int key
int counter
struct Node* next;
};
struct V{
int counter;
struct Node* pt;
};
Also maitain a sorted list using Node as linked list (Olnly allow N nodes in the
linked list, the top N frequent words). The linked list is sorted.
When processing a word, first udpate the HashMap,
than, based on the V.counter to see if
(1) V.pt is already in the linked list, see if need to move the Node via to
a new place (since the counter is increaetd by 1, and we need to keep the
linked list sorted.)
(2) V.pt is none, and LinkedList is not full (<=N). Malloc a new Node and
linked to the sorted linked List
(3) V.pt is none and Linked List is full
if V.counter is greater than the Smallest Node in the Sorted LL
free the smallest Node and insert the a new Node with V's info.
update V.pt.
Worst Time Cost for each word processing is
O(1) --> hashMap lookup
O(N) --> maintain the sorted LL
O(1) --> to update the Node's Pointer in the hashmap, need another
lookup
But a lot of time is wasted on malloc/free.
i am also considering the change the LinkedList into a Mini-Heap. but the
heap operation will involve a lot of index udpate...i am a little hesitate
to do that.
|
|