d******g (posts: 34) | 1
1. Numbers are generated randomly and passed to a method. Write a program to find the median value as new numbers are generated.
I told the interviewer about the method using two heaps. He didn't like it (said it could be simpler) and asked for another way to do it. I gave up...
2. Find the median value of an array of values (without sorting it).
Thank you for your insights!
| g*********s (posts: 1782) | 2
Do you mean you will have a randomly generated integer stream and you need
to update the median dynamically along the stream? How does your 2-heap
solution work?
[In reply to d******g]
| h***n (posts: 276) | 3
From your description, I think the two-heap solution would be:
maintain a min-heap and a max-heap such that
1) every number in the min-heap is greater than or equal to every number in the max-heap
2) the sizes of the two heaps differ by at most 1, depending on whether the total count of numbers is odd or even
With these two properties, the median can be read directly from the top element(s) of the heap(s).
The algorithm is therefore designed to maintain both properties:
1) whenever a new number is generated, insert it into the proper heap: if it is larger than the top element of the min-heap, insert it into the min-heap; if it is smaller than the top element of the max-heap, insert it into the max-heap (a number that falls between the two tops can go into either heap)
2) then compare the sizes of the two heaps; if one heap has more than one element more than the other, extract the top element of the larger heap and insert it into the smaller heap
3) compute the median:
- if the two heaps have the same size, the median = the average of the two top elements
- if the sizes differ, the median = the top element of the larger heap
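The steps above can be sketched in Python roughly as follows. This is only a minimal sketch: the class name `RunningMedian` and its methods are made up for illustration, and the max-heap is simulated by pushing negated values onto `heapq`, which only provides a min-heap.

```python
import heapq


class RunningMedian:
    """Running median with a max-heap (lower half) and a min-heap (upper half)."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half, stored as negated values
        self._high = []  # min-heap of the larger half

    def add(self, x):
        # Step 1: route x to the half it belongs to.
        if self._low and x <= -self._low[0]:
            heapq.heappush(self._low, -x)
        else:
            heapq.heappush(self._high, x)
        # Step 2: rebalance so the sizes differ by at most one.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low) + 1:
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        # Step 3: read the median from the heap tops.
        if not self._low and not self._high:
            raise ValueError("no numbers yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        if len(self._high) > len(self._low):
            return self._high[0]
        return (-self._low[0] + self._high[0]) / 2
```

Each `add` costs O(log n) and each `median` call costs O(1), at the price of keeping all numbers in memory.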
Another solution would be an order-statistic tree, a red-black tree augmented so that each node x stores the number of nodes in the subtree rooted at x (see CLRS, chapter 14). However, I don't think it is suitable for coding over the phone.
Any other simpler solution?
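For reference, the size-augmentation idea might look like the sketch below. It is an assumption-laden simplification: it uses a plain unbalanced BST instead of the red-black tree from CLRS, so the O(log n) depth is only expected for random input, and the names `Node`, `insert`, `select`, and `median` are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def insert(root, key):
    """Insert key and return the root; subtree sizes are updated on the way down."""
    if root is None:
        return Node(key)
    node = root
    while True:
        node.size += 1
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def select(root, k):
    """Return the k-th smallest key (0-based) using the stored sizes."""
    node = root
    while node is not None:
        left_size = node.left.size if node.left else 0
        if k < left_size:
            node = node.left
        elif k == left_size:
            return node.key
        else:
            k -= left_size + 1
            node = node.right
    raise IndexError("k out of range")


def median(root):
    """Median of all keys; averages the two middle keys for an even count."""
    n = root.size
    if n % 2:
        return select(root, n // 2)
    return (select(root, n // 2 - 1) + select(root, n // 2)) / 2
```

Unlike the two-heap method, `select` can return any rank, not just the median.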
[In reply to g*********s]
| h***n (posts: 276) | 4
A simpler solution would be to just use the answer to 2) to solve 1).
I think the interviewer was trying to keep things simple and we are overcomplicating it. Maybe the two-heap solution is better suited to a sliding window of numbers.
[In reply to d******g]
| b*****e (posts: 474) | 5
1. Build a binary search tree in which each node stores the size of the subtree under it. O(log(n)) per insertion on average. (Since the input is random, the expected depth should be O(log(n)).)
2. Quickselect is one way: O(N) on average, O(N^2) in the worst case. Or build a BST, use a heap, etc.
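A quickselect-based median for problem 2 could look something like this. It is a sketch under a few assumptions: the pivot is chosen at random, a three-way partition is used so duplicates are handled, and the function names `quickselect` and `median` are illustrative. It works on a copy so the input array is left unsorted and unchanged.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) without fully sorting."""
    items = list(values)  # work on a copy; the caller's list is untouched
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        pivot = items[random.randint(lo, hi)]
        # Three-way partition of items[lo..hi] around the pivot value.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        # items[lo..lt-1] < pivot, items[lt..gt] == pivot, items[gt+1..hi] > pivot
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            return pivot


def median(values):
    """Median via quickselect; averages the two middle elements for even length."""
    n = len(values)
    if n % 2:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2
```

Only the side of the partition that contains rank k is processed further, which is where the O(N) average comes from.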
[In reply to d******g]
| y****i (posts: 312) |
| g*********s (posts: 1782) | 7
I think the 2-heap solution costs the sum over K of O(lg K), which implies O(N lg N) in total?
Linear selection, however, costs the sum over K of O(K), i.e. O(N^2).
[In reply to h***n]
| d******g (posts: 34) | 8
Nice description of the two-heap method. This is exactly what I told the interviewer, although he was not a big fan of it :(
He suggested that it's not necessary to keep all the numbers ordered (which is what the two-heap method does). One should be able to update the median using the information from the previous selection when a new number arrives.
BTW, as a hint he used the example of computing the moving average of an array of numbers: when calculating the moving average of the next m consecutive numbers, you want to reuse the previously computed moving average. Just multiply that average by m, subtract the first number, add the new number, then divide by m again.
However, I don't see how to incorporate this idea into our solution.
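The moving-average update from the hint can be written out as below. This is just a sketch of the hint itself, not of a median algorithm; the function name `moving_averages` is made up for illustration.

```python
def moving_averages(values, m):
    """Yield the average of every window of m consecutive values."""
    if m <= 0 or m > len(values):
        return
    avg = sum(values[:m]) / m
    yield avg
    for i in range(m, len(values)):
        # Reuse the previous average: avg * m is the old window sum,
        # then drop the value leaving the window and add the new one.
        avg = (avg * m - values[i - m] + values[i]) / m
        yield avg
```

The update is O(1) because the window sum changes by exactly (new value - old value). The median does not seem to have a comparable fixed-size summary, which may be why the hint is hard to carry over.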
[In reply to h***n]
| l*********r (posts: 674) | 9
Could it be done like this:
1. Create a binary search tree. When a new item arrives, insert it into the tree. O(logN)
2. To get the new median, check whether the newly generated value is bigger or smaller than the previous median.
- if bigger: the right child of the old median is the new median (if null, then the parent)
- if smaller: the left child of the old median (if null, then the parent)
If, for an even count, the median is defined as the mean of the middle two values, two pointers are needed to cover the odd and even cases; the approach is otherwise similar to the above.
O(1)
Overall O(logN)
[In reply to d******g]
| g*********s (posts: 1782) | 10
Others have already proposed using an order-statistic tree, i.e., a balanced BST augmented with subtree sizes.
This is a classical data structure and most likely the answer the interviewer expected. It is easier to understand and more powerful, in the sense that it can pick any n-th element in O(lgN). The complexity is the same as your solution's, though.
Your solution is still interesting to read.
[In reply to d******g]
| h***n (posts: 276) | 11
I kind of see his hint; the main point is to make use of the previous result.
Suppose you use the select algorithm to find the median. When the next new number arrives, you can put it on either the left or the right side; depending on how many numbers are on each side, you then run the select algorithm on the proper part to get the new median of the whole set.
[In reply to d******g]
| d******g (posts: 34) | 12
I guess you are right. But this has O(N) complexity per update, rather than the O(logN) of the max-heap and min-heap method.
[In reply to h***n]
| J********a (posts: 5208) | 13
He is looking for randomized quicksort (not a full sort, just the partition routine).
Just use in-place quicksort: use the new value as the pivot and, while its final position is not the middle, call the partition routine on the larger portion.
This should be as fast as the heap method, but needs no additional space.
For more information you can watch the MIT OpenCourseWare algorithms lectures
Lecture 4: Quicksort, Randomized Algorithms
and
Lecture 6: Order Statistics, Median
[In reply to d******g]
| u******e (posts: 758) | 14
For the first problem,
my idea is simply to use an ordered linked list with both next and prev pointers on each node.
Keep a pointer to the current median node and the size of the list. Whenever a new value arrives, put it in its correct position in the list (the search can start from the current median) and move the median pointer one step to next or prev, depending on the size of the list and the position of the new value.
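One way the linked-list idea could be written down is sketched below. Several details are assumptions not stated in the post: the pointer tracks the lower median, ties are inserted to the right of equal keys, and the names `_Node` and `SortedListMedian` are invented for illustration.

```python
class _Node:
    __slots__ = ("key", "prev", "next")

    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class SortedListMedian:
    """Sorted doubly linked list with a pointer to the lower median node."""

    def __init__(self):
        self._med = None  # node holding the lower median
        self._size = 0

    def add(self, x):
        new = _Node(x)
        n = self._size  # size before this insertion
        self._size += 1
        if n == 0:
            self._med = new
            return
        node = self._med
        if x >= node.key:
            # Walk right from the median and link the new node after `node`.
            while node.next is not None and node.next.key <= x:
                node = node.next
            new.prev, new.next = node, node.next
            if node.next is not None:
                node.next.prev = new
            node.next = new
            if n % 2 == 0:  # lower-median index grows by one
                self._med = self._med.next
        else:
            # Walk left from the median and link the new node before `node`.
            while node.prev is not None and node.prev.key > x:
                node = node.prev
            new.prev, new.next = node.prev, node
            if node.prev is not None:
                node.prev.next = new
            node.prev = new
            if n % 2 == 1:  # old median shifted right by one; step back
                self._med = self._med.prev

    def median(self):
        if self._size == 0:
            raise ValueError("no numbers yet")
        if self._size % 2:
            return self._med.key
        return (self._med.key + self._med.next.key) / 2
```

Moving the median pointer is O(1), but finding the insertion position is still a linear walk, so each update costs O(n) in the worst case.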
[In reply to h***n]
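| y**i (posts: 1112) | 15
My idea has something in common with yours.
First, isn't there a mistake here? If the new random number is larger than the old median, the new median should not be the old median's right child but its successor, which is the minimum of the old median's right subtree.
Yours can indeed achieve O(logN), but it needs O(N) extra space, right? My idea is to use the random partition step of quicksort in place, with the old median as the pivot. Again assuming the new random number is larger than the old median, take the larger part to the right of the pivot together with the new number, and find its minimum with an ordinary minimum scan; that minimum should be the new median. O(N) time, O(1) space.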
[In reply to l*********r]
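| l**i (posts: 8) | 16
I don't think problem 1 requires a data structure that holds all the numbers. You only need to record the current median plus the median's successor and predecessor. Each time a new number arrives, the new median can only be one of those three numbers, so just update the three according to the comparison and whether the current total count is odd or even.
| l*********3 (posts: 26) | 17
I think it could be done like this:
1)
Define a stack class with push(), pop(), size(), and max/min(). All of these operations can be done in O(1).
Maintain two stacks, a max stack and a min stack. The max stack records all numbers smaller than the median, and the min stack records all numbers larger than the median.
When a new number arrives, decide whether to insert it into the max stack or the min stack. If the two stacks are unbalanced, rebalance: pop the larger stack to get the new median, and insert the old median into the smaller stack.
Each element can be processed in O(1) time.
2)
Given the input array, call the method from problem 1 on each element in turn; the time complexity is O(n).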
[In reply to d******g]
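| c**m (posts: 535) | 18
This problem is just online median selection. I think two priority queues (min-heap & max-heap) is about the best approach available, and it is easy to understand.
hopen's explanation is already very detailed.
For problem 2, if you only need to find the median of an array once, without sorting, you can use the k-th element selection method. But that method cannot be used on a data stream.
The interviewer's hint is strange. Could it be that he only wanted you to find the mean, not the median?
Anyway, bless~~
| z****o (posts: 78) | 19
This is obviously wrong. A counterexample: 5 6 7 8 9 4 3 2 1 0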
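The counterexample can be checked mechanically with a small script like the one below. It is a sketch under stated assumptions: the median is taken to be the lower median, the three-value scheme is given the correct (predecessor, median, successor) triple before every step, and the helper name `unrecoverable_steps` is invented. It reports the steps at which the new triple needs a value that is neither in the old triple nor the number that just arrived, i.e. a value the scheme has already thrown away.

```python
def unrecoverable_steps(stream):
    """Return the 1-based steps where the new (pred, median, succ) triple
    contains a value outside the previous triple plus the new number."""
    seen = []
    prev_triple = set()
    bad = []
    for step, x in enumerate(stream, start=1):
        seen.append(x)
        s = sorted(seen)               # full history, used only as ground truth
        m = (len(s) - 1) // 2          # index of the lower median
        triple = set(s[max(0, m - 1):m + 2])
        if not triple <= prev_triple | {x}:
            bad.append(step)
        prev_triple = triple
    return bad


print(unrecoverable_steps([5, 6, 7, 8, 9, 4, 3, 2, 1, 0]))  # [5, 6, 8, 10]
```

For instance, at step 5 the true triple is {6, 7, 8}, but 8 was neither tracked after step 4 nor the value that just arrived.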
[In reply to l**i]
| z****o (posts: 78) | 20
The stacks in 1) are unordered, so whatever you pop from the larger one is not necessarily the new median.
[In reply to l*********3]