c***z posts: 6348 | 1 Hi all,
I am working on something new to me and would like to hear your suggestions.
:)
I have a record of how often some page_visit data is missing, per day, for
100 million users, over 15 days. I need to check whether the data is missing
at random, to make sure my analysis is not biased. We already know that, on
average, we receive less data than we should, and that
sometimes we receive more than we should.
Any clue on how to do this?
Thanks a lot! :)
| I*****a posts: 5425 | 2 If you have a sample of the missing data, can you compare its distribution
with that of the data that is not missing, e.g. with a KS test?
It may be tricky to decide which random variable to use. Maybe the ones you
are most interested in.
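A minimal sketch of the KS comparison suggested above, using only the Python standard library. The function name `ks_two_sample`, the simulated "average daily visits" covariate, and the group sizes are all assumptions for illustration, not anything from the original data. The p-value uses the asymptotic Kolmogorov approximation, so it is only reasonable for large samples; in practice `scipy.stats.ks_2samp` does the same job. Note also that visit counts are discrete, and the KS test tends to be conservative when there are many ties.

```python
import math
import random
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample KS test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # The largest gap between the two empirical CDFs occurs at an observed value.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # The series below converges slowly here, and the p-value is ~1 anyway.
        return d, 1.0
    # Asymptotic Kolmogorov tail: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Simulated stand-in data (hypothetical): a covariate for users whose records
# are complete versus users who have missing days.
rng = random.Random(0)
complete = [rng.gauss(10, 3) for _ in range(2000)]
with_missing = [rng.gauss(9, 3) for _ in range(2000)]

d, p = ks_two_sample(complete, with_missing)
print(f"D = {d:.3f}, approximate p = {p:.3g}")
```

A small p-value here would suggest the two groups differ on that covariate, i.e. the missingness is probably not completely at random with respect to it.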
| I*****a posts: 5425 | 3 Also, depending on your actual problem and the reasons for the missingness, some signals may
have a different mean/variance if the data is not missing at random. In that case, comparing
the mean/variance directly should give you more power.
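A rough sketch of the direct mean/variance comparison, again with only the standard library. The helper names `welch_z_test` and `spread_test` and the simulated data are assumptions for illustration. The mean comparison is a Welch-type statistic with a normal approximation for the p-value, which is only sensible for large samples. The spread comparison is a Levene-style variant (the same test applied to absolute deviations from each group's median), swapped in here because an exact F-test needs a distribution the standard library does not provide.

```python
import math
import random
from statistics import NormalDist, fmean, median, variance


def welch_z_test(a, b):
    """Two-sided test for a difference in means, allowing unequal variances.

    Uses a normal approximation to the Welch statistic (large samples only).
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(a) - fmean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def spread_test(a, b):
    """Levene-style check: compare absolute deviations from each group's median."""
    ma, mb = median(a), median(b)
    return welch_z_test([abs(x - ma) for x in a], [abs(x - mb) for x in b])


# Simulated stand-in data (hypothetical): same covariate, two groups of users.
rng = random.Random(1)
complete = [rng.gauss(10, 3) for _ in range(2000)]
with_missing = [rng.gauss(9, 4) for _ in range(2000)]

z_mean, p_mean = welch_z_test(complete, with_missing)
z_spread, p_spread = spread_test(complete, with_missing)
print(f"mean:   z = {z_mean:.2f}, p = {p_mean:.3g}")
print(f"spread: z = {z_spread:.2f}, p = {p_spread:.3g}")
```

If the difference really is in location or scale, these targeted tests will usually detect it with fewer observations than an omnibus test such as KS.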