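y*****w posts: 1350 | 1 I have an index that is a non-negative continuous variable, but it has many
zero values, which makes the distribution very skewed. Because of the zeros I can't apply a log
transformation directly. If I replace 0 with 0.000001 and then take the log, the transformed
distribution is still very skewed, because there are so many zeros. A square root transformation
works somewhat better, but it is still not ideal. Is there a better way? |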
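A small sketch of why replacing zeros with a tiny constant c before taking the log does not help. Every zero is sent to the same point, log(c), and the distance between that spike and the positive values is set entirely by the arbitrary choice of c. The data below are made up for illustration. With c = 1e-6 the zeros land about 13 log-units below the smallest positive value. With c = 1e-3 the gap is about 6.

```python
import math

# Hypothetical data for illustration only: 6 exact zeros and 4 positive values.
index = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 2.0, 4.0]

def zero_gap(xs, c):
    """Apply log(x + c) and report where the zeros land and how far they sit
    below the smallest transformed positive value."""
    transformed = [math.log(x + c) for x in xs]
    zero_value = math.log(c)  # every zero maps to exactly this point
    min_positive = min(t for x, t in zip(xs, transformed) if x > 0)
    return zero_value, min_positive, min_positive - zero_value

for c in (1e-6, 1e-3, 1.0):
    zero_value, min_positive, gap = zero_gap(index, c)
    print(f"c={c:g}: zeros -> {zero_value:.2f}, "
          f"smallest positive -> {min_positive:.2f}, gap = {gap:.2f}")
```

The shape of the transformed distribution (a spike far to the left of the positive cluster) is an artifact of c, not a property of the data.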
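s*****n posts: 2174 | 2 There is no generic method. If the distribution were discrete, there are
zero-inflated models, but even those don't necessarily suit general data. For continuous data
there is even less of a generic method.
A rather naive approach is to view the problem as the product or combination of two distributions:
a binary distribution (zero vs. nonzero) and the continuous distribution of the nonzero values,
and to model the two separately. |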
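A minimal sketch of the "binary part plus continuous nonzero part" idea. It assumes the positive values are roughly lognormal, which is an illustrative choice and not something stated in the thread. Swap in another distribution if that assumption does not fit. The two parts are estimated separately, and the overall mean then combines them.

```python
import math
from statistics import fmean, pstdev

def fit_two_part(values):
    """Fit a simple two-part model to non-negative data.

    Part 1 (binary): p_positive = P(Y > 0), estimated by the share of nonzero values.
    Part 2 (continuous): log(Y) | Y > 0 ~ Normal(mu, sigma), an ASSUMED lognormal
    positive part fitted by maximum likelihood (pstdev divides by n, i.e. the MLE).
    """
    positives = [v for v in values if v > 0]
    p_positive = len(positives) / len(values)
    logs = [math.log(v) for v in positives]  # safe: zeros were excluded
    return p_positive, fmean(logs), pstdev(logs)

def two_part_mean(p_positive, mu, sigma):
    """Overall mean E[Y] = P(Y > 0) * E[Y | Y > 0]; the lognormal mean is exp(mu + sigma^2 / 2)."""
    return p_positive * math.exp(mu + sigma ** 2 / 2)

data = [0.0, 0.0, 0.0, 1.0, math.e, math.e ** 2]  # hypothetical values
p, mu, sigma = fit_two_part(data)
print(p, mu, sigma, two_part_mean(p, mu, sigma))
```

With covariates, each part becomes its own regression, for example a logistic model for zero vs. nonzero and a linear model for log(Y) on the nonzero rows.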
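t*********e posts: 71 | 3 If it's count data, you can use a zero-inflated Poisson or zero-inflated
negative binomial model. If it's continuous data, it depends on the distribution of the continuous
part: if it's normal, you can try Tobit; if it's gamma, you can try Tweedie or a hurdle model. |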
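For the "normal continuous part" case, here is a sketch of the Tobit log-likelihood, left-censored at 0. Tobit assumes each zero is a latent normal value that fell at or below 0 and was recorded as 0. That assumption may or may not match how the index is built. Only Tobit is sketched here. Tweedie and hurdle models need their own likelihoods.

```python
import math

def norm_logcdf(z):
    """log Phi(z) for the standard normal, via the complementary error function."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def norm_logpdf(z):
    """log phi(z) for the standard normal."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(y, mu, sigma):
    """Log-likelihood of a Tobit model left-censored at 0.

    Latent y* = mu + e with e ~ N(0, sigma^2); observed y = max(0, y*).
    y: observed outcomes; mu: linear predictors x_i' beta; sigma: latent error sd.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= 0:
            # Censored observation contributes P(y* <= 0) = Phi(-mu / sigma)
            total += norm_logcdf(-mi / sigma)
        else:
            # Uncensored observation: normal density of y with mean mu, sd sigma
            total += norm_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total
```

To fit the model you would maximize this over beta (through mu) and sigma with a numerical optimizer. For extreme values of -mu/sigma a dedicated log-CDF such as `scipy.special.log_ndtr` is more robust than `log(erfc(...))`.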
l******n posts: 9344 | 4 mixture model
|
S*x posts: 705 | 5 I have a workaround that isn't very statistically sound:
add 1 to all values, then take the log.
|
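A minimal sketch of the "add 1, then log" transform using Python's `math.log1p`, which computes log(1 + x) accurately for small x. Zeros map to exactly 0, so the spike of zeros stays. It is simply not pushed to a large negative value. The result also depends on the units of x, because log(1 + x) behaves like x for small x and like log(x) for large x.

```python
import math

def log_plus_one(xs):
    """log(1 + x) transform; math.log1p maps 0 to exactly 0 and is accurate for small x."""
    return [math.log1p(x) for x in xs]

print(log_plus_one([0.0, 1.0, 9.0]))  # zeros stay at 0; 1 -> log 2, 9 -> log 10
```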
y*****w posts: 1350 | 6 Thanks for all the comments and suggestions. It seems mixture modeling is a
very effective way to handle this type of situation. However, I not only need a model with this
variable as the outcome, but I also need to control for the variable as a covariate in another
model. In the latter scenario, I think I still need some way to transform the variable. Is there
a good way to do that? One example of the latter scenario: I want to model the change in the index
from baseline to 18 months, and the baseline index (the one with many zero values) has to be
controlled for as a covariate. |
w*******9 posts: 1433 | 7 Use both indicator{x=0} and x*indicator{x>0}?
|
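A sketch of building the two covariate columns from the suggestion above. For a non-negative x, x*indicator{x>0} is just x itself, so this pair is the same as "a zero/nonzero dummy plus the original variable". The `log_positive` option is an illustrative variation that is not part of the post. It uses log(x) on the positive values and 0 for the zeros, and lets the dummy absorb the level of the zero group. That gives one way to log-transform a covariate that has zeros.

```python
import math

def zero_split_covariates(xs, log_positive=False):
    """Split a non-negative covariate with many zeros into two regression columns.

    zero_ind = 1{x == 0}
    pos_part = x * 1{x > 0}, or log(x) * 1{x > 0} when log_positive=True (illustrative variation)
    Zeros get 0 in pos_part; their level is captured by the zero_ind coefficient.
    """
    zero_ind = [1.0 if x == 0 else 0.0 for x in xs]
    if log_positive:
        pos_part = [math.log(x) if x > 0 else 0.0 for x in xs]
    else:
        pos_part = [x if x > 0 else 0.0 for x in xs]
    return zero_ind, pos_part

baseline = [0.0, 2.0, 0.0, math.e]  # hypothetical baseline index values
print(zero_split_covariates(baseline))
print(zero_split_covariates(baseline, log_positive=True))
```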
s*********e posts: 1051 | 8 If the positive part is indeed gamma, then the full distribution shouldn't be
a mixture.
However, it could be a two-part model: one part for the point mass at zero and the other a gamma.
【In reply to l******n's post】 : mixture model
|
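A sketch of the log-likelihood for the two-part model described above: a point mass at zero with probability p_zero, and a Gamma(shape, scale) distribution for the positive values. The parameterization (shape and scale) is an illustrative choice. The log-likelihood splits into a Bernoulli term in p_zero and a gamma term in (shape, scale). So the MLE of p_zero is the share of zeros, and the gamma part can be fitted on the positive values alone.

```python
import math

def gamma_logpdf(y, shape, scale):
    """Log density of Gamma(shape, scale) at y > 0."""
    return ((shape - 1.0) * math.log(y) - y / scale
            - math.lgamma(shape) - shape * math.log(scale))

def zero_gamma_loglik(y, p_zero, shape, scale):
    """Two-part log-likelihood: P(Y = 0) = p_zero, and Y | Y > 0 ~ Gamma(shape, scale)."""
    total = 0.0
    for yi in y:
        if yi == 0:
            total += math.log(p_zero)  # point mass at zero
        else:
            # log(1 - p_zero) for being nonzero, plus the gamma density of the value
            total += math.log1p(-p_zero) + gamma_logpdf(yi, shape, scale)
    return total
```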
c***z posts: 6348 | 9 We use a two-stage approach, i.e., add a dummy for zero/nonzero cases together
with the original variable. |
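A sketch of the covariate version of this idea for the change-from-baseline model in post 6. The model is change = b0 + b_baseline * baseline + b_zero * 1{baseline == 0}, fitted by ordinary least squares. The data are hypothetical, and the small solver is only for illustration. In practice you would use your statistics package. Here b_zero is the offset of the zero-baseline group from the value the linear trend in baseline predicts at baseline = 0, which is b0.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting (fine for a few columns)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (a[i][k] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return beta

# Hypothetical data: baseline index with several zeros, and change from baseline to 18 months.
baseline = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
change = [-1.0, -1.0, -1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

# Design columns: intercept, original baseline value, zero/nonzero dummy
X = [[1.0, b, 1.0 if b == 0 else 0.0] for b in baseline]
b0, b_baseline, b_zero = ols(X, change)
print(b0, b_baseline, b_zero)
```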
w*******9 posts: 1433 | 10 I said it first, haha. Just kidding.
【In reply to c***z's post】 : We use a two stage approach, i.e. add a dummy for zero/nonzero cases, : together with the original variable.
|
l*******s posts: 1258 | 11 Smoothing,
e.g., Laplace, Good-Turing, kernel. |
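A sketch of two of the smoothing methods named above. Laplace (additive) and Good-Turing smoothing work on counts over discrete categories. They would apply only if the index were binned, for example into "exactly 0" and a few positive ranges. The bins below are hypothetical. Kernel smoothing is the continuous analogue. A plain Gaussian kernel density estimate also puts some probability below 0 for non-negative data, so it does not respect the boundary at zero.

```python
import math

def laplace_smooth(counts, alpha=1.0):
    """Additive (Laplace) smoothing of category counts: (n_k + alpha) / (N + alpha * K)."""
    total, k = sum(counts), len(counts)
    return [(c + alpha) / (total + alpha * k) for c in counts]

def gaussian_kde(data, bandwidth):
    """Return a Gaussian kernel density estimate f(x) = (1 / (n h)) * sum_i phi((x - x_i) / h)."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return density

# Hypothetical bin counts for the index: [exactly 0, (0, 1], (1, inf)]
print(laplace_smooth([3, 0, 1]))  # empty bin gets a small nonzero probability
f = gaussian_kde([0.0, 0.0, 0.5, 1.0], bandwidth=0.3)
print(f(0.0))
```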
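P****D posts: 11146 | 12 Agreed.
This is the quickest option and the easiest for clients to understand. If you use those fancier
methods, clients won't accept them.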
【In reply to S*x's post】 : I have a workaround that isn't very statistically sound: : add 1 to all values, then take the log.
|
s*********e posts: 1051 | 13 That depends on how you explain it.
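【In reply to P****D's post】 : Agreed. : This is the quickest option and the easiest for clients to understand. If you use those fancier methods, clients won't accept them.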
|
y*****w posts: 1350 | 14 By "a two-stage approach", do you mean a multivariate approach? For example,
if it's a general linear model, would the SAS code look like this:
proc glm data=data;
class group;
model original_var zero_dummy = group;
run;
The output has two separate models, one for the dependent variable "original_var" and the other
for the dependent variable "zero_dummy".
【In reply to c***z's post】 : We use a two stage approach, i.e. add a dummy for zero/nonzero cases, : together with the original variable.
|
h***i posts: 3844 | 15 https://files.nyu.edu/mrg217/public/tobit1.pdf
|
t*****u posts: 70 | 16 If you add 1 and then take the log, don't the zeros just end up back at 0? What's the point of that?
【In reply to S*x's post】 : I have a workaround that isn't very statistically sound: : add 1 to all values, then take the log.
|
h***i posts: 3844 | 17 This doesn't solve the problem; it's just a joke.
【In reply to t*****u's post】 : If you add 1 and then take the log, don't the zeros just end up back at 0? What's the point of that?
|