Statistics board - imputation question? Thanks
c**********5
Posts: 653
1
Hi everyone,
I am new to this topic. Can anybody help me out?
In the pilot study the sample size was around 100, and almost half of the
subjects have missing values.
I would like to use multiple imputation to deal with the missing data
problem.
The current model is:
Outcome1 (post-measurement1 - pre-measurement1) = pre-measurement1 + group
Outcome2 (post-measurement2 - pre-measurement2) = pre-measurement2 + group
…….
There are a lot of outcomes we are interested in.
I have the following questions:
1. How should I build the imputation model? Which variables should I include
in the imputation model in my case (dependent variables, independent
variables, and others)?
Note: missing data occur not only in the outcomes but also in the
independent variables (a very small portion).
2. How many imputations do you recommend? (Usually 5-10; however, if the
proportion of missing values is large, maybe we need more imputations,
e.g. 50?)
Thanks.
c**********5
Posts: 653
2
ding
w******a
Posts: 25
3
Here is an R example that imputes up to one missing value per record. Half of the code just generates the sample data, so you probably only need the second half, but I include all of it here to help you understand what is going on.
The data will look like this (x marks an observed value):
col1 col2
x
x x
x
x x
x x
...
# library(Rlab) is not needed here: rnorm() and rbinom() come from base R's stats package
alp = 1
Prob_R1 = 0.5
Prob_R0 = 1 - Prob_R1
len_Y1 = 200
K_delta = 2
Y1 = rnorm(len_Y1,mean=0,sd=1)
R1 = rbinom(n=len_Y1, size=1, prob=Prob_R1)
Y2 = rnorm(n=len_Y1, mean=alp*Y1, sd=1)
Y2[R1==0] = NA
data = data.frame(cbind(Y1,Y2))
reg = glm(Y2~Y1,family=gaussian,data)
sigma = sd(reg$residuals)
delta_grid = K_delta * (-2:2/2) # interval from -K_delta to K_delta
delta = sigma * delta_grid # interval from -K*sigma to K*sigma
E_Y2 = NULL
for(i in 1:length(delta))
{
Y2[R1==0] = NA
Y2.pred = delta[i] + predict(reg,newdata=data)
Y2[R1==0] = 0
Y2.hat = Y2*R1 + Y2.pred*(1-R1)

par(mfrow=c(1,2))
plot(Y1[R1==1],Y2[R1==1])
points(Y1[R1==0],Y2.hat[R1==0],pch="+")

hist(Y2.hat, xlim=c(-4,4))
E_Y2[i] = mean(Y2.hat)
}
par(mfrow=c(1,1))
plot(delta,E_Y2)
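# lm(formula = E_Y2 ~ delta) gave, in one run: E_Y2 = 0.06531 + 0.54500*delta
# (the slope is close to Prob_R0 = 0.5, since delta only shifts the imputed half of the values)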
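The example above fills each missing Y2 with its regression prediction plus a fixed shift delta, so the imputed points all sit on a straight line. Multiple imputation, which the original question asks about, also needs random variation in the imputed values. Here is a minimal sketch of that idea on the same kind of simulated Y1/Y2 data. The helper name impute_once, the seed, and m = 10 are assumptions for illustration only, and the sketch draws residual noise but ignores the uncertainty in the regression coefficients, so it is a simplification rather than a full MI procedure.

```r
set.seed(1)                              # arbitrary seed, only for reproducibility
n  <- 200
Y1 <- rnorm(n)
R1 <- rbinom(n, size = 1, prob = 0.5)    # 1 = Y2 observed, 0 = Y2 missing
Y2 <- rnorm(n, mean = Y1, sd = 1)
Y2[R1 == 0] <- NA
dat <- data.frame(Y1 = Y1, Y2 = Y2)

fit   <- lm(Y2 ~ Y1, data = dat)         # fitted on the complete cases only
sigma <- summary(fit)$sigma              # residual standard deviation

# Hypothetical helper: one stochastic regression imputation of Y2
impute_once <- function(dat, fit, sigma) {
  miss <- is.na(dat$Y2)
  pred <- predict(fit, newdata = dat[miss, , drop = FALSE])
  dat$Y2[miss] <- pred + rnorm(sum(miss), mean = 0, sd = sigma)
  dat
}

m       <- 10                            # number of imputed datasets (assumed)
imputed <- lapply(seq_len(m), function(k) impute_once(dat, fit, sigma))
means   <- sapply(imputed, function(d) mean(d$Y2))   # one estimate per dataset
```

The spread of `means` across the m datasets reflects the extra uncertainty caused by the missing values, which a single deterministic fill-in cannot show.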
w******a
Posts: 25
4
Here is an R example that imputes one or two missing values per record.
The data will look like this:
col1 col2 col3
x
x x x
x x
x x
x x x
x
x x x
...
# library(Rlab) is not needed here: rnorm() and rbinom() come from base R's stats package
alp = 1
K_delta = 2
len_Y1 = 200
#Sample setting:
#Measurement    N_patient   Percent
# 1             12          0.18
# 1 2           4           0.05
# 1 2 3         22          0.78
#Convert the above info into missing rate:
#N_measurement  1                    2               3
#Occupy_rate    0.78+0.05+0.18       0.78+0.05       0.78
#Missing_rate   1-(0.78+0.05+0.18)   1-(0.78+0.05)   1-0.78

#missing rate for each measurement at time points 1,2,3
Prob_R1 = 0
Prob_R2 = 1-0.78-0.05
Prob_R3 = 1-0.78
#measurements at time points 1,2,3
Y1 = rnorm(n=len_Y1, mean=0,sd=1)
Y2 = rnorm(n=len_Y1, mean=alp*Y1, sd=1)
# mean(Y2)=-0.03, sum(Y1)/200=0.024
Y3 = rnorm(n=len_Y1, mean=alp*Y1, sd=1)
#R:response indicator 1=observed;0=missing
R1 = rep(1,len_Y1)
R2 = rbinom(n=len_Y1, size=1, prob=1-Prob_R2)

R3 = rbinom(n=len_Y1, size=1, prob=1-(Prob_R3-Prob_R2))
Y2[R2==0] = NA
R3[R2==0] = 0
Y3[R3==0] = NA
data = data.frame(cbind(Y1,Y2,Y3,R1,R2,R3))
#Estimate Y2
reg = glm(Y2~Y1,family=gaussian,data)
sigma = sd(reg$residuals)
delta_grid = K_delta * (-2:2/2) # interval from -K_delta to K_delta
delta = sigma * delta_grid # interval from -K*sigma to K*sigma
E_Y2 = NULL
par(mfrow=c(4,3))
for(i in 1:length(delta))
{
Y2[R2==0] = NA
Y2.pred = delta[i] + predict(reg,newdata=data)
Y2[R2==0] = 0
Y2.hat = Y2*R2 + Y2.pred*(1-R2)

#par(mfrow=c(1,2))
plot(Y1[R2==1],Y2[R2==1])
points(Y1[R2==0],Y2.hat[R2==0],pch="+",col="red")

hist(Y2.hat, xlim=c(-4,4))
E_Y2[i] = mean(Y2.hat)
}
par(mfrow=c(1,1))
plot(delta,E_Y2)
#Estimate Y3
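# Note: Y2.hat is the version left over from the last loop iteration above
# (largest shift, delta = +K_delta*sigma). Store it in data so that glm() and
# predict() find it there instead of relying on the global environment.
data$Y2.hat = Y2.hat
reg2 = glm(Y3~Y1+Y2.hat,family=gaussian,data)
sigma2 = sd(reg2$residuals)
delta_grid = K_delta * (-2:2/2) # interval from -K_delta to K_delta
delta = sigma2 * delta_grid # interval from -K*sigma2 to K*sigma2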
E_Y3 = NULL
par(mfrow=c(4,5))
for(i in 1:length(delta))
{
Y3[R3==0] = NA
Y3.pred = delta[i] + predict(reg2,newdata=data)
Y3[R3==0] = 0
Y3.hat = Y3*R3 + Y3.pred*(1-R3)

plot(Y1[R3==1],Y3[R3==1])
points(Y1[R3==0],Y3.hat[R3==0],pch="+",col="red")
}
for(i in 1:length(delta))
{
Y3[R3==0] = NA
Y3.pred = delta[i] + predict(reg2,newdata=data)
Y3[R3==0] = 0
Y3.hat = Y3*R3 + Y3.pred*(1-R3)

plot(Y2.hat[R3==1],Y3[R3==1])
points(Y2.hat[R3==0],Y3.hat[R3==0],pch="+",col="red")
}
for(i in 1:length(delta))
{
Y3[R3==0] = NA
Y3.pred = delta[i] + predict(reg2,newdata=data)
Y3[R3==0] = 0
Y3.hat = Y3*R3 + Y3.pred*(1-R3)

hist(Y3.hat, xlim=c(-4,4))
E_Y3[i] = mean(Y3.hat)
}
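Before imputing Y2 and then Y3 in sequence, it may help to confirm that the simulated response indicators really follow a monotone dropout pattern (once a measurement is missing, all later ones are missing too). A small sketch, assuming the same probabilities as in the code above (0.17 missing at time 2, a further 0.05 at time 3); the seed and the names pattern and obs_rate are made up for illustration.

```r
set.seed(3)                               # arbitrary seed, only for reproducibility
n  <- 200
R1 <- rep(1, n)                           # time 1 is always observed
R2 <- rbinom(n, size = 1, prob = 1 - 0.17)
R3 <- rbinom(n, size = 1, prob = 1 - 0.05)
R3[R2 == 0] <- 0                          # dropout at time 2 implies missing at time 3

# One string per record: "111" = complete, "110" = only Y3 missing, "100" = Y2 and Y3 missing
pattern  <- paste0(R1, R2, R3)
print(table(pattern))

# Observed share at each time point, to compare with the intended occupancy rates
obs_rate <- c(time1 = mean(R1), time2 = mean(R2), time3 = mean(R3))
print(round(obs_rate, 2))
```

If a pattern such as "101" showed up in the table, the missingness would not be monotone and the Y2-then-Y3 ordering used above would no longer cover every case.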
c**********5
Posts: 653
5
Hi, thanks a lot.
I have not used R for a while, but maybe I can pick it up again. I will study your code
tonight. I am not authorized to install R on my workstation.
I know how to write the SAS code using PROC MI (2 steps).
I am still struggling with the questions above.
d******g
Posts: 130
6
Not sure if you have read the good post on UCLA's SAS page on this topic.
Here is the link:
http://www.ats.ucla.edu/stat/sas/seminars/missing_data/part1.htm
Hope this helps.
c**********5
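Posts: 653
7
Hi,
Thanks. I have read it, and it is my favorite website. Thank you very much all the same.
I have never used this method before. After reading some material, my impression is that
with an arbitrary missing-data pattern, when we build the imputation model we can put
every variable related to the ones we are interested in into the model, whether it is a
dependent variable or an independent variable. I am not sure whether my understanding
is correct. Thanks.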
H******r
Posts: 2879
8
Almost all existing imputation methods are based on the MAR (missing at random)
assumption - think about whether this assumption holds in your problem.
The imputation model can be a "big" model that includes all "useful"
predictors and even some "useless" ones. 10 multiply-imputed datasets
should be enough.
You may want to check IVEware for MI - it works for non-normal models and you can
specify bounds as well.
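For what it is worth, results from the 10 imputed datasets are usually combined with Rubin's rules: average the estimates, then add the within-imputation and between-imputation variances. Below is a rough base-R sketch on simulated data shaped like the model in the question (change = pre + group). All variable names, the effect sizes, the 40% missing rate and the simple normal imputation step are assumptions for illustration. The imputation model here uses the same variables as the analysis model, in line with the "big model" advice above.

```r
set.seed(2)                                     # arbitrary seed, only for reproducibility
n     <- 100
group <- rbinom(n, size = 1, prob = 0.5)
pre   <- rnorm(n, mean = 50, sd = 10)
post  <- pre + 2 * group + rnorm(n, sd = 5)     # assumed group effect of 2
post[sample(n, 40)] <- NA                       # 40 outcomes missing completely at random
dat   <- data.frame(pre = pre, group = group, post = post)

# Imputation model: post regressed on every variable the analysis model uses
imp_fit <- lm(post ~ pre + group, data = dat)
sigma   <- summary(imp_fit)$sigma
miss    <- is.na(dat$post)

m   <- 10
est <- numeric(m)                               # group effect from each imputed dataset
se2 <- numeric(m)                               # its squared standard error
for (k in seq_len(m)) {
  d <- dat
  d$post[miss] <- predict(imp_fit, newdata = d[miss, ]) +
    rnorm(sum(miss), mean = 0, sd = sigma)
  d$change <- d$post - d$pre
  fit    <- lm(change ~ pre + group, data = d)
  est[k] <- coef(fit)["group"]
  se2[k] <- vcov(fit)["group", "group"]
}

# Rubin's rules
Q_bar     <- mean(est)                          # pooled point estimate
W         <- mean(se2)                          # within-imputation variance
B         <- var(est)                           # between-imputation variance
T_var     <- W + (1 + 1 / m) * B                # total variance
pooled_se <- sqrt(T_var)
```

In SAS the same two steps correspond to PROC MI followed by PROC MIANALYZE, so the hand-rolled loop is only meant to show what the pooling does.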