D******n Posts: 2836 | 1 namely imputation.
Thanks. |
|
s****i Posts: 18 | 2 [The following text is reposted from the JobHunting board]
Posted by: ssissi (sissi), Board: JobHunting
Subject: Job Opportunity
Posted from: BBS 未名空间站 (Thu Feb 3 10:55:08 2011, US Eastern)
Job Code: 25248249
Job Title: Health Surveillance Specialist
Closing Date/Time: Fri. 02/11/11 11:59 PM Central Time
Job Type: PERM FULL TIME
Location: Lincoln, Nebraska
Requisition #: 201102268
Position #: 25248249
Department: Health & Human Services, Department of
Nebraska State Office Building, Lincoln
Work schedule: 8-5 M-F
Examples of Work:
Obtain and... |
|
|
m****n Posts: 692 | 4 See PROC MI and PROC MIANALYZE in the SAS manual. They are documented in detail, which is enough for ordinary MI. |
|
c********2 Posts: 94 | 5 What would be a suitable way to answer? Explaining imputation clearly is quite a hassle; how can I answer
comprehensively yet concisely?
Thanks |
|
l*******s Posts: 437 | 6 Small sample, unknown distribution.
In that case maximum likelihood and multiple imputation can't be used, right? |
|
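l*******s Posts: 437 | 7 4 blocks, 5 treatments.
Back when I was doing research I knew you could impute with a formula. I really don't want to be nearly done with my statistics master's and still only know
that one formula. |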
|
|
j******a Posts: 194 | 9 If it's Bayesian, imputation isn't convenient either, is it? For now I just code the missing data as a separate
category, though that is really only a stopgap. |
|
d***2 Posts: 341 | 10 First things first: since achievement is evaluated against state standard
materials, I think you can assume that an achieving 9-year-old is as successful as
an achieving 10-year-old. Therefore, the events are all equal across all ages.
Otherwise, you will have to impute different events to different ranks and run a
rank analysis, such as the Wei-Lachin multivariate rank analysis... but I
really don't think you want to go that route.
Now, the real question I see is, do you want to consider all events as
independe... |
|
k****i Posts: 347 | 11 Anyone doing multiple imputation who can't say what the method's drawbacks are really shouldn't be in this business,
haha. |
|
A*******s Posts: 3942 | 12 Imputation is not that practical in real practice. |
|
|
a***r Posts: 420 | 14 ?
No, it doesn't count as a thesis direction (that's so far off ~><~); call it a preliminary project. |
|
t*m Posts: 4414 | 15 We still have to phase the haplotype after NGS |
|
s****i Posts: 18 | 16 [The following text is reposted from the JobHunting board]
Posted by: ssissi (sissi), Board: JobHunting
Subject: Job Opportunity - Statistics
Posted from: BBS 未名空间站 (Thu Aug 25 13:06:21 2011, US Eastern)
http://agency.governmentjobs.com/nebraska/job_bulletin.cfm?JobI
NEBRASKA STATE GOVERNMENT
invites applications for the position of:
Health Surveillance Specialist
An Equal Opportunity Employer
SALARY: $21.35 /Hour
OPENING DATE: 08/24/11
CLOSING DATE: 09/07/11 11:59 PM
JOB TYPE: PERM FULL TIME
LOCATION: Lincoln
AGENCY: Health & Human Servic... |
|
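a***g Posts: 2761 | 17 You could treat going to the mall and going to the store as two Bernoulli distributions.
Treat the wave 2 part as the complete data
and the wave 1 part as missing data subject to constraints,
then use the EM algorithm.
That said, I may be taking too much for granted here; it's just a small suggestion. |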
|
w*******t Posts: 364 | 18 Mark. This seems like an imputation question to me.
If I were given the question, I might fit a linear regression line and use the
mean estimate of Y at the corresponding X, or vice versa. You can stop there, or you
may estimate the standard deviation of the residual errors and randomly sample an
error term to add to the mean estimate. |
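
The regression approach described in this post can be sketched as follows. This is a minimal illustration, not the poster's code: the function name `regression_impute`, its arguments, and the use of NumPy are assumptions made for the example. With `add_noise=False` it stops at the fitted mean; with `add_noise=True` it adds a normal error term whose standard deviation is estimated from the residuals.

```python
import numpy as np

def regression_impute(x, y, rng=None, add_noise=True):
    """Fill NaNs in y from a simple linear regression of y on x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    observed = ~np.isnan(y)
    # Fit y = b0 + b1 * x on the complete cases only.
    b1, b0 = np.polyfit(x[observed], y[observed], deg=1)
    fitted = b0 + b1 * x
    if add_noise:
        rng = np.random.default_rng() if rng is None else rng
        resid = y[observed] - fitted[observed]
        # Residual standard deviation with n - 2 degrees of freedom
        # (requires at least 3 observed pairs).
        sigma = np.sqrt(np.sum(resid ** 2) / (observed.sum() - 2))
        fitted = fitted + rng.normal(0.0, sigma, size=fitted.shape)
    # Only the missing entries are replaced; observed values are kept.
    y[~observed] = fitted[~observed]
    return y
```

The normal error term is one possible choice; sampling from the empirical residuals would be another.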
|
c********d Posts: 253 | 19 Use a joint model to impute all variables at the same time under a multivariate
normal assumption. The var-cov matrix will reflect the correlation. SAS
PROC MI can do that. |
|
J********J Posts: 571 | 20 Thanks for the reply. I was asking how to input "≤" and "≥" as values, not
as an expression.
For example: if age >= 50 then agegrp = "Age ≥ 50 ";
Does anybody know? |
|
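i****e Posts: 46 | 21 1. I haven't done the first one, but if I had this concern I would run an outlier analysis on impression
and drop the extremely small sizes.
2. Do a missing value check: variables with too many missing values should be dropped, and the rest
imputed; some variables need a transform, e.g. convert continuous variables to
categorical/dummy variables. You can do variable clustering, then univariate
analysis, then stepwise logistic regression. |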
|
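G*******s Posts: 10605 | 22 1. You could consider imputation; it doesn't affect the results much. A variable with a lot of missing values isn't suitable for the
final scoring formula anyway, and I'd rather drop such a variable.
2. Principal components are a good option. |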
|
z**********i Posts: 12276 | 23 I once read two papers on missing data; I really only know the basics. |
|
m******u Posts: 277 | 24 LOCF
propensity score
predictive mean matching
^_^ |
|
|
|
s*******e Posts: 1385 | 27 For this you should first make a univariate plot, then decide on the cap, floor, and transformation, as well as the
missing-value imputation. |
|
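d******9 Posts: 134 | 28 In a longitudinal study, I need to use MANOVA to analyze a continuous
outcome across four treatment groups (to see whether there is a tx effect, a time effect, or a tx*time interaction). The outcome
has missing data, but all records must be kept, so imputation is needed.
I used PROC MI in SAS to get 5 imputed complete datasets, but I don't know whether there is a way to use
PROC MIANALYZE to combine the results (p-values) of running MANOVA separately on the 5 datasets. If that is not feasible,
how should I handle it? Pick one of the 5 complete datasets at random for the MANOVA and ignore the rest?
Many thanks; waiting online! |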
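
For context on what "combining" means after multiple imputation, the sketch below applies Rubin's rules to a single scalar estimate. It does not settle whether MANOVA p-values themselves can be pooled; it only shows the standard way to pool a parameter estimate and its variance across the m imputed datasets. The function name `pool_rubin` and the inputs are assumptions made for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)   # point estimate from each dataset
    u = np.asarray(variances, dtype=float)   # squared standard error from each dataset
    m = len(q)
    q_bar = q.mean()                         # pooled point estimate
    within = u.mean()                        # within-imputation variance
    between = q.var(ddof=1)                  # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total
```

Picking just one of the 5 datasets would ignore the between-imputation variance that the `between` term accounts for here.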
|
t*******t Posts: 633 | 29 I really need to try this.
Until now I have always written macros to do imputation. |
|
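d********h Posts: 2048 | 30 I find both a lot of fun and enjoy them. With data manipulation, I'm always thinking about how to improve efficiency
and produce nicer tables and charts; and about how to improve the current model, for example how to improve the existing imputation,
or whether Bayesian methods can handle the missing data, and how to use group-based models to address heterogeneity
in the population. |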
|
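A*********u Posts: 8976 | 31 When making a semi-log concentration vs. time PK graph,
how should BLQ be handled?
When plotting a single patient, a BLQ counted as 0 cannot be drawn, so it is effectively excluded from the semi-log plot. I think
that if the lower limit of quantification is known, using that value to impute BLQ is also an option;
does anyone do it this way?
For the mean plot, time points that are all BLQ (pre-dose, or 48 hours) are excluded, and at other time points BLQ counts as 0 (otherwise the mean would
differ from the one in the table). Is there a problem with treating these differently?
What is the usual practice?
Many thanks! |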
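
The two single-patient options raised in this post (drop BLQ points, or substitute the lower limit of quantification) can be sketched as below. This is only an illustration of the question, not a statement of standard practice; the function name `prepare_for_semilog`, the use of `None` to mark BLQ, and the `blq_rule` argument are all assumptions.

```python
import math

def prepare_for_semilog(concs, lloq, blq_rule="lloq"):
    """Return concentrations that can be drawn on a log axis.

    concs: list of floats, with None marking a BLQ result (assumed encoding).
    blq_rule: "lloq" substitutes the LLOQ; "exclude" yields NaN so that
              plotting code skips the point.
    """
    if blq_rule not in ("lloq", "exclude"):
        raise ValueError("blq_rule must be 'lloq' or 'exclude'")
    out = []
    for c in concs:
        if c is None:
            out.append(lloq if blq_rule == "lloq" else math.nan)
        else:
            out.append(c)
    return out
```

Setting BLQ to 0 is deliberately not offered here, because log(0) is undefined and the point would vanish from a semi-log plot anyway.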
|
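k*******a Posts: 772 | 32 For the interval-censored approach you may need to assume BMI is constant. It is fairly easy to implement (first write out the
likelihood, then optimize it with proc nlp). I have written a SAS macro for this (the input data are in the same longitudinal form as
yours), and the SAS website also has related downloads.
Of course, the simplest solution is right imputation: take the time of the first positive as the
event time, and if there is no positive, take the last time as the censoring time. |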
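
The right imputation rule described above can be sketched as a small function. The name `right_impute` and the representation of a subject's visits as `(time, positive)` pairs are assumptions made for the example.

```python
def right_impute(visits):
    """Collapse one subject's visits into (time, event).

    visits: non-empty list of (time, positive) pairs, in any order.
    Returns the time of the first positive visit with event=1, or the
    last visit time with event=0 (censored) if no visit is positive.
    """
    ordered = sorted(visits)
    for time, positive in ordered:
        if positive:
            return time, 1
    return ordered[-1][0], 0
```

For example, a subject who first tests positive at month 12 gets `(12, 1)`, and a subject who stays negative through month 84 gets `(84, 0)`.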
|
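t********m Posts: 939 | 33 Many thanks for the reply. The interval-censored approach you describe sounds rather complicated, and I doubt I can manage it.
Also, BMI and the other variables here are not constant, so I'll go with the simple
right imputation method you mentioned: if event=0 the time is always months=84, and if event=1
I use the time at which the event occurred. My data end up looking like this:
ID  MONTHS  EVENT  SEQ  ag1  ag2  ag3  ag4  ag5
 1      84      0    8   40   41   42   43   44
 4      84      0    8   49   50    .    .    .
 5      84      0    8   42   43   44   45   46
 8      12      1    2   52   53    .    .    .
10      84      0    8   48    .    .   51   52
ag6 ag7 ag8 bm1 ... |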
|
A*******s Posts: 3942 | 34 My 2 cents: GAM might be the best choice for nonlinear modeling in business
for a few reasons, say:
you can use cross-validation to govern the trace of the penalization matrix and
avoid overfitting, which is a big concern in nonlinear modeling;
it is easy to illustrate the marginal effect of each variable, which is not
available in MARS and CART.
The additive assumption makes some ad hoc tasks much easier, say convenient
missing imputation (may not be theoretically sound, though), "neutralizing" one
variable, etc... |
|
e****t Posts: 766 | 35 Can anyone help download the following paper and send it to e*****[email protected]
Biometrics > Vol 66 Issue 4
Multiple Imputation Approaches for the Analysis of Dichotomized Responses in
Longitudinal Studies with Missing Data
Kaifeng Lu1,*, Liqiu Jiang2, Anastasios A. Tsiatis3
Thanks a lot in advance and Baozi in return!! |
|
t*****2 Posts: 94 | 36 Hello, I'm a fresh graduate and have recently been job hunting. In interviews I am often asked about missing
values. I see that you often answer other people's questions here, and very professionally. I hope to get your
answer.
For example: how do you deal with missing values so that the data can be used as input
for a model? What if 80% of the data are missing?
I answered: a) test the pattern of missing values (MCAR/MAR/MNAR);
test some assumptions (e.g. normality, because some datasets
are assumed to be normally distributed)
b) Solution: Multiple Imputation... |
|
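s***1 Posts: 343 | 37 I signed with the one in Orange County, because they later said they also do some data mining, which made up for my earlier
hesitation. They also promised to file the H1-B right away.
Interview notes for this company; there were 4 rounds:
Round 1, phone interview: recruiter
intern experience
methodologies I have used
where the strengths of my background lie
how co-workers evaluate you
weaknesses
sampling methods and how exactly to do them (this one was awkward: she clearly didn't understand whatever I said, and in the end she laughed
and said the company had told her to ask and it was enough that I had done it...)
Round 2, phone interview: Clinical Operation Director
go over resume
imputation method
study design experience
SAS questions (ODS/SQL/...)
Round 3, onsite: Clinical Operation Director/Clinical Scientist/VP/HR
go over projects and intern experience, asked in very, very fine detail
HR asked a lot of behavioral questions, but nothing especially tough
Round... |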
|
s*******d Posts: 132 | 38 I have an old dataset for which only the mean and standard deviation are known.
I need the details of this dataset so that I can use it in regression.
My question is, can I use imputation to generate these data points? Is it a
valid practice in statistical research?
Thanks in advance. |
|
l******1 Posts: 292 | 39 It depends on how much missing data you have; if over half of it is missing,
it's not good to use imputation. |
|
s*******d Posts: 132 | 40 I have the mean and standard deviation only. I need 5 data points back.
If imputation is not good, what is the standard practice? Bootstrap? |
|
w*******9 Posts: 1433 | 41 I don't clearly understand your question, but as far as I know, multiple
imputation should be based on some conditional distribution (usually
conditioned on the observed data), not the marginal distribution. |
|
s*******d Posts: 132 | 42 Do not submit stupid answers. Imputation of missing data is common practice
today. |
|
k*******a Posts: 772 | 43 I do not think you need to impute.
The mean and SD, plus n for each group if you know it, are sufficient statistics and
contain all the information you need about the unknown regression parameters,
just as in ANOVA, where you only need to know the group means and within-group
variances to make estimates. |
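
The ANOVA point can be made concrete: the one-way F statistic is computable from each group's n, mean, and SD alone, with no individual data points. The sketch below is an illustration under that assumption; the function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F statistic from per-group n, mean, and sample SD."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group sum of squares uses only the group means and sizes.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares is recovered from the sample SDs.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With two groups of size 3, means 1 and 3, and SDs of 1, this gives F = 6 on (1, 4) degrees of freedom.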
|
T*******I Posts: 5138 | 44 To me, you might just want to play a numbers game with imputation, but
statistics is about knowing the real world.
A missing datum is an unknown fact about the real world. But now you want to
use your math skills to create "facts" to represent the unknown real facts. |
|
A*******s Posts: 3942 | 45 Missing data analysis is a huge topic, and you can find tons of literature
discussing it. Before jumping to any fancy techniques for missing-value imputation,
I think the very first step is to ask two questions.
The first question is: are the data really missing, meaning there are indeed
true values but we just don't observe them, or are they actually not
applicable, meaning there is no valid value at all?
If the answer is the latter, then you cannot well define a random variable
on those 'Not Applicab... |
|
a*****9 Posts: 1315 | 46 This is really well explained, thumbs up. |
|
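T*******I Posts: 5138 | 47 The approach you describe is effective in most cases, but some cases are not truly missing data, for
example when the first name is known but sex was not recorded. Moreover, handling missing values in a large sample this way is
fraught with difficulty.
One of my basic views is that missing values are a phenomenon that occurs at random in a sample. As long as their occurrence is minimized
while the data are being built, there is no need to worry too much about their presence, nor to deliberately replace them with so-called imputation or statistical
estimates, because a truly missing value is an unknown about a real object, and a real
unknown cannot be replaced by an artificial assumption. |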
|
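a******n Posts: 11246 | 48 When building a model, I used proc MI to impute missing values and then obtained a
model equation.
Now I need to use this equation to predict, so how should missing values in the input be handled? Use
the mean/median of the same variable in the build sample? Or is there a better way?
Thanks! |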
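
The option raised in this question, filling missing inputs at prediction time with the build sample's mean or median, can be sketched as follows. This only illustrates that one option and does not claim it is the best answer; the function names `fit_fill_values` and `apply_fill_values` and the use of NumPy arrays with NaN for missing are assumptions.

```python
import numpy as np

def fit_fill_values(build, how="median"):
    """Per-column fill values computed from the build sample, ignoring NaNs."""
    build = np.asarray(build, dtype=float)
    if how == "median":
        return np.nanmedian(build, axis=0)
    if how == "mean":
        return np.nanmean(build, axis=0)
    raise ValueError("how must be 'median' or 'mean'")

def apply_fill_values(X, fill):
    """Replace NaNs in new data with the stored build-sample fill values."""
    X = np.asarray(X, dtype=float).copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill[cols]
    return X
```

The fill values are computed once on the build sample and reused, so the scoring data never influence how they are imputed.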
|
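r********n Posts: 6979 | 49 In general I prefer logistic regression,
because it is simple and stable, and usually performs well too.
Sometimes what matters is not which model you use;
what matters is finding suitable features.
If the features are bad,
no fancy model will help.
At work I have tried this on several different datasets.
In terms of performance:
lr and svm both perform well; svm, especially with a non-linear kernel, also overfits easily
decision tree: mediocre
random forest: about the same as lr, but somewhat more computation and harder to interpret
fuzzy logic: mediocre, and the model is also hard to interpret
nn: also performs well, but optimization is harder, the computation is much heavier, and it is a complete black
box
In the end I found that what matters is finding suitable features.
Good features versus bad features can differ by maybe 30%.
The difference between models may be within 10% (after all optimization, e.g. feature
selection, imputation, pruning, CV, bootstrapping) |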
|
w*******9 Posts: 1433 | 50 IF 1) you have lots of observations, 2) don't mind increasing the number of
covariates, and 3) don't have a convincing way of imputing, THEN you can
include in the model a separate indicator variable indicating whether this
variable is missing. |
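
A minimal sketch of the indicator-variable idea follows. The post only says to add an indicator; filling the original column with a constant so that the model can still use it is an assumption made here, as are the function name `add_missing_indicator` and the NaN encoding of missing values.

```python
import numpy as np

def add_missing_indicator(X, fill_value=0.0):
    """Append one 0/1 missingness column per original column.

    NaNs in the original columns are replaced by fill_value (an assumed
    placeholder), and the indicator columns record where they were.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, fill_value, X)
    return np.hstack([filled, missing.astype(float)])
```

This doubles the number of covariates, which is why the post conditions the approach on having lots of observations. Indicator columns for variables that are never missing will be all zeros and can be dropped.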
|