All topics - Topic: predictor
f******h
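Posts: 46
1
Of course it isn't that simple. For example, if I throw all hundred-plus predictors in at once
and let SAS choose with stepwise selection, the resulting model is bound to be overfitted
and heavily multicollinear.
By "randomly picked a var with 0.9 corr" I meant a correlation of 0.9 with the response. After
removing all the predictors with high multicollinearity, I randomly took one of the removed
ones back, and the model that stepwise produced was actually better (larger R^2, with no
more selected predictors than before, or fewer, and no excessive multicollinearity among the
selected predictors). That makes me think the earlier step of removing multicollinearity is
seriously flawed...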
h*********n
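Posts: 278
2
Thanks for the reply. I don't think it has much to do with the penalty on the number of
predictors: the data set is large, so the log-likelihood is already huge and the 2k term has
almost no effect on AIC. AIC moves by hundreds or thousands, so the change comes mainly from
the log-likelihood. I tried including just two predictors that cannot possibly be correlated
and still saw the same thing. One possibility is a data issue: there are many zeros on the one
hand and some extremely large numbers on the other, so using this predictor helps ranking but
makes point prediction harder, hence the model fit is worse and AIC increases. Does this make
sense? Actually I'm not so sure.
The other possibility is that my model assumption is wrong and I should switch to a different
kind of model. Could this be the reason? My head hurts...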

s**********l
Posts: 395
3
I want to build a logistic regression model using tens of predictors, but
one of the predictors has more than 50% missing values. How should I
deal with this predictor? Thanks.
c********h
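Posts: 330
4
AIC is a trade-off between fitting the data and model complexity. Adding a predictor
never increases the RSS, which is exactly why the number of predictors in the model has to be
penalized. Roughly speaking, if AIC drops when you exclude a variable, the additional
information that variable brings cannot compensate for one more parameter.
If the variable looks significant in a univariate analysis, there may be collinearity among
your predictors.
I may be wrong; corrections welcome.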
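The trade-off described in this post can be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated, the helper names are made up, and the Gaussian form AIC = n·ln(RSS/n) + 2k (AIC up to an additive constant) stands in for whatever likelihood the thread's models actually use. It compares an intercept-only fit with a fit that adds one pure-noise predictor.

```python
import math
import random


def rss_intercept_only(y):
    """Residual sum of squares when the model is just the mean of y."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple_regression(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    syy = sum((v - mean_y) ** 2 for v in y)
    return syy - sxy ** 2 / sxx


def gaussian_aic(rss, n, k):
    """AIC = 2k - 2*logL for a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


random.seed(0)
n = 200
y = [random.gauss(0.0, 1.0) for _ in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]  # unrelated to y by construction

rss_small = rss_intercept_only(y)
rss_big = rss_simple_regression(noise, y)
aic_small = gaussian_aic(rss_small, n, k=1)  # intercept only
aic_big = gaussian_aic(rss_big, n, k=2)      # intercept + slope

print(f"RSS: {rss_small:.3f} -> {rss_big:.3f}")
print(f"AIC: {aic_small:.3f} -> {aic_big:.3f}")
```

The RSS of the larger model can never exceed that of the smaller one, while the AIC comparison depends on whether the drop in n·ln(RSS/n) outweighs the extra 2 from the added parameter.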
f********3
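Posts: 20
5
Hi all! I've recently been using a naive-Bayes-like model for text classification, with the
words in each document as my predictors.
If I first remove words that appear in many documents (which uses the whole data set) and
then run cross-validation, will the resulting error be biased downwards?
Addendum: I know that selecting predictors using the labels of the whole data set and then
running CV makes the error too small. But I seem to have heard that selecting predictors with
unsupervised information and then running CV does not introduce bias. Any advice is welcome!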
l****z
Posts: 29846
6
"Parental divorce during childhood was the single strongest social predictor
of early death."
Via The Longevity Project: Surprising Discoveries for Health and Long Life
from the Landmark Eight-Decade Study:
We were surprised to find that although the death of a parent during one’s
childhood was usually difficult, it had no measurable impact on life-span
mortality risk. The children adapted and moved on with their lives.
That was the end of the good news. Although losing one’s parent to divorce
m... [read full post]
S******e
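Posts: 393
7
From thread: Biology board - What does "predictor" mean?
"no predictors of diagnostic concordance could be identified statistically"
What does "predictor" mean here?
Can the sentence be understood like this?
"Statistically, no indicator of diagnostic concordance could be found"
or
"could not identify a factor which indicates diagnostic concordance"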
r********t
Posts: 41
8
Are there any results on the convergence of linear regression (or
likelihood-based estimation) with a diverging number of predictors?
Say there are n observations yi and xi.
The dimension p of xi diverges with n, say p=p(n).
However, the response may depend only on the first 10 predictors.
Many thanks!
n*****s
Posts: 10232
9
I'm thinking that in a parsimonious model, one or two predictors almost always dominate the
whole model; if such a predictor fluctuates a bit, the impact on the model is large.

r********e
Posts: 33
10
From thread: Statistics board - How to transform predictor variable?
I was asked: How do you transform your predictor variables?
I said: if there is a long tail on the right side, take the log to
make the variable's distribution more normal.
Then I was asked: what about a predictor variable with both positive
and negative values? How do you transform it?
I choked...
q******d
Posts: 158
11
From thread: Statistics board - How to transform predictor variable?
In my experience, I would first want to know the business meaning of the predictor
variables,
then take log, sqr, etc., depending on the predictor variable.
S******e
Posts: 393
12
From thread: Statistics board - What does "predictor" mean?
"no predictors of diagnostic concordance could be identified statistically"
I'm not a statistician. What does "predictor" mean here? Is it a statistical term?
S*******u
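Posts: 727
13
From thread: Statistics board - CONTINUOUS PREDICTOR AND BINARY OUTCOME
I'd like to ask a question:
I'm doing logistic prediction. I have a continuous independent predictor, and I want to test
whether this continuous predictor is related to the binary outcome. But I don't want to turn
the continuous variable into a categorical one. Which test should I use to test the
association between a continuous variable and a binary outcome?
Thanks.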
l*****r
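Posts: 687
14
The death of Liu Qiangdong's sister shows that being a relative of a top-tier tycoon does not predict the quality of medical care one receives.
Having to fend for yourself in China doesn't mean you don't have to fend for yourself in the US.
But people like Liu Qiangdong and Li Yong can better predict the quality of care they will receive in the US.
In the US, wealth, income, and education level are all good predictors.
As for whether medical care is better in China or in the US, only an idiot wumao like you cares.
What I care about is where I can predict the quality of care I myself will receive.
Chinese women in the US, as a group, don't get very good medical care, but that doesn't stop Wendi Deng from making sure she gets the best care.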
l*****r
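Posts: 687
15
Even weather forecasts get it wrong sometimes; medical predictors are basically useless when it comes to 医闹溜肝尖儿 (medical-dispute troublemakers).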
d******5
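Posts: 355
16
From thread: Immigration board - visa bulletin predictor
Has anyone used this visa bulletin predictor?
http://www.myprioritydate.com/
Could someone explain it? Is the prediction accurate, and are there any really good ones?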
r****e
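Posts: 42
17
Two algorithms used in numerical computation: "second-order and higher predictor-corrector
integration schemes" and "straightforward leapfrog algorithm". Could any experts explain the
ideas behind them, or point me to where I can find an introduction? Thanks!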
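As a rough illustration of the two ideas named in the question, here is a minimal sketch. It assumes Heun's method (an explicit Euler predictor followed by a trapezoidal corrector) as the second-order predictor-corrector scheme and the kick-drift-kick form of leapfrog; the function names and the two test problems are chosen for illustration and do not come from the post.

```python
import math


def heun_step(f, t, y, h):
    """One second-order predictor-corrector step for y' = f(t, y)."""
    y_pred = y + h * f(t, y)                           # predictor: explicit Euler
    return y + 0.5 * h * (f(t, y) + f(t + h, y_pred))  # corrector: trapezoidal rule


def leapfrog_step(accel, x, v, h):
    """One leapfrog (kick-drift-kick) step for x'' = accel(x)."""
    v_half = v + 0.5 * h * accel(x)          # half kick
    x_new = x + h * v_half                   # full drift
    v_new = v_half + 0.5 * h * accel(x_new)  # half kick
    return x_new, v_new


# Predictor-corrector on y' = -y, y(0) = 1, whose exact solution is exp(-t).
h, y, t = 0.01, 1.0, 0.0
for _ in range(100):
    y = heun_step(lambda t, y: -y, t, y, h)
    t += h
print("Heun y(1):", y, "exact:", math.exp(-1.0))

# Leapfrog on the harmonic oscillator x'' = -x, starting with energy 0.5.
x, v = 1.0, 0.0
for _ in range(10_000):
    x, v = leapfrog_step(lambda x: -x, x, v, h)
print("leapfrog energy:", 0.5 * (x * x + v * v))
```

The predictor makes a cheap first guess and the corrector refines it with the slope at the guessed point; leapfrog staggers position and velocity updates, which is why its energy error stays bounded over long runs instead of drifting.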
a***y
Posts: 19743
18
From thread: Biology board - What does "predictor" mean?
The concept of a biomarker isn't as old as that of a predictor, is it?
a***y
Posts: 19743
19
From thread: Biology board - What does "predictor" mean?
Not every predictor is a biomarker.
r****e
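Posts: 42
20
Two algorithms used in numerical computation: "second-order and higher predictor-corrector
integration schemes" and "straightforward leapfrog algorithm". Could any experts explain the
ideas behind them, or point me to where I can find an introduction? Thanks!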
r****e
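Posts: 42
21
Two algorithms used in numerical computation: "second-order and higher predictor-corrector
integration schemes" and "straightforward leapfrog algorithm". Could any experts explain the
ideas behind them, or point me to where I can find an introduction? Thanks!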
s*******e
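Posts: 1385
22
Could you do model selection directly on BIC? BIC puts the heaviest penalty on adding
parameters, and adding a multicollinear predictor has relatively little effect on SSE. Would
using BIC weed out the collinear variables?
I don't understand this very well either; if this makes no sense, please don't mind.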
s*r
Posts: 2757
23
A high R^2 may be a sign of overfitting.
"randomly picked a var with 0.9 corr" is confusing: is that the correlation with the outcome
variable or with another predictor?
I remember the standard procedure being forward selection first, then
backward elimination.
f******h
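Posts: 46
24
True, a multicollinear predictor adds relatively little to R^2. But BIC is a statistic for the
whole model; when I'm deciding whether to keep or drop variables because of multicollinearity,
BIC gives me no variable-level information, so it's less useful than TOL and VIF.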
f******h
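Posts: 46
25
Thanks. I just replied above saying that the variables left after handling multicollinearity
turned out not to be the best pool... I think my way of handling multicollinearity is wrong.
At each step I drop the variable with the largest VIF, and I've noticed that this easily ends
up removing exactly the predictors that are most correlated with the dependent variable...
very frustrating.
I'd like to try your method: within each group of highly correlated variables, keep the one
with the largest univariate R^2. But there are issues here too: 1) because there are so many
variables, these high-correlation groups are not mutually exclusive, i.e. variables in
different groups can hardly avoid being highly correlated with each other as well; of course,
I could consider handing this to SAS via cluster analysis; 2) the other question is whether
keeping one per group is reliable. Or is this an accepted rule of thumb in practice?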
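One way to deal with overlapping high-correlation groups, offered only as a sketch and not as the method anyone in the thread used, is a greedy pass: rank predictors by absolute correlation with the response (equivalent to ranking by univariate R^2) and keep a predictor only if it is not too correlated with any predictor already kept. The function names, the 0.9 threshold, and the toy data are all assumptions for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def filter_correlated(predictors, response, threshold=0.9):
    """Greedy filter: visit predictors from most to least correlated with the
    response, keeping one only if it is not too correlated with those kept."""
    ranked = sorted(predictors,
                    key=lambda name: abs(pearson(predictors[name], response)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(predictors[name], predictors[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


# Toy data (made up): x2 is almost a copy of x1, x3 is unrelated to both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
x3 = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]
y = [1.2, 1.9, 3.2, 3.9, 5.1, 6.2]

kept = filter_correlated({"x1": x1, "x2": x2, "x3": x3}, y)
print(kept)
```

Because each candidate is compared against everything already kept, the groups never have to be mutually exclusive. The price is that the result depends on the ranking and the threshold, so it still needs checking against held-out predictive performance.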

f******h
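Posts: 46
26
This is ad hoc (?) modeling and isn't suited to a controlled experiment :)
You're quite right: if the predictors in the model are all strongly related to the response,
it's easy to see that they are also very likely to be highly multicollinear. But being
strongly related to the response isn't my ultimate goal. What I'm after is that the variable
subset I get after cleaning up multicollinearity gives me, in the subsequent variable
selection, a model as close to optimal as possible
(perhaps, as someone said earlier, the model can't be judged by R^2 and its predictive
performance needs to be properly validated).
What I now doubt is whether the method I use to handle multicollinearity is appropriate, or
whether there is a commonly used, widely accepted screening method.
I hope I've made my question clear...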

f*****a
Posts: 693
27
From thread: Statistics board - a question about ordinal predictor
In risk analysis, if the predictors are ordinal (such as "scores"),
should they be treated as categorical or as quantitative?
Thanks a lot.
F****r
Posts: 151
28
From thread: Statistics board - How to transform predictor variable?
With both negative and positive values, you can do a location transformation first to
make them all positive or all negative... However, why was the predictor variable
transformation needed in the first place?
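The location shift mentioned in this reply, plus two common alternatives that need no shift, can be sketched as follows. The alternatives (a signed log and the inverse hyperbolic sine) are not from the thread; they are included as assumptions about what an answer might cover, and the function names and sample values are made up.

```python
import math


def shifted_log(values, offset=1.0):
    """Location shift so the minimum maps to `offset`, then take the log."""
    shift = offset - min(values)
    return [math.log(v + shift) for v in values]


def signed_log(values):
    """sign(x) * log(1 + |x|): symmetric around zero, needs no shift."""
    return [math.copysign(math.log1p(abs(v)), v) for v in values]


def inverse_hyperbolic_sine(values):
    """asinh(x): roughly linear near zero, logarithmic in |x| for large |x|."""
    return [math.asinh(v) for v in values]


raw = [-1000.0, -10.0, 0.0, 10.0, 1000.0]  # made-up values
print(shifted_log(raw))
print(signed_log(raw))
print(inverse_hyperbolic_sine(raw))
```

The shifted log depends on the sample minimum, so the shift has to be stored and reused when scoring new data. The other two are fixed functions of the value itself and keep the sign of the original variable.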
A*******s
Posts: 3942
29
From thread: Statistics board - How to transform predictor variable?
I have the same question too. Can any NIUREN (expert) recommend some reviews
on transforming predictors?
n******r
Posts: 1247
30
From thread: Statistics board - How to transform predictor variable?
FICO is in no way more explainable than ln(FICO).
Do you know how FICO is calculated and scaled? How do you know FICO is in a
linear relationship with your target variable?
Fair Isaac could one day apply a sqrt transformation to FICO and still give it
out as FICO. Would you still use it the same way, or would FICO^2 all of a sudden
make more sense to your boss?
If FICO is not in a good linear relationship with the target, your
prediction can be very off when you look at the performance by different
FICO bands, i.... [read full post]
n******r
Posts: 1247
31
From thread: Statistics board - How to transform predictor variable?
Thanks man. I do need to chill out. If he hadn't used those "industry" (业界) words, I
wouldn't have bothered to argue with him. What he said about the idea of
transformation was very wrong and very misleading when he started it with
"industry". That's what pissed me off.
I have no problem with a boss asking what the business meaning of this 3-way
interaction or 2-way interaction is. But the fact that a boss would
disapprove a monotone transformation of a predictor in the name of business
sense just shows how miserable his ... [read full post]
y*****n
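Posts: 5016
32
From thread: Statistics board - How to transform predictor variable?
Sigh... Even if everything you said is right and everything I said is wrong, so what? Since
you have the time to keep throwing "bullshit" at me, why not give a complete,
easy-to-understand answer to the original poster's question, and explain to everyone in
detail how you do a comprehensive transformation of hundreds of predictor variables and
select among them efficiently? I'm sure that if you did, people on this board would give you
baozi (包子) as thanks. I will listen attentively and promise not to interrupt.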

s**********l
Posts: 395
33
You mean creating a dummy variable A: if the predictor's value is missing,
set A=0, else set A=1?
If the input variable with many missing values is a continuous variable,
is it still OK to use a dummy variable?
Thanks.
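The coding described in this question can be sketched for a continuous predictor as below. This is an illustration, not a recommendation: the helper name and the sample values are invented, and filling missing entries with the observed mean is an assumption, since the post does not say what value the missing entries should take.

```python
def add_missing_indicator(values, fill=None):
    """Return (filled, observed) for a list where None marks a missing value.

    observed[i] is 0 when values[i] is missing and 1 otherwise, matching the
    A=0 / A=1 coding in the question. Missing entries are replaced by `fill`,
    which defaults to the mean of the observed values.
    """
    present = [v for v in values if v is not None]
    if fill is None:
        fill = sum(present) / len(present)
    filled = [fill if v is None else v for v in values]
    observed = [0 if v is None else 1 for v in values]
    return filled, observed


income = [52.0, None, 48.0, None, 50.0]  # made-up continuous predictor
filled, observed = add_missing_indicator(income)
print(filled)
print(observed)
```

Both columns then enter the model together: the indicator absorbs the difference between missing and observed rows, and the filled column carries the slope for the rows where a value exists.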
l******e
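Posts: 895
34
I have 81 obs, of which only 7 are cases (i.e. 1); the rest are all 0.
There are 20 predictor variables. I want to do model selection and then predict y. Will the
results be poor?
I tried it, and the predictions all came out as 0.
Is there another machine-learning classification method that works well in this situation?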
s*r
Posts: 2757
35
Agresti's book says to use one predictor for every 10 observations at each level.
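Read as "at least 10 observations in each outcome level per predictor", the rule of thumb quoted here reduces to simple arithmetic. The sketch below assumes that reading; the function name is made up, and the first call plugs in the 81 observations and 7 cases from the question posted earlier in this listing.

```python
def max_predictors(n_events, n_non_events, per_predictor=10):
    """Largest number of predictors allowed when each predictor needs at least
    `per_predictor` observations in the rarer outcome level."""
    return min(n_events, n_non_events) // per_predictor


# 81 observations with 7 cases: the rarer level has only 7 observations.
print(max_predictors(7, 81 - 7))
# A larger, hypothetical data set for comparison.
print(max_predictors(200, 800))
```

Under this reading, 7 cases do not support even a single predictor, let alone a selection among 20.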
h*********n
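Posts: 278
36
At least that's how it was in the cases I'd seen before. In a recent model, after excluding
one predictor, AIC actually went down. Why? From a univariate view there is still a clear
pattern. And excluding it also drops the gain a lot. It seems that capping the target at a
fairly low level makes things normal again, but in practice it can't be capped that low.
What should I do?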
w**********n
Posts: 36
37
From thread: Statistics board - CONTINUOUS PREDICTOR AND BINARY OUTCOME
Wouldn't you just use this continuous variable as the predictor and then test whether the coefficient is significant?
y**3
Posts: 267
38
From thread: Statistics board - predictors
I've always had a question about a model a colleague built. To predict a binary response:
A=(BxC)/D, where A is a time-dependent continuous variable, which is then classed into a
binary response by range. He then put things like B, C, # of D, and average A
in the model as predictors. Obviously they are correlated. I don't think
those variables should be in the model, especially when predicting into the future.
What does everyone think?
d*********d
Posts: 239
39
I'm doing a simulation with several predictors and found that x2 and x4 are highly
correlated. What do we usually do in this case?
T*****e
Posts: 315
40
PCA? Center the predictors?
r******g
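Posts: 286
41
Based on the correlation matrix, pick out the predictors with large correlation coefficients
and keep only one of them. Besides this method, are there other ways to handle this problem?
In particular, how is it handled in practice?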
o**2
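Posts: 168
42
From thread: Java board - A parallel-processing problem encountered at work
Here are some more run results, for those who don't have time to try it themselves.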
Predictor#0 is starting - main
Predictor#0 is ending
Predictor#0 predict(input0) is starting - main
Predictor#0 predict(input0) is ending
Predictor#0 predict(input1) is starting - main
Predictor#0 predict(input1) is ending
Predictor#0 predict(input2) is starting - main
Predictor#0 predict(input2) is ending
Predictor#0 predict(input3) is starting - main
Predictor#0 predict(input3) is ending
Predictor#0 predict(input4) is starting - main
Predictor#0 predict(input4) is ending
Pr... [read full post]
U*******1
Posts: 1565
43
From thread: NCAA board - One computer rating comes out
HOME ADVANTAGE= 3.30
                     RATING   W   L  SCHEDL(RANK)  VS top 10 | VS top 30 | ELO_CHESS  | PREDICTOR
1 Louisiana? no
p********a
Posts: 5352
44
From thread: Statistics board - [Collection] A basic model-building question
☆─────────────────────────────────────☆
zhongdianshi (brb) wrote on (Mon Aug 29 09:50:26 2011, US Eastern):
OUTCOME: BMI
PREDICTOR: QUESTION1, QUESTION2, QUESTION5, QUESTION6...
All the PREDICTORS are ORDINAL VARIABLES.
I want to test the CORRELATION between the OUTCOME and each PREDICTOR separately.
I used two methods:
1.
PROC CORR SPEARMAN;
VAR BMI QUESTION1n QUESTION2n...;
RUN;
This produces a CORRELATION TABLE.
2. ANOVA
Put each PREDICTOR into the MODEL with BMI, one at a time. I'm not very sure about this step.
proc glm data = DATA;
class QUESTION1;
model BMI = QUESTION1;
means QUESTION1;
run;
quit;
In the end, the goal is to build a MIXED MOD... [read full post]
o**2
Posts: 168
45
From thread: Java board - A parallel-processing problem encountered at work
Give each Predictor object its own thread, and apply the producer/consumer
pattern twice: once from the user thread to the predictor thread for dropping off
prediction input, and once from the predictor thread to the user thread
for picking up the computed prediction.
public class Predictor implements Runnable {
    private String input, prediction;
    private boolean stop;
    public Predictor () {
        new Thread (this).start ();
    }
    public synchronized boolean isIdle () {
        return input... [read full post]
T***B
Posts: 137
46
From thread: Java board - A parallel-processing problem encountered at work
Following the ideas of mectite and goodbug, I wrote it up; the code is below. I ran it and
the results match expectations. I have one question: in PredictRequest.call() I cast the
current thread to PredictorThread to get the predictor object. Is there a better way to
associate the predictor (inside the thread) with the callable?
Predictor.java
public class Predictor {
    private String name;

    public Predictor(String name) {
        // heavy lifting stuff.
        this.name = name;
        System.out.println("Created predictor " + name);
    }
    public synchronized String predict(String input) throws
            InterruptedExcept... [read full post]
o**2
Posts: 168
47
From thread: Java board - A parallel-processing problem encountered at work
I optimized the FMP version a bit more; it was easy to put constructor() and predict() into
one active object. Not only does constructor() run in a worker thread, it also runs on demand.
While still using the original Predictor class, only two classes were added:
PredictorActiveObject and PredictService.
// FMP guarantees every instance a single-threaded env
public class PredictorActiveObject {
    private static final AtomicInteger created = new AtomicInteger ();
    private Predictor predictor;
    // single-threaded
    public void init () {
        if (predictor == null) {
            predictor = new Pre... [read full post]
T***B
Posts: 137
48
From thread: Java board - A parallel-processing problem encountered at work
smectite, you raised a very important point.
If we start with the requirements, what I need is a group of predictors that
can independently make predictions within a JVM. Some characteristics of
the system:
- Initializing a predictor is time consuming.
- A predictor, once initialized, holds a non-trivial amount of memory.
- The predict call is CPU intensive. The predict() method is not thread safe.
- A client request triggers a batch of predict calls. Batch size can vary.
From a design perspective, what I'... [read full post]
o**2
Posts: 168
49
From thread: Java board - A parallel-processing problem encountered at work
This final Main class is where the user program sets up the active objects.
The "builder" in it is used to run the time-consuming constructor; it is also a cluster, and
is used the same way as the "predictor" described above.
import java.util.ArrayList;
import java.util.List;
import com.fastmessenger.impl.Messenger;
import com.fastmessenger.model.IMessenger;
import com.fastmessenger.model.IReturn;
public class Main {
    public static void main (String[] args) {
        List<String> inputs = new ArrayList<String> ();
        for (int i = 0; i < 10; i++) {
            inputs.add ("input" + i);
... [read full post]
o**2
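Posts: 168
50
From thread: Java board - A parallel-processing problem encountered at work
If the OP can move the logic in constructor() into an init() method, there is no need to
add a new class; modifying the original Predictor and PredictService is enough.
As the final FMP version of the original problem listed below shows, FMP compared with
concurrency techniques such as thread, thread pool, and ExecutorService is like C compared
with assembly language: it wins across the board.
public class Predictor {
    private boolean inited = false;
    // instance-wide single-threaded
    public void init () {
        if (inited) {
            return;
        } else {
            inited = true;
        }
        // heavy lifting stuff from original constructor
    }
    // instance-wide singl... [read full post]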