Page 10 - Digest of discussions about regression - 话题女王

c******g
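Posts: 63

From thread: Statistics board - Question about subset selection algorithms in linear regression

I'm a beginner. I'd like to ask about subset selection in linear regression: for example,
choosing the best subset with leaps and bounds, and the greedy forward stepwise and backward
stepwise selection algorithms. Which book or reference covers these in detail? (That is, the
concrete steps of each algorithm, ideally with an example.)
The Elements of Statistical Learning treats them only in general terms. For example, forward
selection starts from zero variables and adds them one at a time. That is the idea, of course, but how is it actually carried out?
What metric decides which new variable to add (RSS, test error, or Mallows's Cp)?
What terminating condition says to stop adding? None of this is covered.
Thanks a lot!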

A*******s
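Posts: 3942

From thread: Statistics board - Question about subset selection algorithms in linear regression

The book Subset Selection in Regression covers a lot of the details, though I find the content a bit dated.
An e-book is available online.
In short, automatic selection for GLMs is likelihood-based: typically a likelihood
ratio test with a pre-specified significance level, though AIC, BIC,
cross-validation, and the like can also be used.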
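
The two open questions in this thread (which metric picks the next variable, and when to stop) can be made concrete with a small sketch. It is only one possible answer, not the procedure of any particular package: it assumes ordinary least squares, scores each candidate by the Gaussian AIC, n*log(RSS/n) + 2k, and stops when no remaining variable lowers the AIC. The helper names (`solve`, `rss`, `forward_select`) and the toy data are made up for illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares of OLS on an intercept plus the columns in `cols`."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations; fine for a toy example
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def forward_select(X, y):
    """Greedy forward selection; returns column indices in order of entry."""
    n = len(y)
    remaining = list(range(len(X[0])))
    chosen = []

    def aic(cols):
        # Gaussian AIC up to an additive constant; k counts the intercept too.
        return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)

    best = aic(chosen)  # start from the intercept-only model
    while remaining:
        score, j = min((aic(chosen + [j]), j) for j in remaining)
        if score >= best:  # terminating condition: no candidate improves AIC
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return chosen

# Hypothetical data: y depends on column 0 only, plus small fixed noise.
x0 = list(range(1, 13))
x1 = [1, -1] * 6
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, -0.5, 0.2, -0.2]
X = [list(row) for row in zip(x0, x1, x2)]
y = [2 + 3 * a + e for a, e in zip(x0, noise)]
selected = forward_select(X, y)
print(selected)
```

Swapping `aic` for BIC, Mallows's Cp, a cross-validated error, or a likelihood ratio test at a fixed significance level changes only the scoring function and the stopping rule; the greedy loop stays the same.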

a***e
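Posts: 1627

From thread: Statistics board - 10 baozi for help with a regression problem

I've been reading the book for a long time and still don't get it. I shouldn't have picked regression right at the start.
Now I'm completely lost. I can follow the case with one x, but with two or more I get confused.
Thanks, everyone.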

t**c
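Posts: 539

From thread: Statistics board - 10 baozi for help with a regression problem

Judging from the wording of the problem, I think beta1 and beta2 can be vectors.
I checked my linear regression textbook, and testing whether a whole vector = 0 doesn't seem to be any different.
It may just be written in a different form sometimes so that software output can be used directly?
Please correct me if I've misunderstood.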

t**c
Posts: 539

From thread: Statistics board - 10 baozi for help with a regression problem

Sure. Discussing it with you gave me a chance to review regression again.

a***e
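Posts: 1627

From thread: Statistics board - 15 baozi for an expert to guide me through a regression problem

I really can't do regression problems. Please give me some pointers.
Thanks!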

n******t
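Posts: 189

From thread: Statistics board - Comparing SVM and logistic regression

Both are used for prediction; which one is better?
Supposedly SVM is better when the class boundary is not linear, and logistic regression is better otherwise?
What other issues are there?
Thanks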

n******t
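Posts: 189

From thread: Statistics board - Comparing SVM and logistic regression

Then suppose I only have a logistic regression model and know nothing else about it: what data
the model was based on, how that data was generated, or even whether a penalty was used.
Might an SVM be able to beat it?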

n******t
Posts: 189

From thread: Statistics board - Comparing SVM and logistic regression

Heh, after plugging recent data into this logistic regression model, it isn't very accurate...

n**********e
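Posts: 18

From thread: Statistics board - Wilcoxon rank sum test and logistic regression give different results?

I'm testing the effect of a continuous variable on the event rate.
The Wilcoxon rank sum test gave a p-value of 0.02, indicating an effect.
Logistic regression then gave a p-value of 0.42.
These two results are far apart! Which test is more accurate?
Thanks in advance to the experts!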

s*r
Posts: 2757

From thread: Statistics board - Wilcoxon rank sum test and logistic regression give different results?

The Wilcoxon rank sum test shows that the distribution of the x variable differs
by a location shift between the y=1 group and the y=0 group.
Logistic regression shows that there is no significant linear increase in
logit(Pr(y)) per unit increase in x.

z******n
Posts: 397

From thread: Statistics board - Q's when fitting exact logistic regression...

Hmm, I don't think what you said is quite right.
The page you linked says:
What are the techniques for dealing with complete separation or quasi-
complete separation?
... ...
Exact method is a good strategy when the data set is small and the model is
not very large. Below is a sample code in SAS.
proc logistic data = t2 descending;
model y = x1 x2;
exact x1 / estimate=both;
run;
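This shows that exact logistic regression can be used to deal with complete separation in the data.
But complete separation is not the same thing as degenerate.
As I understand it, in an exact test a degenerate distribution means that the conditional distribution
of the sufficient statistic T for the parameter of interest a is degenerate...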
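
To make the terms in this post concrete, here is a minimal sketch of a separation check for a single predictor. It only covers the one-predictor case (with several predictors, separation concerns a linear combination of them and needs a different check), and the function name `separation_status` is made up for illustration.

```python
def separation_status(x, y):
    """Classify one predictor x against a binary outcome y.

    Returns 'complete' if a threshold on x splits y=0 from y=1 with no
    overlap, 'quasi-complete' if the groups overlap only in ties at the
    boundary value, and 'none' otherwise. Assumes both outcome groups
    are non-empty.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    # Try both orientations: y=0 below y=1, then y=1 below y=0.
    for lower, upper in ((x0, x1), (x1, x0)):
        if max(lower) < min(upper):
            return "complete"
        if max(lower) == min(upper):
            return "quasi-complete"
    return "none"

print(separation_status([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_status([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_status([1, 3, 2, 4], [0, 0, 1, 1]))
```

Under complete or quasi-complete separation the ordinary maximum likelihood estimate for that predictor does not exist (the likelihood keeps increasing as the coefficient grows), which is why the quoted page turns to the exact method.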

e****t
Posts: 766

From thread: Statistics board - Q's when fitting exact logistic regression...

Thank you all so much for the discussion!
Very helpful.
I will go back and try Firth logistic regression to see if it works.
proc logistic data = t2 descending;
model y = x1 x2 /firth;
run;

p********r
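Posts: 1465

From thread: Statistics board - After grouping the data, are correlation/regression results the same at different levels?

A question for everyone:
Suppose the data look like this:
Dept  Employee  X    Y
M     1         3.4  5
M     2         4.5  8
N     3         2.3  9
...
Aggregated by department:
Dept  X    Y
M     7.9  13
N     2.3  9
...
If I run correlation/regression analyses on X and Y, will the results at the employee level and at the
department level be the same? Approximately the same? Or completely different?
Or does it depend on the characteristics of the data?
Thanks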
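
A small numeric sketch can show that the two levels need not agree. The records below are hypothetical (they are not the poster's data) and were chosen so that Y falls with X inside each department while department totals rise together; totals are used for aggregation, as in the example above.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical employee-level records: (department, X, Y)
records = [
    ("A", 1, 5), ("A", 2, 4), ("A", 3, 3),
    ("B", 4, 8), ("B", 5, 7), ("B", 6, 6),
    ("C", 7, 11), ("C", 8, 10), ("C", 9, 9),
]

# Employee level: one point per employee.
r_employee = pearson([x for _, x, _ in records], [y for _, _, y in records])

# Department level: sum X and Y within each department first.
totals = defaultdict(lambda: [0.0, 0.0])
for dept, x, y in records:
    totals[dept][0] += x
    totals[dept][1] += y
r_department = pearson([t[0] for t in totals.values()],
                       [t[1] for t in totals.values()])

# Within one department the relationship even has the opposite sign.
r_within_a = pearson([1, 2, 3], [5, 4, 3])

print(r_employee, r_department, r_within_a)
```

Here the employee-level correlation is about 0.8, the department-level correlation is 1, and the correlation inside department A is -1, so the answer depends on how much of the variation lies within versus between groups.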

t**c
Posts: 539

From thread: Statistics board - PCA and linear regression

What is the relationship between PCA and regression?

s**********y
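Posts: 38

From thread: Statistics board - What to do when the goodness-of-fit test for logistic regression is greater than .1?

I fit a logistic regression with a binary dependent variable, and the deviance, Pearson,
and Hosmer-Lemeshow tests are all significant. Which model should I use then?
I searched many examples online, and in all of them the goodness-of-fit tests are not significant.
Thanks!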

b******s
Posts: 325

From thread: Statistics board - how to interpret these regression coefficients?

A beginner urgently needs help interpreting regression coefficients. Thanks in advance!
Model:
Dependent var: Delta Y (the outcome change between baseline and a
follow-up measurement point)
Independent vars: baseline score category 1, baseline score category 2 (NOTE:
category 3 is the omitted category), plus a set of "state" dummies (VA, IN,
MD, with MS as the omitted category) and a "treatment model" dummy (where
1 = diversion, 0 = transition).
QUESTION 1:
So the constant is interpreted as "the average outcome change for population...

f*****a
Posts: 693

From thread: Statistics board - Looking for "Classification and Regression Trees" by Breiman

library.nu is down. 5 baozi for "Classification and Regression Trees" by
Breiman et al., 1984.
Thanks!

q****k
Posts: 1023

From thread: Statistics board - Similar "freq count" statement in SPSS logistic regression

I have no problem using SAS Proc Logistic on input data with an aggregate
"count" variable.
But in SPSS, for the same input data with a "count" variable, how do I get the
equivalent of the "freq count" statement for SPSS Logistic Regression?
Thanks!
Please refer to
http://support.sas.com/rnd/app/da/cat/samples/chapter8.html
data coronary;
input sex ecg ca count @@;
datalines;
0 0 0 11 0 0 1 4
0 1 0 10 0 1 1 8
1 0 0 9 1 0 1 9
1 1 0 6 1 1 1 21
;
run;
proc logistic des...
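
As a rough illustration of what a frequency variable does, the sketch below expands the eight aggregated cells into one row per subject and checks that a count-weighted log-likelihood matches the row-by-row one. It assumes `ca` is the binary response with `sex` and `ecg` as predictors (the truncated SAS call above does not show the model statement), and the coefficients in `prob_ca` are made up purely for the comparison.

```python
import math

# Cell counts copied from the data step above: (sex, ecg, ca, count)
cells = [
    (0, 0, 0, 11), (0, 0, 1, 4),
    (0, 1, 0, 10), (0, 1, 1, 8),
    (1, 0, 0, 9), (1, 0, 1, 9),
    (1, 1, 0, 6), (1, 1, 1, 21),
]

def expand(cells):
    """Turn aggregated cells into one (sex, ecg, ca) row per subject."""
    rows = []
    for sex, ecg, ca, count in cells:
        rows.extend([(sex, ecg, ca)] * count)
    return rows

def prob_ca(sex, ecg, b0=-1.0, b_sex=0.5, b_ecg=0.8):
    """Hypothetical logistic model for P(ca = 1); coefficients are invented."""
    return 1 / (1 + math.exp(-(b0 + b_sex * sex + b_ecg * ecg)))

def loglik_weighted(cells):
    """Log-likelihood where each cell contributes `count` times."""
    total = 0.0
    for sex, ecg, ca, count in cells:
        p = prob_ca(sex, ecg)
        total += count * math.log(p if ca == 1 else 1 - p)
    return total

def loglik_expanded(rows):
    """Log-likelihood summed over individual rows."""
    total = 0.0
    for sex, ecg, ca in rows:
        p = prob_ca(sex, ecg)
        total += math.log(p if ca == 1 else 1 - p)
    return total

rows = expand(cells)
print(len(rows), loglik_weighted(cells), loglik_expanded(rows))
```

Because the two log-likelihoods agree for any coefficient values, any package that can either weight cases by `count` or accept the expanded rows should reach the same fit.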

r********n
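Posts: 6979

From thread: Statistics board - Explaining and interpreting logistic regression results

52 does feel like a bit many.
I don't use SAS,
so I don't know how this logistic regression arrives at these coefficients.
You could try some Bayesian methods
and add a penalty on the df,
for example LASSO or similar.
The resulting df should be smaller.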

r********n
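Posts: 6979

From thread: Statistics board - Explaining and interpreting logistic regression results

Of course I know how backward selection works.
What I don't quite understand is how SAS estimates the regression coefficients.
My guess is that it uses the EM algorithm to get a point estimate.
With a Bayesian approach
you can use different priors.
If you want fewer predictors,
you can use a stronger prior,
so that most of the coefficients end up close to 0.
Try LASSO.
SAS should have a LASSO function.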

a***d
Posts: 336

From thread: Statistics board - Explaining and interpreting logistic regression results

it is maximum likelihood estimation for logistic regression.

S*x
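Posts: 705

From thread: Statistics board - Explaining and interpreting logistic regression results

52 is definitely too many. Have you run proc corr to look at the correlations among the variables? Variables
often come in pairs.
If you sort by Wald score and compute a cumulative Wald score, you will surely find that
the last variables are not needed.
Also, your figure only shows the model itself. What about the Somers' D for the logistic regression? And the c statistic?
If those numbers all look good, you still need validation to show that the model is usable.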

s******a
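Posts: 184

From thread: Statistics board - A question about regression

While studying linear regression I came across an example that uses an ANOVA table.
In this model, Y is the response variable and there are two independent variables, X1 and X2.
The example says that the following can be concluded from the ANOVA table:
1) After accounting for the effect of X2 on X1 and Y, X1 also has a strong linear association with Y.
2) If the effect of X2 is ignored, the linear association between X1 and Y is much less evident.
3) Whether or not the effect of X1 is accounted for, X2 has a strong linear association with Y.
Which information in the ANOVA table leads to these conclusions?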

a****t
Posts: 1007

From thread: Statistics board - A question about regression

Look at the p-values of the different regressions; they account for those points.

f*********y
Posts: 376

From thread: Statistics board - logistic regression issue

I have issues similar to those mentioned in http://www.mitbbs.com/article/Statistics/31314451_0.html
From this post and its replies, there are several ways to select variables for
logistic regression:
1. Use correlation to find highly correlated variables and delete some of them
if possible. But by what criteria?
2. Use the backward selection method. Does SAS have a routine for this, or do I need
to program the process myself?
3. Use the best subset method. Does SAS have a routine for this, or do I need to program the
process myself?
4. If ...

d******r
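Posts: 193

From thread: Statistics board - Which software offers a package for locally weighted nonparametric regression?

Baozi sent, thanks a lot. From a quick look, though, it only offers polynomial regression. Is there one that offers
logistic and probit?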

j*****n
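Posts: 1545

From thread: Statistics board - Which regression algorithms are popular right now? (repost)

[The following text is reposted from the CS board]
Posted by: jetchen (飞机), Board: CS
Title: Which regression algorithms are popular right now?
Posted at: BBS 未名空间站 (Thu Apr 12 18:58:29 2012, US Eastern)
I haven't followed this area lately; I only know of SVR and kernel logistic.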

v******2
Posts: 9

From thread: Statistics board - Regression Question

Can anyone help me answer this question?
"Researchers studied the length of time spent on individual home visits by
public health nurses. They wished to answer the following question: does the
mean length of home visit differ among different age groups of nurses?"
1. What is the response variable of interest, and what data type is it
(qualitative or quantitative)?
2. Can we analyze the researchers' data by fitting an OLS simple linear
regression model? If not, how can you analyze the data?
3X...

h***i
Posts: 3844

From thread: Statistics board - Is there anything wrong with this R logistic regression code?

Just try it once and you'll know. What is there to ask?

f*******6
发帖数: 103

来自主题: Statistics版 - 这段R logistic regression code有没有问题？

when I run the logistic regression, I didn't remove the dependent variable.
It seems correct

J*****n
Posts: 4859

From thread: Statistics board - A question about regression

This confuses me.
I have two series, x and y.
When I fit y = ax + b, I get:
Call:
lm(formula = y ~ x)

Residuals:
   Min     1Q Median     3Q    Max
-38161  -5115    -19   4550  40688

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.300e+03  1.221e+03   1.065     0.29
x           1.058e+00  4.398e-02  24.057   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11210 on 84 degrees of freedom
Multiple R...

J*****n
Posts: 4859

From thread: Statistics board - A question about regression

x is the output of some model that predicts y.
To my mind, it should hold that y = 1 * x + 0 + some error term (*)
and also that x = 1 * y + 0 + some error term. (**)
My regression above showed that
y = 1.05 * x + some error term, and the standard error of this 1.05 is 0.04. So I see that (*)
holds.
However, the second regression showed that
x = 0.85 * y + 0 + some error term, and the standard error of this 0.85 is 0.03, so (**)
fails.
I am confused by this.
Thank you for your reply.
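
One way to look at this puzzle: for simple OLS the slope of y on x is Sxy/Sxx and the slope of x on y is Sxy/Syy, so their product is r^2. Both slopes can equal 1 only when the correlation is perfect. The two slopes reported above multiply to roughly 1.058 * 0.85, about 0.90, which is what this identity would give for an r^2 near 0.9. The sketch below checks the identity on made-up numbers; `slope` and `r_squared` are illustrative helper names.

```python
def _centered_sums(xs, ys):
    """Return (Sxx, Syy, Sxy) around the sample means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def slope(xs, ys):
    """OLS slope of ys regressed on xs, with an intercept."""
    sxx, _, sxy = _centered_sums(xs, ys)
    return sxy / sxx

def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys."""
    sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: y tracks x closely but not perfectly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]

b_yx = slope(x, y)  # regress y on x
b_xy = slope(y, x)  # regress x on y
print(b_yx, b_xy, b_yx * b_xy, r_squared(x, y))
```

So a slope below 1 in the reverse regression does not by itself contradict (*); it reflects the noise in the relationship (regression toward the mean).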

z**********i
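Posts: 12276

From thread: Statistics board - regression discontinuity design

I have a project that needs to evaluate a program.
Someone said a regression discontinuity design might work; I had never heard of it before.
Has anyone here used it?
Many thanks!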

R******d
Posts: 1436

From thread: Statistics board - Parameters for multivariate regression in proc mixed

See this example from the web:
http://courses.ttu.edu/isqs5349-westfall/images/5349/Multivaria
proc mixed method=reml; /* Performs Multivariate Regression Analysis */
/* Identical to separate OLS models */
/* Method=REML is "restricted" ML, adjusted for df */
class cartype month;
model sales = cartype cartype*ppggas cartype*intinv/noint s ddfm=satterth;
repeated /subject=month type=un r=1 rcorr=1;
run;
If it is written as:
class cartype;
repeated cartype/subject=month typ...

A*******s
Posts: 3942

From thread: Statistics board - [Q]degrees of freedom in constrained regression

If we fit a regression model with boundary constraints on some
parameters, is it correct that the model's df is the number of parameters
that do not hit the boundary constraints? I want to confirm this statement.
Thanks.

s*r
Posts: 2757

From thread: Statistics board - [Q]degrees of freedom in constrained regression

I think you could borrow the approach used to compute df in penalized regression.

n*****t
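Posts: 18

From thread: Statistics board - Question: how can I convert the OR from logistic regression into a probability?

I want to interpret the coefficient or odds ratio from a logistic regression as a probability.
X: continuous variable or binary variable
Y: binary variable
The question I want to answer is:
one unit change or value change from 0 to 1 of X ----> what does it mean in
terms of probability of choosing Y=1?
Do I need to convert the odds ratio, or can I use the OR directly? From some papers I've read, it seems the OR cannot be used directly...
Many thanks!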
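
A short sketch of why the OR cannot be read as a change in probability: the odds ratio multiplies the odds, so the resulting change in probability depends on the starting probability. The function names and the numbers used (OR = 2, intercept -1, coefficient 0.7) are illustrative assumptions, not values from any fitted model.

```python
import math

def shift_probability(p_baseline, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1 - p_baseline) * odds_ratio
    return odds / (1 + odds)

def predicted_probability(intercept, coef, x):
    """P(Y = 1 | x) from a logistic regression with one predictor."""
    return 1 / (1 + math.exp(-(intercept + coef * x)))

# The same OR of 2 moves different baselines by different amounts.
p_low = shift_probability(0.10, 2.0)   # 0.10 -> about 0.18
p_mid = shift_probability(0.50, 2.0)   # 0.50 -> about 0.67

# With a fitted model, compare predicted probabilities at the two X values
# directly. The OR for a one-unit change in X is exp(coef).
b0, b1 = -1.0, 0.7                     # hypothetical estimates
p_x0 = predicted_probability(b0, b1, 0)
p_x1 = predicted_probability(b0, b1, 1)
print(p_low, p_mid, p_x0, p_x1, p_x1 - p_x0)
```

In practice this means reporting predicted probabilities at chosen values of X (and of any other covariates) rather than trying to translate a single OR into one probability.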

I*******o
Posts: 109

From thread: Statistics board - Looking for a textbook: A Second Course in Statistics: Regression Analysis 7th

Does anyone have PDFs of these two books?
A Second Course in Statistics: Regression Analysis 7th
Mendenhall & Sincich
Statistical Analysis of Designed Experiments: Theory and Applications
Tamhane, 2009
My email: w******[email protected]
Many thanks!

a*q
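Posts: 1256

From thread: Statistics board - Baozi for a dumb SAS regression question

Suppose I want to run an all-possible-subsets regression with three variables X1 X2 X3. In y=
x1 x2 x3/ selection=?, what goes after selection=?
And suppose I want X2 to enter the model first and stay in it, with all-possible selection continuing on the other two.
What command should I use?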

l*****e
Posts: 701

From thread: Statistics board - A regression question

I need to fit this regression:
y1 = a0 + a1x1 + a2x2
y2 = a3 + a1x2 + a3x1
The main issue is that a1 is a coefficient shared by the two equations. How should I handle that? Thanks.

c*******o
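Posts: 8869

From thread: Statistics board - Urgent help needed: logistic regression

I used logistic regression to get predicted probabilities (for case=1). In the ranges where the
predicted likelihood is very high (0.8-1) and very low (0-0.1), the case
enrichment is high, whereas in the middle range (around 0.5) the proportion of cases is low. Why does the predicted
likelihood show this kind of curvature, and how should I deal with it?
Many, many thanks.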

p*****y
Posts: 34

From thread: Statistics board - Urgent help needed: logistic regression

I guess regular logistic regression can't deal with sparse case

r********n
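Posts: 6979

From thread: Statistics board - How to handle continuous and categorical variables together in one regression model

I know a decision tree can be used;
it doesn't seem to place hard requirements on the variables.
But in other models,
is there a way to have both kinds of variables together?
For example, what do you do in a linear regression model?
Also, with categorical variables,
sometimes the values only indicate that things differ,
with no notion of "distance" between them.
For example, if a variable is a color (red, green, yellow),
it seems it can't simply be turned into 0, 1, 2.
What should be done in that case?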
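
The usual textbook device for this is dummy (indicator) coding: a categorical variable with k levels becomes k-1 columns of 0/1, with one level left out as the reference, and those columns sit next to the continuous predictors in the design matrix. A minimal sketch, with a made-up helper `dummy_code` and made-up data:

```python
def dummy_code(values, reference):
    """Return {level: 0/1 indicator list} for every level except the reference."""
    levels = sorted(set(values) - {reference})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

# Hypothetical data: one continuous and one categorical predictor.
size = [1.5, 2.0, 3.2, 2.7]
color = ["red", "green", "yellow", "green"]

dummies = dummy_code(color, reference="red")

# Design matrix rows: intercept, continuous variable, then the indicators.
design = [
    [1.0, size[i]] + [dummies[lvl][i] for lvl in dummies]
    for i in range(len(color))
]
for row in design:
    print(row)
```

Each indicator's coefficient is then the difference from the reference level (red here) with the continuous variable held fixed, so no ordering or distance between colors is implied.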

h***i
Posts: 3844

From thread: Statistics board - How to handle continuous and categorical variables together in one regression model

I think you need to find a regression book; this is basic material.

w****n
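Posts: 266

From thread: Statistics board - A question about regression analysis

I have a data set with over 200 variables and want to run a regression analysis. I'd like to ask about attribute selection and
regression algorithms. Are there any reference books I could read? I googled for a while without getting anywhere.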

l******n
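Posts: 9344

From thread: Statistics board - regression prediction question

In the regression training data set, there is a categorical variable with only 3 levels.
In the data to be predicted, there is one record whose value of this categorical variable is not among those 3
levels. How do I make the prediction?
Thanks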

t********m
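Posts: 939

From thread: Statistics board - Should I use a Cox regression model or a GEE model?

Thanks for your reply. We only care about a patient's first occurrence of the disease; data after the first occurrence will not be used in
the model.
I think the change in BMI over time is also of interest to us. I don't quite understand your point that with a traditional survival
analysis we could only consider the BMI in the month of onset or the baseline BMI, because a patient who develops the disease may
have more than two BMI values. For example, if a patient develops the disease at 36 months, his usable data are as follows:
Obs  ID  MONTHS  SEQ  EVENT  DAYS   AGE  BMI
80   12  0       1    0      0.001  45   26.10000038
81   12  12      2    0      372    46   31.60000038
82   12  24      3    0      759    47   31.5
83   12  36      4    1      1179   48   31.89999962
That is, we don't care about data after 36 months, but all earlier data, including the data at 36 months, are of
interest. And if the patient...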
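
One common way to let a covariate such as BMI change over time in a Cox model is the counting-process layout, where each patient contributes one (start, stop] interval per visit gap. The sketch below reshapes the four records listed above into that layout. It is only an illustration under stated assumptions: the BMI measured at the start of an interval is carried forward over it, the event is flagged on the interval that ends at the event time, and BMI values are rounded. The function name `to_intervals` is made up.

```python
# (DAYS, EVENT, BMI) for patient ID 12, copied from the listing above
# with BMI rounded to one decimal place.
visits = [
    (0.001, 0, 26.1),
    (372, 0, 31.6),
    (759, 0, 31.5),
    (1179, 1, 31.9),
]

def to_intervals(visits):
    """Convert ordered visit records into (start, stop, event, bmi) intervals.

    Each interval runs from one visit to the next, carries the BMI measured
    at its start, and takes the event flag recorded at its end.
    """
    intervals = []
    for (start, _, bmi), (stop, event, _) in zip(visits, visits[1:]):
        intervals.append((start, stop, event, bmi))
    return intervals

intervals = to_intervals(visits)
for row in intervals:
    print(row)
```

Under this convention the BMI recorded at the 36-month visit is not used as a predictor, since it is measured at the event time itself; whether that matches the study question is a modeling decision rather than something the reshaping settles.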
