Packages for regression variable selection in R? - Statistics board

This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS).


q**j
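Posts: 10612

I have finally reached this step. Could everyone recommend regression variable selection tools? For example, which one works well for ordinary regression?
Also, among ridge, lasso, and LAR, which is better? And what about glmnet? I will try them all and can report back on how they perform in practice.
One more question: if I use lasso to select the variables but then use ordinary least squares to estimate the coefficients and the covariance matrix, is that reasonable? I need to estimate a system of equations. Is there a ready-made package that does lasso for that? For ordinary regression, the systemfit package handles it. Many thanks.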

A*******s
Posts: 3942

Exactly the question I wanted to ask. How can we make inferences about the estimates from a regularized method?
The second one is also very interesting to me: how do we use regularization for a system of equations?
Waiting for the experts...

[In reply to q**j]

d******e
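Posts: 7844

To answer the first question:
Ridge cannot do variable selection at all.
LARS is just one kind of greedy solution for the lasso, and it is outdated.
Just go straight to glmnet.
To answer the second question:
You can use the lasso for variable selection and then use OLS to redo the estimation. This is the currently popular two-stage model selection and estimation.
What is the system of equations you mentioned? Do you want the minimum L1-norm solution of Ax=b? That is just compressive sensing. Set the regularization very small, then threshold to get (n-1) variables.

[In reply to q**j]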
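
The two-stage approach described above (lasso for selection, then OLS for estimation) can be sketched in R as follows. This is a minimal illustration rather than anything posted in the thread: the simulated data, the variable names, and the choice of `lambda.1se` as the selection rule are all assumptions.

```r
library(glmnet)

# Simulated data (assumption): 20 predictors, only the first three matter
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", seq_len(p))
beta <- c(3, -2, 1.5, rep(0, p - 3))
y <- drop(x %*% beta + rnorm(n))

# Stage 1: lasso (alpha = 1), lambda chosen by cross-validation
cv_fit <- cv.glmnet(x, y, alpha = 1)
b <- as.matrix(coef(cv_fit, s = "lambda.1se"))  # first row is the intercept
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")

# Stage 2: ordinary least squares on the selected columns only
ols_fit <- lm(y ~ ., data = data.frame(y = y, x[, selected, drop = FALSE]))
coef(ols_fit)   # unpenalized coefficient estimates
vcov(ols_fit)   # OLS covariance matrix, conditional on the selected set
```

Note that `vcov(ols_fit)` treats the selected set as fixed, so the resulting standard errors do not account for the selection step and tend to be optimistic.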

d******e
Posts: 7844

The main goal of L1 regularization is not estimation, so I do not recommend doing inference on the lasso fit.

[In reply to A*******s]

A*******s
Posts: 3942

Good to know, thanks!!
I think the system of equations the original poster mentioned is like
Y1=X1*beta1+e1
Y2=X2*beta2+e2
f(Y1, Y2)=0
g(e1, e2)=0
Basically, there are additional equations that connect two or more regression models. That's my understanding.

[In reply to d******e]

d******e
Posts: 7844

I have not worked with this kind of problem, but it can likewise be formulated as an L1 regularization problem. You may, however, need to account for the relative weights of the two regression models at the same time.

[In reply to A*******s]

l******n
Posts: 9344

Use Lagrange multipliers and it becomes an OLS problem.

[In reply to A*******s]
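
As a sketch of the Lagrange-multiplier idea, assume (this is an assumption, not something stated in the thread) that the linking equations can be written as linear restrictions $R\beta = r$ on the stacked coefficients $\beta = (\beta_1', \beta_2')'$, where $y$ stacks $Y_1, Y_2$ and $X$ is the block-diagonal matrix built from $X_1, X_2$. With $\hat\beta_{OLS} = (X'X)^{-1}X'y$, the constrained problem then has a closed form:

```latex
% Assumption: the linking constraints are linear in the stacked coefficients, R\beta = r.
\begin{aligned}
\mathcal{L}(\beta,\lambda) &= (y - X\beta)'(y - X\beta) + 2\lambda'(R\beta - r) \\
\frac{\partial \mathcal{L}}{\partial \beta} = 0 &\;\Rightarrow\; X'X\beta = X'y - R'\lambda \\
\hat\beta_{R} &= \hat\beta_{OLS} - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat\beta_{OLS} - r\right)
\end{aligned}
```

Nonlinear links such as a general f(Y1, Y2)=0 would not reduce to this form and would need a different treatment.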

q**j
Posts: 10612

Right on, thanks.

[In reply to A*******s]

q**j
Posts: 10612

Great, thanks.

[In reply to d******e]

l*********s
Posts: 5409

mark

n*****n
Posts: 3123

Could you give a reference for two-stage model selection and estimation?

[In reply to d******e]

d******e
Posts: 7844

http://arxiv.org/pdf/0704.1139
http://jmlr.csail.mit.edu/papers/volume11/zhang10a/zhang10a.pdf
Although the second one uses a non-convex penalty, it is essentially still a two-stage procedure.

[In reply to n*****n]

n*****n
Posts: 3123

Thanks.

[In reply to d******e]

F****n
Posts: 3271

I am no expert, but as far as I know, inference for penalized methods has to rely on Bayesian-style thinking, i.e. you "know" that the regularization you use fits your prior knowledge.

[In reply to A*******s]

A*******s
Posts: 3942

Thanks, good to know.

[In reply to F****n]

d******e
Posts: 7844

... ... You should at least know what lasso and ridge are first, shouldn't you?

[In reply to q**j]

q**j
Posts: 10612

Sorry. It seems that ridge gives every variable a small coefficient, so it cannot eliminate any of them and naturally cannot be used for variable selection. Right? I just read up on it.

[In reply to d******e]
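
The contrast between the two penalties can be checked numerically. The sketch below assumes simulated data and an arbitrary penalty value `lam` (both hypothetical, chosen only for illustration); it counts how many slope coefficients each fit leaves nonzero.

```r
library(glmnet)

# Simulated data (assumption): 20 predictors, only the first three matter
set.seed(2)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(3, -2, 1.5) + rnorm(n))

lam <- 0.5  # arbitrary penalty strength, same value for both fits

ridge <- glmnet(x, y, alpha = 0, lambda = lam)  # pure L2 penalty
lasso <- glmnet(x, y, alpha = 1, lambda = lam)  # pure L1 penalty

# Count nonzero slope coefficients (row 1 is the intercept, so drop it)
n_nonzero <- function(fit) sum(as.matrix(coef(fit))[-1, 1] != 0)

n_nonzero(ridge)  # ridge shrinks coefficients but does not zero them out
n_nonzero(lasso)  # lasso sets some coefficients exactly to zero
```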

q**j
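Posts: 10612

A quick question: by default glmnet adds a constant (intercept) column in front. But my X matrix already contains this constant. Can I change glmnet's behaviour, or do I have to change my own matrix?
Also, after running
library(glmnet)

set.seed(1010)
n=1000;p=100
nzc=trunc(p/10)            # number of truly nonzero coefficients
x=matrix(rnorm(n*p),n,p)
beta=rnorm(nzc)
fx= x[,seq(nzc)] %*% beta  # signal from the first nzc columns
eps=rnorm(n)*5
y=drop(fx+eps)
px=exp(fx)
px=px/(1+px)
ly=rbinom(n=length(px),prob=px,size=1)  # binary response (not used below)
set.seed(1011)
cvob1=cv.glmnet(x,y)
the result of coef(cvob1) differs from that of
coef(cvob1,s=cvob1$lambda.min).
I think the second is what we ultimately want, but what is the first?
Finally, how can I extract from cv.glmnet the constant that balances the L1 and L2 penalties?

[In reply to d******e]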
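
The three glmnet questions above can be explored with a short sketch. It relies on documented behaviour: `coef()` on a `cv.glmnet` object defaults to `s = "lambda.1se"`, `glmnet` accepts an `intercept` argument, and `cv.glmnet` does not search over `alpha` (the L1/L2 mixing constant is whatever the caller supplies, 1 by default). The alpha grid and the shared `foldid` vector are illustrative assumptions.

```r
library(glmnet)

# Same simulated data as in the post above
set.seed(1010)
n <- 1000; p <- 100
nzc <- trunc(p / 10)
x <- matrix(rnorm(n * p), n, p)
beta <- rnorm(nzc)
y <- drop(x[, seq(nzc)] %*% beta + rnorm(n) * 5)

set.seed(1011)
cvob1 <- cv.glmnet(x, y)

# 1) coef() with no s uses lambda.1se, the largest lambda whose CV error is
#    within one standard error of the minimum, so it is sparser than lambda.min
b_default <- as.matrix(coef(cvob1))
b_1se <- as.matrix(coef(cvob1, s = "lambda.1se"))
b_min <- as.matrix(coef(cvob1, s = "lambda.min"))
all.equal(b_default, b_1se)

# 2) If x already carries a constant column, one option is to drop that column;
#    another is to ask glmnet not to fit its own intercept
cv_noint <- cv.glmnet(x, y, intercept = FALSE)

# 3) alpha is an input, not an output. To compare several values, reuse the
#    same folds so the cross-validated errors are comparable
foldid <- sample(rep(seq_len(10), length.out = n))
alphas <- c(0, 0.5, 1)  # illustrative grid (assumption)
cv_err <- sapply(alphas, function(a) {
  min(cv.glmnet(x, y, alpha = a, foldid = foldid)$cvm)
})
names(cv_err) <- alphas
cv_err  # minimum mean CV error for each alpha
```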
