Page 4 - Summary of discussions about regression - 话题女王

s*******o
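Posts: 392

From thread: Statistics board - Explaining and interpreting logistic regression results

Setup for a logistic regression:
1. Independent variables: statistics on trading-related vocabulary from an
overseas finance forum. There are 160 words, such as "trader, indicator, long,
short, market, crazy", each measured by its daily relative frequency after
normalization, e.g. the number of occurrences of "trader" divided by the total
number of posts that day.
2. Prediction target: a binary variable indicating whether tomorrow's
volatility is in the top 15%, e.g. greater than the value 60%. So 1 means
greater and 0 means less.
I ran the regression with SAS logistic regression using the backward method,
since not all 160 words can have predictive power and I want to reduce the
number of variables.
The criterion for staying in the model is 0.05.
The results are as follows:
Question: SAS ended up selecting as many as 52 predictive words, as in the
figure above, so the df is fairly high, yet its conclusion is that this model
fits best. Is there a hidden risk of overfitting here, or some other obvious
drawback? Thanks, everyone.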
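
A minimal sketch of the data setup described in this post, assuming per-day word counts, per-day post totals, and a daily volatility series are already available. The name `build_dataset`, its arguments, and the default cutoff of 0.60 are illustrative assumptions and are not taken from the original SAS job.

```python
def build_dataset(daily_counts, daily_posts, volatility, threshold=0.60):
    """Pair each day's normalized word frequencies with the NEXT day's label.

    daily_counts: list of dicts, word -> raw count for that day
    daily_posts:  list of total post counts per day (same length)
    volatility:   list of daily volatility values (same length)
    threshold:    assumed cutoff for "top 15%" volatility (hypothetical value)
    """
    if not (len(daily_counts) == len(daily_posts) == len(volatility)):
        raise ValueError("all inputs must cover the same days")
    rows = []
    # The last day has no "tomorrow", so it cannot be labeled.
    for t in range(len(daily_counts) - 1):
        if daily_posts[t] <= 0:
            raise ValueError("daily_posts must be positive")
        # Relative frequency: occurrences divided by that day's post total.
        features = {w: c / daily_posts[t] for w, c in daily_counts[t].items()}
        label = 1 if volatility[t + 1] > threshold else 0
        rows.append((features, label))
    return rows
```

With 160 candidate words and a rare positive class, it may be worth checking any selected model on days that were held out of the selection step, since stepwise p-values are computed as if the final variable set had been fixed in advance.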

n*********e
Posts: 318

From thread: Statistics board - Is there a problem with this R logistic regression code?

I am doing an R logistic regression exercise -
My question is: should I first remove the dependent variable from the
validation set before running the prediction?
Thanks.
--------------------
library(MASS)
attach(birthwt) #The famous 'low birth weight' data for logistic regression
index <- 1:dim(birthwt)[1]
test<- sample(index, trunc(length(index)/3))
train<-birthwt[-test,]
validation <- birthwt[test,]
logit.1<-glm(low~., data=train, family=binomial(link='logit'))
logit.1
#------------------------------
# Should we first remove from the validation set the dep...

a*z
Posts: 294

From thread: Statistics board - Applied logistic regression by HL v2 Q1

Thank you first.
I am trying to learn logistic regression and study HL's book (v2). For the
first question of Chapter 1, I plot the STA's lived pct in each age group vs
med age of each group. The plot looks awful.
But when I fit the STA vs age, coeff of age and constant are both stat
significant.
My real question is: is there any way to explore data which will lead us to
use logistic regression, or most of the time I just blindly apply logistic
to binary response problems?
Thank you again.

c**r
Posts: 150

From thread: Statistics board - Seeking advice: good books for learning logistic regression and generalized linear regression

Hi all, I'd like to ask for good books on logistic regression models. Thanks~

b**********e
Posts: 61

From thread: Statistics board - lasso regression

When using lasso for logistic regression, how do I decide the sample size
required for a given error rate?
Many thanks!

OE
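Posts: 369

From thread: Statistics board - logistic regression on 3 billion records (repost)

[The following text is reposted from the Java board]
Author: OE (7777777), Board: Java
Title: logistic regression on 3 billion records
Posted from: BBS 未名空间站 (Mon Jan 14 13:24:26 2013, US Eastern)
Recently I got logistic regression on 3 billion records working in Java with
multithreading. The raw data is about 1 TB, and it took less than an hour on
an 8-core server. I'd like to show this off on my résumé; could any experts
offer suggestions?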

x**********0
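Posts: 163

From thread: Statistics board - bagging for logistic regression because of unbalanced data

A weekend question about logistic regression.
Y is a binary variable but unbalanced, with Y/N at about 1:8 and sample size = 45000.
I'd like to ask how everyone handles unbalanced data, especially in logistic regression.
I want to use bagging. I searched for some papers but did not find much useful information.
Bagging is mostly used on unstable models such as decision trees, right? I don't
know whether it would significantly improve accuracy for logistic regression.
Thanks, everyone.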

w*******9
Posts: 1433

From thread: Statistics board - A question about logistic regression

Sorry I wish I could help you and I will be very happy if someone could
teach both of us on this point.
My understanding is that R^2 and numerous adjusted/generalized R^2s are all
designed with the desire to assess the goodness of fit by a unitless number
between 0 and 1. They are generally not associated with the proportion of
variation. Only in OLS (with an unknown intercept) is the R^2 interpreted as
the proportion of variation explained by the covariates.
Since there are more popular goodne...

w**s
Posts: 26

From thread: Statistics board - SAS Regression Excluding OBS i

How can I run a regression by groups excluding obs i in the group to get the
parameter estimates, and then get the predicted value for i based on these
estimates? For example, if a group has 20 obs, then for the predicted value of
obs 1, use obs 2-20; for obs 2, use obs 1, 3-20; etc.
The basic model is:
proc reg data=test noprint;
model y=x1 x2 x3 /r;
by group;
output out=reg r=residual;
run;
The question is how to run the above regression excluding obs i in the group.
Many thanks!
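
A rough sketch of the leave-one-out scheme described above, written in plain Python rather than SAS. All function names here are hypothetical, and the least-squares solver uses the normal equations, which is fine for a small illustration but not for ill-conditioned data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(xs, ys):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def leave_one_out_predictions(xs, ys):
    """For each obs i, fit on all other obs and predict obs i."""
    preds = []
    for i in range(len(ys)):
        beta = ols_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(predict(beta, xs[i]))
    return preds


def loo_by_group(records):
    """records: iterable of (group, x_row, y); returns group -> LOO predictions."""
    groups = {}
    for g, x, y in records:
        xs, ys = groups.setdefault(g, ([], []))
        xs.append(list(x))
        ys.append(y)
    return {g: leave_one_out_predictions(xs, ys) for g, (xs, ys) in groups.items()}
```

Refitting is not strictly necessary for linear regression: the leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage of obs i. As far as I recall, the OUTPUT statement of PROC REG exposes this through a PRESS= keyword, which would be worth checking in the SAS documentation.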

k******w
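Posts: 269

From thread: Statistics board - A question about regression in SAS

I have time series data for 2000 companies
and need to make a prediction.
First I run 2000 independent regressions,
then take the average of the coefficients and its p-value. The question is how
to compute this standard error (clustered by date).
Someone mentioned this is a stacked regression. Does SAS have a command for it?
Thanks!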

y**i
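Posts: 1050

From thread: Statistics board - An urgent question about logistic regression

I plan to build a logistic regression model. If 95% of Y is 0 and 5% is 1, is
there a problem with using logistic regression? Are there any restrictions? If
so, what should I do?
Or does the distribution of Y not matter?
Eagerly awaiting replies.
Thanks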

l******e
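Posts: 895

From thread: Statistics board - Logistic regression with very few cases but many predictors

Thanks! I also think 7 is too few.
Another data set has 14 ones. Is that too few as well?
While I have an expert's attention: I am teaching myself machine learning. What
are the pros and cons of logistic regression, discriminant analysis, and random
forest for classification? These are all supervised learning, and if the
response is binary, it feels like logistic regression alone can handle it.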

g******i
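Posts: 118

From thread: Statistics board - Which R package is good for nonparametric regression?

mgcv is a very powerful package that can do many fancy things. You can dig into it gradually as needed.
Overall, whether you use gam in mgcv or local linear kernel regression or loess,
there should not be much difference between them. If the latter is much more
accurate, most likely the kernel bandwidth was set rather small, so the fit
leans toward undersmoothing (overfitting). The default knots and so on in gam
can also be tuned, and should be able to match the kernel result.
I don't know whether your prediction is in-sample prediction. If it is, this
comparison is meaningless, since you can always overfit to improve it. gam uses
cross validation to choose the degree of smoothing, so it should be fairly
trustworthy. I don't know whether the kernel regression used CV.
I personally prefer gam with those splines. The visual result is also somewhat better than local linear and the like.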

f***l
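Posts: 117

From thread: Statistics board - How to validate an ordinal logistic regression?

The SAS documentation says the usual way to validate a logistic regression model
is ROC (AUC), but that does not seem to apply to an ordinal logistic regression
model. How should this kind of model be validated?
Thanks!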

m******t
Posts: 273

From thread: Statistics board - data prediction by regression or better ways

I am working on data prediction.
Given data of a random variable X and Y, find out how to predict Y by X.
I know how to do it by linear regression, y = k x + b .
But, here, x is always non-negative and y is required to be non-negative.
Sometimes, b is not non-negative so that y < 0.
How to assure that b > 0 and also minimize the prediction error ?
Are there other better ways (not regression) to do the prediction ?
Any help would be appreciated.
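
One possible sketch for the constrained fit asked about here, assuming a single predictor and squared-error loss. The function name is made up for illustration. Because the objective is a convex quadratic with one inequality constraint (b >= 0), the constrained minimum is either the ordinary least-squares fit, when its intercept is already non-negative, or the best line through the origin.

```python
def fit_line_nonnegative_intercept(xs, ys):
    """Least-squares fit of y = k*x + b subject to b >= 0; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    if b >= 0:
        return k, b  # unconstrained optimum already satisfies the constraint
    # Constraint is active: fix b = 0 and refit the slope through the origin.
    k0 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return k0, 0.0
```

Note that b >= 0 alone does not guarantee non-negative predictions: if the fitted slope is negative, large x can still give y < 0. When the through-origin branch is taken and all x and y are non-negative, the slope is non-negative as well.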

c********h
Posts: 330

From thread: Statistics board - data prediction by regression or better ways

Sorry, I didn't notice it's nonnegative.
I don't think the range of x is a big issue. X is given and in many cases
they could be nonnegative, say an indicator.
For your y, it is an issue. If it is counts, use poisson regression. If it
is number of successes, use logistic regression.
If y is continuous and nonnegative, I don't have much experience. But I
think some mixture model can be helpful, modeling 0 and positives separately.

r****5
Posts: 618

From thread: Statistics board - help on one regression question

What proportion of points are within two standard deviations (2sd) and three
standard deviations (3sd) of the regression line?
We have data. How can we calculate sd of the regression and count the
number out of 2sd and 3sd? Can we use Minitab to do it? Thanks.
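
A small sketch of one way to do this count, assuming simple linear regression and taking "sd of the regression" to mean the residual standard error sqrt(SSE / (n - 2)). The function name is illustrative. Minitab's regression output reports this quantity as S, if I remember correctly.

```python
import math


def residual_sd_coverage(xs, ys):
    """Return (s, share within 2s, share within 3s) for a simple linear fit."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual sd")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Vertical distances from each point to the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - 2))  # residual standard error
    within2 = sum(1 for e in residuals if abs(e) <= 2 * s) / n
    within3 = sum(1 for e in residuals if abs(e) <= 3 * s) / n
    return s, within2, within3
```

If the errors are roughly normal, one would expect about 95% of points within 2s and about 99.7% within 3s, so large departures from those shares are a hint of outliers or non-normal errors.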

f*******n
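Posts: 2665

From thread: Statistics board - Talked with an Indian colleague today about segmented logistic regression

This person is the Sr Manager of our company's (a bank) modeling team in India.
He said the segmented logistic regression they build (sometimes with as many as
50 segments) performs much better than a single logistic regression, and
furthermore:
1. Their model output is just a probability, with no conversion to a score. (I
find it hard to imagine a bank expressing risk as a decimal between 0 and 1
rather than a score.)
2. They put the outputs of the 50 models together directly, with no
transformation. (I don't think that is valid, because the estimated probability
from each model corresponds to a different actual risk and needs adjustment.)
3. All of the above passed model validation.
His title and history at our company are certainly genuine, but what he said
really surprised me. Could people with relevant experience please comment?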

K***a
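Posts: 72

From thread: Statistics board - SAS E-Miner regression model question

:) 似人非兽, thank you for your reply and suggestions.
Let me try to explain again:
1. What problem needs to be solved:
Suppose a file has x, a quantity, along with group, gender, age, income, etc.
Can a regression model be used to compute y, i.e. a better x that is closer to
the mean of x within each group and each gender?
2. The method your colleague uses:
My colleague uses a SAS program that basically does what is described above,
but it is really very messy. It uses means and also ratios, all simple
arithmetic, to compute an i, and then multiplies x by i to get y.
3. How do you plan to solve it? What is the current difficulty?
I want to do this with a SAS EM model and don't know whether regression can be
used. I hope everyone can help.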

c*******e
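Posts: 150

From thread: Statistics board - The noise term in a regression is AR(1): how to do MLE or another fit?

I'd like to ask the experts on this board: if
the noise term in a linear regression is an AR(1) process, what mature
algorithms are usually available for MLE or other fitting methods?
Specifically, the model can be written as Y(t) = X(t) \cdot \beta + E(t),
where X(t) and \beta are both K-dimensional vectors and everything else is a scalar.
t = 1, 2, 3, ..., T is the sample at hand.
Unlike classical linear regression, E(t) is not i.i.d. Gaussian white noise. We
can assume E(t) follows the model below:
E(t) = \rho * E(t-1) + \sigma * Z(t)
\rho and \sigma are unknown parameters, and Z(t) can be taken as Gaussian white noise.
So the full set of parameters consists of the vector \beta and the scalars \sigma and \rho.
A maximum-likelihood method would be best, so that I can keep the option of
doing a log-likelihood ratio test later, for model comparison/s...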
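
For reference, a sketch of the exact Gaussian log-likelihood for this model, under the added assumptions that |\rho| < 1 and that E(1) is drawn from the stationary distribution N(0, \sigma^2 / (1 - \rho^2)). The residual notation e_t(\beta) is introduced here for convenience and is not from the original post.

```latex
\begin{align*}
e_t(\beta) &= Y(t) - X(t) \cdot \beta, \qquad t = 1, \dots, T \\
\ell(\beta, \rho, \sigma) &= -\frac{T}{2}\log(2\pi\sigma^2)
  + \frac{1}{2}\log(1 - \rho^2) \\
&\quad - \frac{1}{2\sigma^2}\Big[(1 - \rho^2)\, e_1(\beta)^2
  + \sum_{t=2}^{T}\big(e_t(\beta) - \rho\, e_{t-1}(\beta)\big)^2\Big]
\end{align*}
```

For a fixed \rho, maximizing over \beta reduces to ordinary least squares on quasi-differenced data, Y(t) - \rho Y(t-1) against X(t) - \rho X(t-1), with the first observation scaled by \sqrt{1 - \rho^2}. Dropping the first-observation terms gives the conditional likelihood that iterative schemes of the Cochrane-Orcutt type work with. Since the likelihood is available in closed form, a likelihood ratio test between nested versions of the model remains possible.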

m*****a
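Posts: 306

From thread: Statistics board - Looking for applied linear regression models, 4th edition, by Kutner, Nachtsheim, and Net

Thanks. Is this the same as applied linear regression models?
The instructor requires regression models, and the homework problems all come from that book.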

a*******s
Posts: 324

From thread: Statistics board - A question about logistic regression, thanks!

A question: are logistic regression and linear regression essentially the same?

a*****i
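Posts: 1045

From thread: Statistics board - Linear regression: a question about dummy variables and interaction terms (spss)

I am doing linear regression in spss. One variable is the raw data, and another
is a dummy variable derived from it, coded (0, 1): 0 below a certain value and
1 above it. Can I use this dummy variable directly in the linear regression, or
does it need to be recoded as (1, 2)?
For an interaction term where one factor is a continuous variable and the other
is a dummy variable (0, 1), spss requires multiplying the two variables to get
a new variable as the interaction term. Is it OK to multiply the dummy variable
by the other continuous variable? Since one value of the dummy variable is 0,
will that have any effect? Or do I need the raw data?
Many thanks.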
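
A worked equation may help here, assuming one continuous predictor x and one dummy d; the coefficient symbols a, b, c, e are illustrative. Recoding the dummy from (0, 1) to (1, 2) describes exactly the same fitted model, but it changes what the intercept and the slope on x mean.

```latex
\begin{align*}
y &= a + b\,x + c\,d + e\,(x \cdot d) + \varepsilon, \qquad d \in \{0, 1\} \\
d' &= d + 1 \in \{1, 2\} \\
y &= (a - c) + (b - e)\,x + c\,d' + e\,(x \cdot d') + \varepsilon
\end{align*}
```

With the (0, 1) coding, b is the slope of x in the d = 0 group and b + e is the slope in the d = 1 group, so the product being zero whenever d = 0 is intended rather than a problem. With the (1, 2) coding, the intercept and the coefficient on x refer to a hypothetical d' = 0 group that does not exist in the data, which is why the (0, 1) coding is usually easier to interpret.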

q*********g
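Posts: 3

From thread: Statistics board - Least squares regression when the response can only take positive values. Thanks

The assumptions of regression do not require a normal distribution, so you can
still start with linear regression.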

l***e
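Posts: 108

From thread: Statistics board - Recommendations for a somewhat advanced yet applied linear regression book

I have a math background and have also studied a fair amount of statistics, but
now that I am looking for industry jobs I find my understanding of statistics
is actually not very deep. I can follow all kinds of theory and mathematical
derivations without difficulty, but I lack a deep conceptual understanding (I
studied a lot of measure-theoretic math stats, which now turns out to be of no
use at all). Most applied stats books I have read seem terrible (a course I
took used Weisberg's applied linear regression, a really awful book that just
piles up formulas and results with no concepts or motivation at all).
I'd like to ask which books would satisfy the following:
1. Covers problems regression may run into in practice, e.g. what to do when
many survey items are missing, or what goes wrong and how to fix it when many
features are highly correlated, etc.
2. Relatively advanced. The readers are researchers or professionals who have
studied statistical theory. Many applied/practical books are too elementary and
written for undergraduates.
Thanks!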

s********1
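Posts: 235

From thread: Statistics board - Is there a model that combines a linear regression model and a time series model for prediction?

Is there a model that combines a linear regression model and a time series
model for prediction? I have data on a set of products. Each product has some
attribute data of its own, such as online ratings, and also time series sales
data, with a month-by-month sales value for each product. I now need to do
predictive modeling with these data. The approaches I have thought of are
models such as linear regression or svm for the attribute data, and time series
models for the sales data. Is there a model that combines the two, taking both
aspects into account in a single model for prediction? Many thanks!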

t********m
Posts: 939

From thread: Statistics board - A very confusing regression question: x is calculated from y

I have a data structure like this:
unit netrevenue netprice
600 $109,575 $183
196 $37,390 $191
300 $60,000 $200
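We believe there is a linear relationship between unit and netprice and want to
regress unit on netprice. However, netprice in each row is itself computed from
unit: for each row, netprice = netrevenue/unit. Can this regression still be
done? I think what we are really modeling is the relationship between
netrevenue and unit. Doing it this way feels somehow wrong, but I cannot say
exactly what is wrong. I'd appreciate any guidance, thanks!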

b*****e
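Posts: 88

From thread: Statistics board - If the regression relation depends on the dependent variable, which regression model should be used?

Suppose there is a hypothesis: a student's math grade and Chinese grade are
negatively correlated if the math grade >= A-, and if the math grade [...] One
idea is to separate the sample based on the math grade (>= A- and [...]
subsample and use truncated regression. Is this right? Or is there a better method?
Thanks.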

m****o
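Posts: 467

From thread: Statistics board - Asking statistics experts for advice: semi-parametric regression of binary outcome

I have always used SAS LOGISTIC regression to analyze binary outcome data. The
problem is that LOGISTIC, GENMOD, GLIMMIX, NLIN, NLMIXED, and HPNLMOD all need
a LINK FUNCTION, such as LOGIT, PROBIT, LOGLOG, or CLOGLOG. It seems that
SEMI-PARAMETRIC analysis does not require knowing the DISTRIBUTION. Does that
mean no LINK FUNCTION is needed? What software should be used for this kind of
analysis? Would the R NP PACKAGE work? Within SEMI-PARAMETRIC REGRESSION there
is a SINGLE INDEX MODEL; can it be used? Or is there a better method?
Thank you very, very much!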

g*****o
Posts: 812

From thread: DataSciences board - Does regression also count as ML?

Using NN to get regression results really is a pain.

o****p
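Posts: 162

From thread: DataSciences board - Does regression also count as ML?

Thanks for the references. I did not know these two papers; I was just talking
about NN from the viewpoint of general statistics and machine learning
textbooks. I was simply commenting on your remark that "statisticians don't
regard NN as regression". A former manager of mine was a statistics PhD and
knew NN regression very well. But you are right: for machine learning, knowing
more SP is of course best for deeper theoretical understanding, though from a
practical engineering and application standpoint, knowing a little is enough.
As for DL, my view is that DL is not that mysterious. It just adds some
specific ideas to NN techniques that have existed for decades. For example,
convolutional NN is a key technique, first applied to specific machine vision,
image, and video learning and analysis problems. The key idea is to learn and
extract features layer by layer, eventually reaching a more abstract feature
encoding. This idea is not unique to DL; it is a basic design principle in the
development of virtually any real large-scale machine learning system. So I say
DL is not that mysterious: its core technique, NN, is essentially just a
regression algorithm, an estimator of a conditional expectation.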

d******e
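Posts: 7844

From thread: DataSciences board - Does regression also count as ML?

You have probably misunderstood what regression means.
Regression is the analysis of the relationship between the response and the predictors.
NN is just a class of nonparametric functions, so using it for regression
analysis is perfectly normal.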

s****r
Posts: 5546

From thread: USANews board - Regressive Lefts can keep the hope

This is a living proof that white trash can remain a Regressive Left even as
they get as old as 66 years old.

t****n
Posts: 313

From thread: JobHunting board - Why does CS force regression and statistics into "machine learning"?

Because it never was. Anyone still talking about regression or a generative
story these days is an old pedant.

c*********l
Posts: 926

From thread: JobHunting board - Why does CS force regression and statistics into "machine learning"?

The course by machine learning authority Andrew Ng starts with regression. In
the end it all turns into numerical optimization.

x**********z
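Posts: 131

From thread: JobHunting board - Why does CS force regression and statistics into "machine learning"?

The emphasis is different. When I studied regression in statistics, we mainly
learned least squares, and then took a whole course on generalized inverses for
the case of non-invertible matrices, with a textbook several hundred pages
long. How many machine learning people study generalized inverses?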

t****n
Posts: 313

From thread: JobHunting board - Why does CS force regression and statistics into "machine learning"?

Of course everyone starts teaching from regression.

J*****a
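Posts: 4262

From thread: JobHunting board - Why does CS force regression and statistics into "machine learning"?

Regression is only one class of problems among them. The more commonly used
pattern recognition, classification, and clustering problems are simply not
problems that traditional statistics studies.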

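Posts: 1

From thread: JobHunting board - What hyperparameters does logistic regression have? (repost)

[The following text is reposted from the Programming board]
Author: MoonChild (), Board: Programming
Title: What hyperparameters does logistic regression have?
Posted from: BBS 未名空间站 (Mon Feb 6 16:39:31 2017, US Eastern)
I can't find this anywhere online.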

c****u
Posts: 243

From thread: JobHunting board - What hyperparameters does logistic regression have? (repost)

A logistic regression question ended up on the programming board?

I*********y
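Posts: 185

From thread: JobMarket board - Jr. Regression Test Engineer opening in BayArea CA

A well-known Silicon Valley networking company is hiring a Junior-level
Regression Testing Engineer. Requires a bachelor's or master's degree in EE/CS
and 0-3 years of work experience. If anyone is interested, you can send me a
résumé.
Must be able to do perl scripting and use basic Unix/Linux.
Must have a fairly good understanding of network protocols: tcp/ip, ospf, isis,
bgp, mpls. To pass the interview, your understanding should be close to or at
the CCNP or JNCIP level. You do not need to know every protocol, but you should
know at least two or three of them in enough depth. If you have never heard of
network protocols and hope to cram now, it probably won't work, unless your
memory and comprehension are extraordinary.
Smart and willing to learn, fluent English communication, be motivated to work.
The company sponsors H1 and green card; compensation is fairly good.
The manager's hiring style is fairly conservative, so if your background is far
off, don't bother trying; it would go nowhere even if I submitted it. Those
with a close background can PM my inbox. Also, once a résumé is submitted,
there may be a phone screen the same day or the next, so there won't be time to
prepare at leisure.
To reiterate, this is a Junior level opening. If you are especially
experienced, your expectations for the job are very ...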

s*******r
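Posts: 130

From thread: Immigration board - Notice of File Transfer for Visa Regressed Cases

After the I485 interview for my EB1A on September 7, the officer did not say
whether it was approved or not, only that he had 120 days to consider. Today I
received a letter titled Notice of File Transfer for Visa Regressed Cases.
Could those who have been through this explain what is going on? Much appreciated!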

M********7
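Posts: 1841

From thread: SanFrancisco board - Can anyone tutor linear regression analysis?

I am taking a mathematical statistics course and finding it really hard. I'd
like to find a TUTOR for 2-3 hours each weekend. Price to be discussed by
private message.
The textbook is this one:
http://www.amazon.com/Introduction-Linear-Regression-Analysis-b
Thanks.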

D*****r
Posts: 183

From thread: CS board - Does anyone have Mario Martin's SVM Incremental Regression package?

Needed urgently. Another SVM incremental regression
package would also do. Thanks

k**m
Posts: 222

From thread: CS board - What is a regression test suite?

I often come across "regression test suite" in compiler-related articles. How
does it differ from an ordinary test suite?
Thanks

w*******g
Posts: 9932

From thread: CS board - What is a regression test suite?

Regression tests are tests that the code needs to pass again after each new
version or modification.
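
A toy sketch of the idea, with everything in it hypothetical: `slugify` stands in for the code under test, and each recorded case pins down behaviour that worked at some point (often a bug that was fixed), so that rerunning the suite after every change shows whether anything has regressed.

```python
def slugify(title):
    """Hypothetical function under test: lower-case and hyphen-join words."""
    return "-".join(title.lower().split())


# Each case records an input and the output that a past version got right.
REGRESSION_CASES = [
    ("Hello World", "hello-world"),
    ("  extra   spaces ", "extra-spaces"),  # imagined earlier whitespace bug
    ("single", "single"),
]


def run_regression_suite():
    """Rerun every recorded case and return the list of failures."""
    failures = []
    for given, expected in REGRESSION_CASES:
        actual = slugify(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures
```

For a compiler, the recorded cases would typically be source programs paired with their expected output or diagnostics, but the principle is the same: the suite grows over time and is rerun in full after each modification.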

K****n
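Posts: 5970

From thread: CS board - A question about probit regression (repost)

[The following text is reposted from the Computation board]
Author: KeeVan (Kevin), Board: Computation
Title: A question about probit regression
Posted from: BBS 未名空间站 (Fri Aug 22 21:53:32 2008)
Is there a textbook that works out the derivative of the maximum likelihood?
I'd like to check mine against it, and surprisingly I can't find it on
Google... I don't quite trust the glm functions in matlab; they oscillate quite
a lot during training.
Also, if I put a Gaussian prior on the parameters of the probit function and then compute the Bayesian
P(data)=Integrate(P(data|parameter)*P(parameter),over parameter)
it seems that using the probit function as P(data|parameter) and a Gaussian as
P(parameter) makes optimizing the Bayesian likelihood easier to compute? Has
anyone already worked this out? Again I can't find it on Google...
Thanks!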
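
For what it is worth, a sketch of the quantities asked about, assuming binary responses y_i in {0, 1}, covariate vectors x_i, and \Phi and \phi denoting the standard normal CDF and density. The notation, including the prior mean m and covariance S, is introduced here for illustration.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \Big[ y_i \log \Phi(x_i^\top \beta)
  + (1 - y_i) \log\big(1 - \Phi(x_i^\top \beta)\big) \Big] \\
\frac{\partial \ell}{\partial \beta} &= \sum_{i=1}^{n}
  \frac{\big(y_i - \Phi(x_i^\top \beta)\big)\, \phi(x_i^\top \beta)}
       {\Phi(x_i^\top \beta)\,\big(1 - \Phi(x_i^\top \beta)\big)}\; x_i \\
\int \Phi(x^\top \beta)\, \mathcal{N}(\beta;\, m, S)\, d\beta
  &= \Phi\!\left(\frac{x^\top m}{\sqrt{1 + x^\top S x}}\right)
\end{align*}
```

The last line is the reason a Gaussian prior pairs conveniently with the probit link: for a single observation, the integral over the parameter has a closed form. For a whole data set the integrand is a product of such terms, and to my knowledge the marginal likelihood then generally needs an approximation rather than a closed form.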
