Page 3 - Digest of discussions on regression - 话题女王 (Topic Queen)

y*********4
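Posts: 76

From thread: Statistics board - Does LOGISTIC REGRESSION require normally distributed DATA?

Are there any requirements on the data for LOGISTIC REGRESSION? Does it have to be normally distributed? If not, what should I do?
And does ordinary REGRESSION have the same requirement?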

y*********4
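Posts: 76

From thread: Statistics board - Does LOGISTIC REGRESSION require normally distributed DATA?

OK. Is LACK OF FIT generally only useful for ordinary REGRESSION? For LOGISTIC REGRESSION, do we look at the area under the ROC CURVE instead?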
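On the ROC question: the area under the ROC curve (AUC) is a common way to summarise how well a logistic model separates the two outcomes, while lack-of-fit checks for logistic models (for example Hosmer-Lemeshow) look at calibration instead. A minimal Python sketch, assuming you already have predicted probabilities; `roc_auc` and the toy numbers are hypothetical. It uses the pairwise (Mann-Whitney) definition, which is O(n_pos × n_neg) and meant only for illustration.

```python
def roc_auc(y_true, scores):
    """AUC = probability that a randomly chosen positive case gets a higher
    score than a randomly chosen negative case (ties count as 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities from a fitted logistic model
y = [0, 0, 1, 1, 0, 1]
p_hat = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(roc_auc(y, p_hat))  # 8 of the 9 positive/negative pairs are ordered correctly
```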

h******e
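Posts: 1791

From thread: Statistics board - A question about logistic regression.

I used the stepwise method to select variables and ended up with two, A and B, both continuous; their interaction was excluded, as follows:
Parameter  DF  Estimate      SE     Wald  Pr>ChiSq
A           1    0.1297  0.0435   8.9076    0.0028
B           1   -0.3586  0.1732   4.2843    0.0385
So the two effects go in opposite directions.
But if I fit a logistic model with A alone or with B alone, I get:
an estimate of 0.0565 for A and 0.1665 for B, so both effects go in the same direction. If I run a linear regression of one on the other, they are positively correlated.
How should I interpret the logistic regression result? Thanks.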
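The pattern described here (both predictors positive on their own, one turning negative once both are in the model) is the classic behaviour of strongly correlated predictors. A toy sketch in Python using ordinary least squares on made-up data, not the poster's data; logistic coefficients behave analogously on the logit scale, but the numbers below are only an illustration. The data are constructed so that y = 2*a - b exactly while a and b are correlated at about 0.9.

```python
def slope(x, y):
    """Simple-regression slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def two_predictor_ols(x1, x2, y):
    """OLS slopes (b1, b2) for y = b0 + b1*x1 + b2*x2 via centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(u * u for u in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * v for u, v in zip(c1, cy))
    s2y = sum(u * v for u, v in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


a = [1, 2, 3, 4, 5, 6]
b = [2, 1, 3, 4, 4, 7]   # strongly correlated with a
y = [0, 3, 3, 4, 6, 5]   # constructed as y = 2*a - b
print(slope(a, y), slope(b, y))    # both marginal slopes are positive
print(two_predictor_ols(a, b, y))  # joint fit: (2, -1), b's sign flips
```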

c****o
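Posts: 69

From thread: Statistics board - Does regression require a normality test?

When doing regression, does the dependent variable need to be normally distributed? Do the independent variables??
I've heard that people often apply a TRANSFORMATION so that the dependent and independent variables become normally distributed, but why do that?
If no transformation makes the data normal, can I still do regression?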

x**g
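Posts: 807

From thread: Statistics board - Does regression require a normality test?

For ordinary multiple regression, does the dependent variable need to be normal? Yes (strictly, it is the errors, i.e. Y conditional on X, that are assumed normal). Do the independent variables? No.
If no transformation makes it normal, it depends on the specific case. There are specialized regression methods that can handle data with different distributional features.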
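A quick way to see why the normality assumption concerns the errors rather than the raw Y: in the made-up data below, a binary predictor shifts y by 10 units, so the marginal distribution of y is two separate clumps (clearly non-normal), yet the residuals from the fitted line are small and well behaved. A pure-Python sketch; the numbers are invented for illustration.

```python
import statistics

# Toy data: binary x shifts the mean of y by 10; noise around each group mean is small
x = [0] * 5 + [1] * 5
noise = [-0.2, 0.1, 0.0, 0.3, -0.2, 0.1, -0.1, 0.2, -0.3, 0.1]
y = [10 * xi + e for xi, e in zip(x, noise)]

# OLS fit of y on x (closed form for one predictor)
mx, my = statistics.mean(x), statistics.mean(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(statistics.stdev(y))          # large: marginal y is bimodal
print(statistics.stdev(residuals))  # small: the errors are what the assumption is about
```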

c****o
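Posts: 69

From thread: Statistics board - About the RESIDUAL PLOT in a REGRESSION MODEL

If I apply a LOG TRANSFORMATION to the dependent variable before fitting the regression and get a REGRESSION MODEL, then when I check the MODEL by plotting the RESIDUALs, do I first need to transform the logged variable back to the original scale?
Another question: for a NON-LINEAR REGRESSION MODEL, do the RESIDUALs still need to have EQUAL VARIANCE?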

S******y
Posts: 1123

From thread: Statistics board - Logistic regression: binary response: rare event

I am using logistic regression to model a rare event, i.e.,
y=0 98.5%
y=1 1.5%
N= 11 million
I am thinking of over-sampling the "y=1" observations to increase their
percentage from 1.5% to 10%. Then I will perform logistic regression.
Is this method valid? Will my estimates be biased?
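A relevant known result: if the y=1 cases are sampled at a different rate from the y=0 cases, but selection within each outcome class does not depend on X, the slope estimates of a logistic model stay consistent and only the intercept is shifted, by the log of the ratio of the sample odds to the population odds. Correcting the intercept this way is often called prior correction. A minimal sketch; the fitted intercept value below is hypothetical.

```python
import math


def prior_corrected_intercept(b0_sample, sample_rate, population_rate):
    """Map a logistic intercept fitted on a case-oversampled data set back to
    the population event rate. Slope coefficients are left unchanged."""
    sample_logodds = math.log(sample_rate / (1 - sample_rate))
    population_logodds = math.log(population_rate / (1 - population_rate))
    return b0_sample - (sample_logodds - population_logodds)


# Hypothetical intercept estimated after oversampling y=1 from 1.5% to 10%
b0_sample = -2.0
print(prior_corrected_intercept(b0_sample, 0.10, 0.015))  # roughly -3.99
```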
Thanks.

i******a
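Posts: 27

From thread: Statistics board - Please recommend books on linear regression; are there electronic versions?

I'm reviewing statistics I learned before, and it seems the textbook we used back then was this one:
Applied Regression Including Computing and Graphics
by R. D. Cook and Sanford Weisberg
It's been a long time, and looking back now it seems long-winded and unclear, jumping from topic to topic; it gives me a headache.
Could people on this board recommend some better books on linear regression? Even better if there is somewhere to download electronic versions.
Many thanks!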

s*r
Posts: 2757

From thread: Statistics board - Please recommend books on linear regression; are there electronic versions?

I've seen some people recommend this one:
Harrell, F. E., Jr. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. New York: Springer-Verlag.

l**y
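Posts: 130

From thread: Statistics board - Can I do cross-sectional regression in R?

As the title says: I read a PAPER that used cross-sectional regression to fit the data. Which R package can do this? I couldn't find anything online. How does it differ from simple linear regression?
Many thanks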

m**i
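Posts: 37

From thread: Statistics board - Theory of multi-factor (?) regression?

Ordinary linear regression fits a line to many points.
In my case I have many pairs of points; the two points in each pair are joined into a line segment, giving many segments. I want to fit a line that is the regression of these line segments.
What is this called? Which books cover this kind of theory?
Thanks!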

c****s
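Posts: 63

From thread: Statistics board - Logistic regression, a validation question

I have finished fitting a logistic regression model on the model-building data set. How do I use the validation data to show that my results are good? After all, the predictions from logistic regression are all 1 or 0.
I'd really appreciate any guidance from the experts.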
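Note that a fitted logistic model produces predicted probabilities on the validation set; 0/1 predictions only appear after a cutoff is chosen. A minimal Python sketch of a few standard validation summaries; the choice of metrics (Brier score, accuracy, sensitivity and specificity) and the 0.5 cutoff are assumptions, and the function name is hypothetical.

```python
def validate(y_true, p_hat, threshold=0.5):
    """Simple validation summaries for predicted probabilities."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / n  # mean squared probability error
    y_pred = [1 if p >= threshold else 0 for p in p_hat]           # classify with the cutoff
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    tn = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 0)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    return {
        "brier": brier,
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical validation outcomes and predicted probabilities
print(validate([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```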

p*****o
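Posts: 543

From thread: Statistics board - Another question about SAS LOGISTIC REGRESSION.

After running LOGISTIC REGRESSION with STEPWISE, I saved the coefficients in a DATA SET called PAREST, but some VARIABLEs in it are MISSING VALUEs, because STEPWISE did not select them into the final MODEL.
My question is: when I use the PAREST DATA SET to PREDICT values for another DATA SET, VALIDATION, for example:
SELECT
1/(1+exp(-(intercept + V.VAR1*E.VAR1 + V.VAR2*E.VAR2))) AS P1
FROM
VALIDATION AS V, PAREST AS E
E.VAR2 is a MISSING VALUE. Is there a way to turn the MISSING VALUE into 0 at this step? ---- In fact I have many VARs, not just two, and I don't know in advance which ones will end up in the MODEL.
(Because I run two REGRESSIONs, STEPWISE and the ENTER METHOD, at the same time inside one big MACRO, and use both to PREDICT the VALIDATION DATA SET, I want to write it with one uniform formula ---- like the formula above, if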
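In SAS PROC SQL, the COALESCE function returns its first non-missing argument, so writing COALESCE(E.VAR2, 0) in place of E.VAR2 is one way to treat variables dropped by stepwise selection as having a zero coefficient. The same scoring logic is sketched below in Python (to keep all examples in one language), assuming the coefficient table is held as a dict in which unselected variables map to None; all names are hypothetical.

```python
import math


def score(row, coefs):
    """Predicted probability from a logistic model whose coefficient table may
    hold None for variables that stepwise selection dropped.
    Missing coefficients are treated as 0, like COALESCE(coef, 0) in SQL."""
    eta = coefs.get("Intercept") or 0.0
    for name, value in row.items():
        b = coefs.get(name)
        eta += (b if b is not None else 0.0) * value
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficient table: VAR2 was not selected by stepwise
parest = {"Intercept": -1.0, "VAR1": 0.5, "VAR2": None}
validation_row = {"VAR1": 2.0, "VAR2": 3.0}
print(score(validation_row, parest))  # eta = -1 + 0.5*2 + 0*3 = 0, so 0.5
```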

o****o
Posts: 8077

From thread: Statistics board - Order of Independent Variables in Linear Multiple Regression

regress your X2 on the residuals from Y=b0+b1*X1
use the projected X2 from that auxiliary regression in your original model
besides, I don't think the ordering matters in OLS, check the linear algebra

From the original question (partially quoted): "...tolerable ... impression is probably yes, because OLS coeffs and confidence intervals are computed sequentially, but I am not sure about it at all."
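The linear-algebra point can be checked numerically: OLS coefficients do not depend on the order of the columns, and by the Frisch-Waugh-Lovell theorem the coefficient on X2 equals the slope from regressing the residuals of Y on X1 against the residuals of X2 on X1. What does depend on order is the sequential (Type I) sums of squares. A pure-Python sketch with made-up data; `solve_ols` and `residuals` are hypothetical helpers, not library functions.

```python
def solve_ols(X, y):
    """OLS via the normal equations with Gaussian elimination (small p only)."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for col in range(p):  # forward elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


def residuals(X, y):
    beta = solve_ols(X, y)
    return [yi - sum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(X, y)]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

# Full model with the columns in two different orders
full_a = solve_ols([[1.0, u, v] for u, v in zip(x1, x2)], y)  # [b0, b_x1, b_x2]
full_b = solve_ols([[1.0, v, u] for u, v in zip(x1, x2)], y)  # [b0, b_x2, b_x1]

# Frisch-Waugh-Lovell: residualise y and x2 on (1, x1), then regress one on the other
Z = [[1.0, u] for u in x1]
ry = residuals(Z, y)
rx2 = residuals(Z, x2)
fwl = sum(u * v for u, v in zip(rx2, ry)) / sum(u * u for u in rx2)
print(full_a[2], full_b[1], fwl)  # the three x2 coefficients agree up to rounding
```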

l******o
Posts: 162

From thread: Statistics board - regression problem - got confused

I ran a regression to test the relation between Z and A (after controlling for some other variables); Z is the dependent variable and A is the independent variable.
A's coefficient is positive and significant.
Then I ran another regression to test the relation between Z and B (after controlling for some other variables); again Z is the dependent variable and B is the independent variable.
B's coefficient is negative and significant.
==========================================
Based on the above results, can we safely conclude anything about the relationship between A
and B??? If we can, the re

c*********d
Posts: 218

From thread: Statistics board - A question about negative binomial regression

I've decided to model my dependent variable with negative binomial regression. Right now I have only one continuous independent variable. Can I still use negative binomial regression?

s***r
Posts: 1121

From thread: Statistics board - SAS Regression Macro question (baozi offered)

Thanks. I sent you 4 baozi (20 dollars). One more question:
I also need to run regressions like these:
b1 = e1
b1 = r1
b1 = f1
b1 = e2
b1 = r2
b1 = f2
b2 = e1
b2 = r1
b2 = f1
...
...
That is, I also need to run univariate regressions. Can you help me with the macro? Many thanks.
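The requested macro is essentially a double loop over the dependent-variable list and the predictor list, fitting one simple regression per pair (in SAS this would be %DO loops around a PROC REG call). The control flow is sketched in Python below; the column names mirror the post, but the data values are placeholders.

```python
from itertools import product


def simple_ols(x, y):
    """Return (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def all_univariate(data, dependents, predictors):
    """Fit one simple regression for every (dependent, predictor) pair."""
    return {(d, p): simple_ols(data[p], data[d]) for d, p in product(dependents, predictors)}


# Placeholder columns; in the post the lists would be b1, b2 and e1, r1, f1, e2, r2, f2
data = {
    "b1": [1.0, 2.0, 3.0, 4.0],
    "b2": [2.0, 4.0, 6.0, 8.0],
    "e1": [1.0, 2.0, 3.0, 4.0],
    "r1": [4.0, 3.0, 2.0, 1.0],
}
results = all_univariate(data, ["b1", "b2"], ["e1", "r1"])
for (dep, pred), (intercept, slope) in results.items():
    print(f"{dep} = {intercept:.2f} + {slope:.2f}*{pred}")
```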

s********9
Posts: 74

From thread: Statistics board - Does multivariable logistic regression allow correlated independent variables?

Outcome: A
Independent variables: B C D E F G H
Univariable logistic regression: B, C, D, E, F, G and H each have a significant influence on A.
Multivariable logistic regression: only B has a significant influence on A.
Is B then the only factor that should be considered as influencing A?

s**********y
Posts: 38

From thread: Statistics board - logistic regression questions

Questions:
1. How do I check for collinearity among categorical variables, such as race (white, black, latino), sex (male, female), education (college graduate or not), age (<20, 20-40, >40), income (below or above a level), etc.?
2. When I run logistic regression and test the Deviance and Pearson Goodness-of-Fit Statistics, the p-value of the Deviance is 0.0037 and the p-value of the Pearson statistic is 0.0061. Do these mean that the logistic regression is not fitting the data well? What should I do next?
Thanks.
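For question 1, one common screen for collinearity between two categorical predictors is a measure of association computed from their contingency table, such as Cramér's V (0 means no association, 1 means one variable determines the other); variance inflation factors from the dummy-coded design are another option. A pure-Python sketch with made-up data; it assumes each variable has at least two levels.

```python
import math
from collections import Counter


def cramers_v(a, b):
    """Cramer's V between two categorical variables (each needs >= 2 levels)."""
    n = len(a)
    rows, cols = sorted(set(a)), sorted(set(b))
    counts = Counter(zip(a, b))
    row_tot, col_tot = Counter(a), Counter(b)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (counts[(r, c)] - expected) ** 2 / expected  # Pearson chi-square
    k = min(len(rows), len(cols))
    return math.sqrt(chi2 / (n * (k - 1)))


sex = ["male", "male", "female", "female"]
college = ["yes", "yes", "no", "no"]      # perfectly aligned with sex in this toy data
income_high = ["yes", "no", "yes", "no"]  # unrelated to sex in this toy data
print(cramers_v(sex, college), cramers_v(sex, income_high))
```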
Thanks.

D*D
Posts: 236

From thread: Statistics board - A newbie question about logistic regression

The actual proportion is between 0 and 1, but linear regression can give
predictions outside [0, 1].
This is how it was done before logistic regression came into use.
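A small illustration of the point about predictions outside [0, 1]: an ordinary least-squares line fitted to a 0/1 outcome (the linear probability model) is unbounded, while the inverse-logit function used by logistic regression always maps the linear predictor into (0, 1). The data are made up.

```python
import math

# Hypothetical binary outcome observed at x = 0..9
x = list(range(10))
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx


def linear_prob(v):
    """Linear probability model prediction: nothing keeps it inside [0, 1]."""
    return b0 + b1 * v


def logistic(eta):
    """Inverse logit: maps any linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


print(linear_prob(15), linear_prob(-5))  # above 1 and below 0
print(logistic(5.0), logistic(-5.0))     # both strictly between 0 and 1
```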

f**********t
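Posts: 1001

From thread: Statistics board - Does ordinary linear regression assume the data are normally distributed?

Oops, sorry...
I didn't mean for everyone to answer that, haha; my thread title said ordinary linear regression.
Still, the earlier discussion about simple linear regression was great.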

M****e
Posts: 178

From thread: Statistics board - quantile regression

Could anyone recommend a good textbook on quantile regression?
I need to use quantile regression to fit heterogeneous and dependent data, and I
have trouble with inference. A couple of papers out there deal with a similar
topic, but there are big leaps in their explanations.
I'd like to read a book that explains QR step by step and covers
all the basics as well as relatively advanced applications.
Thanks a lot.

g********s
Posts: 69

From thread: Statistics board - standardization of coefficients in logistic regression

Sorry, I cannot type Chinese.
In logistic regression, if we want to compare coefficients across models, we
need to standardize them by dividing the coefficients by
sqrt[(variance of predicted Y scores) + pi^2/3]
But how do we standardize if the logistic regression model is a mixed-effects model? Any thoughts? Thanks much!
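The formula in the post can be written as a small helper. A sketch assuming you already have the fitted linear predictor (the "predicted Y scores" on the logit scale) for each observation; pi^2/3 appears because it is the variance of the standard logistic distribution. Using the sample (n-1) variance via `statistics.variance` is a choice, and multiplying the result by sd(x) would give the fully standardized version. The mixed-model case asked about is not covered here.

```python
import math
import statistics


def y_star_sd(linear_predictor):
    """Latent-scale SD: sqrt(var(predicted logits) + pi^2/3)."""
    return math.sqrt(statistics.variance(linear_predictor) + math.pi ** 2 / 3)


def standardize(coefs, linear_predictor):
    """Divide each coefficient by the latent-scale SD, as in the post's formula."""
    s = y_star_sd(linear_predictor)
    return {name: b / s for name, b in coefs.items()}


# Hypothetical coefficient and fitted logits
print(standardize({"x": 1.2}, [-1.0, 0.0, 1.0]))
```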

p******r
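Posts: 1279

From thread: Statistics board - In regression, if an independent variable is categorical, what's the difference between using dummy variables and not?

When doing regression, suppose the independent variables include a categorical variable. For example, in
salary = experience + edu + error, edu is categorical with values 1, 2, 3, where 1 means high school,
2 means college, and 3 means graduate school.
If I treat it as the numbers 1, 2, 3 and run the regression directly, I get one beta.
If instead I turn it into several dummy variables and do a one-way ANOVA, I get several fixed-effect coefficients.
What is the essential difference between these two approaches? Apart from the mechanics, they don't seem very different, e.g. for prediction or for measuring the effect of edu on salary.
Also, when coding in SAS, if edu is defined as categorical from the start, does that mean I don't need to create dummy variables beforehand when using proc glm?
Please advise!!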
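The essential difference: entering edu as the number 1/2/3 forces a straight-line effect with equal steps (high school to college is worth exactly as much as college to graduate school), using 1 degree of freedom, while dummy coding estimates a separate mean for each level (2 degrees of freedom). When the true pattern is not linear the two give different predictions, as the made-up salaries below show. In SAS, listing edu in the CLASS statement of PROC GLM makes the procedure build the indicator columns itself.

```python
# Hypothetical salaries (in $1000) by education level: 1=high school, 2=college, 3=graduate
edu = [1, 1, 2, 2, 3, 3]
salary = [30, 32, 34, 36, 60, 62]

# (a) edu as a number: one slope, so equal spacing between levels is assumed
n = len(edu)
mx, my = sum(edu) / n, sum(salary) / n
slope = sum((x - mx) * (y - my) for x, y in zip(edu, salary)) / sum((x - mx) ** 2 for x in edu)
intercept = my - slope * mx
numeric_fit = {level: intercept + slope * level for level in (1, 2, 3)}

# (b) edu as dummies (one-way ANOVA): each level gets its own mean
dummy_fit = {}
for level in (1, 2, 3):
    group = [y for x, y in zip(edu, salary) if x == level]
    dummy_fit[level] = sum(group) / len(group)

print(numeric_fit)  # forced onto a straight line
print(dummy_fit)    # {1: 31.0, 2: 35.0, 3: 61.0}
```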

p******r
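Posts: 1279

From thread: Statistics board - If the dependent variable is severely skewed, how do I do ordinal regression?

A question for everyone: my dependent variable is a categorical variable with 10 values, 0, 1, 2, 3, ..., 9, and I want to fit an ordinal regression to it.
The problem is that this dependent variable is severely skewed: most of the 1200-odd observations are concentrated at 0, 1 and 2, and there is only a single observation at 9. Can I still do ordinal regression in this situation? If not, what should I do?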

e****t
Posts: 766

From thread: Statistics board - If the dependent variable is severely skewed, how do I do ordinal regression?

Hmm, I would use Poisson regression or negative binomial regression with
zero inflation.
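To make the suggestion concrete: a zero-inflated count model mixes a point mass at zero with an ordinary count distribution. Below is a sketch of the zero-inflated Poisson probability function, showing what the model assumes; `lam` and `pi0` are made-up values, a real fit would model both as functions of covariates, and the negative binomial version replaces the Poisson part.

```python
import math


def zip_pmf(k, lam, pi0):
    """P(Y = k) under a zero-inflated Poisson: with probability pi0 the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi0 if k == 0 else 0.0) + (1 - pi0) * poisson


lam, pi0 = 1.2, 0.3  # hypothetical parameter values
probs = [zip_pmf(k, lam, pi0) for k in range(10)]
print(probs[0], sum(probs))  # extra mass at zero; probabilities over 0..9 sum to nearly 1
```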

p******r
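Posts: 1279

From thread: Statistics board - If the dependent variable is severely skewed, how do I do ordinal regression?

I don't quite follow. Isn't ordinal regression based on MLE?
Or are you suggesting I simply treat the response variable as continuous and use OLS regression?
If people are unwilling to admit that they are highly depressed, what model would be better? Thanks!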

a****y
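Posts: 1035

From thread: Statistics board - A stats newbie's question about linear regression

I have a regression question for the experts.
I have a data set where y is a continuous response and x is a categorical variable (taking three values: a, b, c).
In a random sample of 1000 observations, 20 are a, 100 are b, and 880 are c. I want to run a linear
regression to see whether x has an effect on y.
My question: since the counts of a, b and c differ a lot, do I need to take this difference into account? If so,
how do I account for the effect of the sample sizes?
I'm not sure I've explained the question clearly; thanks in advance to everyone who reads and replies!!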

a****y
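Posts: 1035

From thread: Statistics board - A stats newbie's question about linear regression

Thank you; could you elaborate? I still don't quite understand.
My understanding is that one-way ANOVA and linear regression are the same thing... not sure if I've got that wrong...
My main goal is to test whether x has an effect on y; can linear regression alone answer that?
If I want to remove the effect of the difference in sample sizes, what should I do?
Thanks!!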
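One-way ANOVA and a linear regression with the categorical x dummy-coded are indeed the same model, and unequal group sizes are handled automatically: each group mean enters the between-group sum of squares weighted by its n, and small groups simply contribute less information. Nothing extra is needed just to test whether x matters; unequal variances across groups would be a separate concern (for example Welch's ANOVA). A sketch of the F statistic with unequal group sizes; the data are made up.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; groups may have different sizes."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand = sum(all_values) / n
    # between-group SS: each group mean is weighted by its size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


a = [1.0, 3.0]             # small group
b = [4.0, 6.0]
c = [5.0, 7.0, 6.0, 6.0]   # larger group
print(one_way_anova_f([a, b, c]))
```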

s*y
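Posts: 37

From thread: Statistics board - A regression question for the experts

Then I asked the wrong question.
What I want to ask is: how do you build a risk assessment model from scratch? Regression is only one step in it.
Once you have data on the dependent variable, basically anyone can run a regression; the only difference is how good the model itself is.
My feeling is that at the very beginning of model building, expert opinion about the expected results is indispensable. Take credit scores as an example: anyone can compute a score from income, education and so on; the question is how to evaluate whether the score is correct. Is there a statistical method for this? All I can think of is asking some credit card experts to review each customer's profile and then roughly define a segmentation, e.g. people with income below $10,000 should not get too high a score.
This is just my personal guess. I feel there should be a more systematic method.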

l*********s
Posts: 5409

From thread: Statistics board - A regression question for the experts

Prior knowledge is always good to have. You can't lose by having too much
information.
"Basically anyone can run a regression; the only difference is how good the model itself is."
Isn't this difference big enough? Knowing which models to choose, and how to validate and present the results, takes years of training. Running a regression is not even scratching the surface.

F****n
Posts: 3271

From thread: Statistics board - A question on one-step vs. two-step regression

This is a technique normally used to handle collinearity.
If X1 and X2 are highly correlated, the regression coefficients will be
messed up in a model that uses them both, i.e. you cannot tell whether the
coefficient on X1 or X2 is the "true" effect.
On the other hand, in your example, you assume (by theory) that X1 is always
the primary effect and will exclusively explain as much of the original variance
as possible. X2 will only explain the leftovers.
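The two-step idea can be checked numerically: if X2 is first replaced by its residual from a regression on X1, then X1's coefficient in the second model equals its simple-regression slope (X1 is credited with all the shared variance), while the coefficient on the residualised X2 equals X2's coefficient in the ordinary joint model. A pure-Python sketch with made-up, highly correlated predictors.

```python
def simple_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def fit_two(x1, x2, y):
    """OLS slopes (b1, b2) for y = b0 + b1*x1 + b2*x2 (centred normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(u * u for u in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * v for u, v in zip(c1, cy))
    s2y = sum(u * v for u, v in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.5, 2.5, 2.5, 4.5, 5.0, 6.5]  # highly correlated with x1
y = [2.0, 3.5, 4.0, 6.5, 7.0, 9.5]

# Step 1: residualise x2 on x1 (remove the part of x2 that x1 explains)
s = simple_slope(x1, x2)
m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
x2_resid = [b - m2 - s * (a - m1) for a, b in zip(x1, x2)]

# Step 2: regress y on x1 and the residualised x2, and compare with the joint model
b1_two_step, b2_two_step = fit_two(x1, x2_resid, y)
b1_joint, b2_joint = fit_two(x1, x2, y)
print(b1_two_step, simple_slope(x1, y))  # x1 keeps its marginal slope
print(b2_two_step, b2_joint)             # x2's coefficient is unchanged
```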


T*******I
Posts: 5138

From thread: Statistics board - Comparing the coefficients of two regression models

I would like to recommend my paper, in which you will definitely find
a method that may help you achieve your purpose. The method is for testing
the differences among threshold models. The first step is to pool all sample
points into one sample, then fit a single model and get a Matrix|regression
coefficients|. Since you already have the sub-sample models, if both
sub-sample sizes are sufficiently large, you can construct an
empirical Chi-square statistic to infer if the diff...
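The poster's threshold-model method is not reproduced here. For comparison, a commonly used simpler check of whether one coefficient differs between two models fitted on independent samples is z = (b1 - b2) / sqrt(se1^2 + se2^2), which assumes the two estimates are independent and approximately normal. A sketch with made-up estimates; `coef_diff_z` is a hypothetical name.

```python
import math


def coef_diff_z(b1, se1, b2, se2):
    """z statistic and two-sided p-value for H0: beta_1 == beta_2,
    assuming independent samples and approximately normal estimates."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical estimates from two sub-sample models
print(coef_diff_z(0.5, 0.1, 0.2, 0.1))  # z about 2.12, p about 0.034
```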

g****n
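Posts: 7494

From thread: Statistics board - Baozi offered: SAS question about OLS and LAD regression

OLS: ordinary least squares
LAD: least absolute deviations
For OLS, can I just use
proc reg data=QQ;
model f= x y z;
run;
or do I need to add some option?
Also, how do I do LAD regression?
I couldn't find any SAS LAD regression examples or documentation online.
I only have two baozi; I'll send them to the first two people who answer carefully. Thanks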
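For the SAS side: PROC REG with a MODEL statement like the one above is enough for plain OLS, and PROC QUANTREG (SAS/STAT) fits quantile regression, of which LAD is the median case (quantile 0.5). To show what LAD minimises, here is a brute-force Python sketch for one predictor: an optimal LAD line can always be found among the lines through two data points, so with tiny n every pair can be tried. This is for illustration only (O(n^3)); real software solves a linear program.

```python
from itertools import combinations


def lad_line(x, y):
    """Brute-force least-absolute-deviations line y = b0 + b1*x (small n only)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum(abs(yk - (b0 + b1 * xk)) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]


def ols_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


x = list(range(10))
y = [2 * v + 1 for v in x]
y[-1] = 100                # one gross outlier (would be 19 on the line)
print(lad_line(x, y))      # (1.0, 2.0): the outlier does not move the LAD line
print(ols_line(x, y))      # OLS slope is pulled far above 2
```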

s*i
Posts: 388

From thread: Statistics board - How do I do logistic regression with discrete values? (reposted)

【The following text is reposted from the CS board】
From: sci (ence), Board: CS
Title: How do I do logistic regression with discrete values?
Posted at: BBS 未名空间站 (Thu May 12 01:01:01 2011, US Eastern)
The data look like this:
X = (store, zipcode), Y = popularity.
e.g.
(walmart, 10010), popular.
(safeway, 90100), not popular.
(walmart, 10600), popular.
....
etc
I'm trying to build a logistic regression model on this dataset.
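Categorical inputs such as store and zipcode are usually entered through indicator (dummy) columns, one per level, with one level dropped as the reference category. A minimal Python sketch of that encoding using the example rows from the post; `one_hot` is a hypothetical helper, and with thousands of zip codes one would likely need to group them first, which is an assumption beyond the post.

```python
def one_hot(rows, columns):
    """Dummy-code categorical columns, dropping the first (sorted) level of each
    as the reference category. Returns (feature_names, matrix)."""
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    names = [f"{c}={lv}" for c in columns for lv in levels[c][1:]]
    matrix = [[1 if r[c] == lv else 0 for c in columns for lv in levels[c][1:]] for r in rows]
    return names, matrix


rows = [
    {"store": "walmart", "zipcode": "10010", "popular": 1},
    {"store": "safeway", "zipcode": "90100", "popular": 0},
    {"store": "walmart", "zipcode": "10600", "popular": 1},
]
names, X = one_hot(rows, ["store", "zipcode"])
print(names)  # ['store=walmart', 'zipcode=10600', 'zipcode=90100']
print(X)      # [[1, 0, 0], [0, 0, 1], [1, 1, 0]]
```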

n*****n
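Posts: 3123

From thread: Statistics board - Question about logistic regression

I didn't understand what you were asking earlier.
If you have studied generalized linear models, you'll know what's going on.
Using log(p/(1-p)) is called logistic regression.
You can also use other monotone functions; that is still a GLM, but it isn't called logistic regression.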
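To make the point concrete: in a binomial GLM the link is the function applied to p before it is set equal to the linear predictor. The logit link gives logistic regression; the complementary log-log is another monotone link (probit, based on the normal quantile function, is the other common one). A sketch of two links and their inverses.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))


def cloglog(p):
    """Complementary log-log: another monotone link for binary data."""
    return math.log(-math.log(1 - p))


def inv_cloglog(eta):
    return 1 - math.exp(-math.exp(eta))


for p in (0.1, 0.5, 0.9):
    print(p, logit(p), cloglog(p))  # logit is symmetric around 0.5, cloglog is not
```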

b*******l
Posts: 4

From thread: Statistics board - regression continuous dependent variable

If the dependent variable is continuous, what other kinds of regression can be used to build a model besides linear regression?

f***a
Posts: 329

From thread: Statistics board - regression continuous dependent variable

Data mining also offers many regression models,
such as splines, GAMs, regression trees, etc.
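As a small taste of the regression-tree idea mentioned here: a tree repeatedly picks the split of a predictor that most reduces the squared error when each side is predicted by its own mean. A single-split version (a "stump") is sketched below in pure Python on made-up data; a full tree would apply this recursively to each side.

```python
def best_split(x, y):
    """One-split regression tree: choose the threshold on x that minimises the
    total squared error when each side is predicted by its mean."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, ml, mr)
    return best[1:]  # (threshold, left mean, right mean)


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
print(best_split(x, y))  # threshold 3.5, left mean about 1.03, right mean about 5.0
```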

s*****n
Posts: 3416

From thread: Statistics board - A somewhat embarrassing question about simple regression algebra.

What do you mean by "splitting X and Y each into two corresponding blocks"?
I don't see how you can do it, unless this is not simple regression; instead
it is multivariate regression.

A*******s
Posts: 3942

From thread: Statistics board - A somewhat embarrassing question about simple regression algebra.

is it equivalent to the following problem?
say we have old data and we already have OLS regression estimates. Now new
data (more observations) come in but we don't want to do SVD or get inverse
based on the whole X'X matrix. Instead we would like to use some fast matrix
updating algorithm to update the regression coefficient estimates.
Am I correct?
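The updating problem described here is usually handled with recursive least squares: keep beta and P = (X'X)^-1 and fold in each new row with the Sherman-Morrison formula, so neither a full refit nor a fresh inverse is needed. A minimal two-coefficient sketch in pure Python with made-up data; an even simpler alternative is to keep running totals of X'X and X'y and re-solve the small p×p system.

```python
def ols_2(X, y):
    """Batch OLS for two columns (e.g. [1, x]); returns beta and P = (X'X)^-1."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    P = [[d / det, -b / det], [-b / det, a / det]]
    xty = [sum(r[0] * v for r, v in zip(X, y)), sum(r[1] * v for r, v in zip(X, y))]
    beta = [P[0][0] * xty[0] + P[0][1] * xty[1], P[1][0] * xty[0] + P[1][1] * xty[1]]
    return beta, P


def rls_update(beta, P, x, y):
    """Add one observation (row x, response y): Sherman-Morrison update of P and beta."""
    Px = [P[0][0] * x[0] + P[0][1] * x[1], P[1][0] * x[0] + P[1][1] * x[1]]
    denom = 1.0 + x[0] * Px[0] + x[1] * Px[1]
    k = [Px[0] / denom, Px[1] / denom]             # gain vector
    err = y - (beta[0] * x[0] + beta[1] * x[1])    # prediction error on the new point
    beta = [beta[0] + k[0] * err, beta[1] + k[1] * err]
    P = [[P[i][j] - k[i] * Px[j] for j in range(2)] for i in range(2)]  # P symmetric, so x'P = (Px)'
    return beta, P


old_X, old_y = [[1, 0], [1, 1], [1, 2]], [1.0, 2.9, 5.2]
new_X, new_y = [[1, 3], [1, 4]], [7.1, 8.8]

beta, P = ols_2(old_X, old_y)
for row, value in zip(new_X, new_y):
    beta, P = rls_update(beta, P, row, value)
batch, _ = ols_2(old_X + new_X, old_y + new_y)
print(beta, batch)  # the updated and the batch estimates agree up to rounding
```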

s****y
Posts: 297

From thread: Statistics board - Question about multiple regression in R

In my dataset, I have several variables (Y, A, B, C...) and one factor.
I'm investigating the regression using the command
lm(Y ~ A+ B + C +...)
but now I want to know the regression coefficients of A, B, and C... for
every factor level.
The factor has more than 20 levels...
Is there any good way to do this in R?
Thanks in advance!!
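One straightforward approach is to split the data by factor level and fit the same model within each level; in R that amounts to looping lm() over split(data, data$factor), and a model that fully interacts the factor with the predictors gives the same per-level coefficients in one fit (but with a pooled error variance). The idea is sketched in Python with a single predictor A and made-up data; all names are placeholders.

```python
from collections import defaultdict


def fit_by_level(levels, a, y):
    """Separate simple regressions y = b0 + b1*a within each factor level."""
    groups = defaultdict(lambda: ([], []))
    for g, ai, yi in zip(levels, a, y):
        groups[g][0].append(ai)
        groups[g][1].append(yi)
    out = {}
    for g, (xs, ys) in groups.items():
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b1 = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
        out[g] = (my - b1 * mx, b1)  # (intercept, slope) for this level
    return out


levels = ["L1", "L1", "L1", "L2", "L2", "L2"]
a = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0, 1.0, 0.0, -1.0]
print(fit_by_level(levels, a, y))  # {'L1': (0.0, 2.0), 'L2': (2.0, -1.0)}
```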

J*****n
Posts: 4859

From thread: Statistics board - question about regression in plm (reposted)

【The following text is reposted from the Economics board】
From: Jadeson (Jadeson), Board: Economics
Title: question about regression in plm
Posted at: BBS 未名空间站 (Fri Jul 15 08:38:24 2011, US Eastern)
I am running a panel regression through plm and get the following error:
Error in plm.fit(formula, data, model, effect, random.method, inst.method) :
empty model
What does this error mean?

k*******g
Posts: 13

From thread: Statistics board - A regression interview question

To predict y(t), we use two candidate variables x1(t) and x2(t) separately and get
two linear regression models:
M1: y(t)=b1*x1(t)+b0
M2: y(t)=b1*x2(t)+b0
The coefficients of determination (R-squared) of M1 and M2 are 0.01 and 0.02.
If we run a new regression model with both x1 and x2,
M: y(t)=b1*x1(t)+b2*x2(t)+b0
Question: what are the lower and upper bounds of M's R-squared? What problems arise in each of these two extreme cases?
Thanks for any pointers!
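Adding a regressor can never lower R-squared, so M's R-squared is at least max(0.01, 0.02) = 0.02; that extreme occurs when x1 carries no information about y beyond what x2 already has (zero partial correlation of y and x1 given x2). The upper bound is 1: two predictors that are individually almost useless can be jointly perfect when they are highly correlated and y depends on their difference, a suppression situation in which the coefficients tend to be large and unstable. The made-up example below has individual R-squared values of 0 and 1/126 but a joint R-squared of 1.

```python
def r2_simple(x, y):
    """R-squared of a simple regression = squared correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def r2_two(x1, x2, y):
    """R-squared of y on an intercept, x1 and x2 (centred normal equations)."""
    n = len(y)
    c1 = [v - sum(x1) / n for v in x1]
    c2 = [v - sum(x2) / n for v in x2]
    cy = [v - sum(y) / n for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(u * u for u in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * v for u, v in zip(c1, cy))
    s2y = sum(u * v for u, v in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    sse = sum((v - b1 * u - b2 * w) ** 2 for u, w, v in zip(c1, c2, cy))
    return 1 - sse / sum(v * v for v in cy)


x1 = [0, 10, 20, 30]
x2 = [1, 9, 19, 31]   # x2 = x1 + small wiggle, corr(x1, x2) close to 1
y = [-1, 1, 1, -1]    # y = x1 - x2 exactly
print(r2_simple(x1, y), r2_simple(x2, y), r2_two(x1, x2, y))  # 0, 1/126, 1
```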

A*******s
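Posts: 3942

From thread: Statistics board - A question about logistic regression parameters

Looking only at the distribution of Y doesn't help; we only care about the conditional distribution of Y.
A GLM can handle this. If p is # of events / # of trials, this is still the most basic
logistic (Bernoulli/binomial) regression. If p is a rate/proportion, you
can also use beta regression.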
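For the case where p is #events/#trials, the binomial GLM (logistic regression on grouped data) uses both the counts and the number of trials, so groups with more trials get more weight; in R this is glm(cbind(events, trials - events) ~ x, family = binomial), and in SAS PROC LOGISTIC the events/trials syntax in the MODEL statement. A minimal Newton-Raphson sketch for one predictor in pure Python, for illustration only; the data are made up.

```python
import math


def fit_grouped_logistic(x, events, trials, iters=25):
    """Binomial logistic regression logit(p) = b0 + b1*x for grouped data
    (events out of trials at each x), fitted by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, events, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1 - p)
            g0 += yi - ni * p          # score for b0
            g1 += (yi - ni * p) * xi   # score for b1
            h00 += w                   # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 ** 2
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: beta += H^-1 g
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Two groups: 3/10 events at x=0 and 6/10 at x=1 (a saturated model)
b0, b1 = fit_grouped_logistic([0, 1], [3, 6], [10, 10])
print(1 / (1 + math.exp(-b0)), 1 / (1 + math.exp(-(b0 + b1))))  # fitted 0.3 and 0.6
```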


A*******s
Posts: 3942

From thread: Statistics board - A question about logistic regression parameters

Google "bounded outcome regression". As far as I know, a latent variable model
or beta regression can be applied.
I don't think it is about lack of curvature. I bet the OP would see the same
pattern even after adding quadratic terms.
It is OK to use OLS in this case if only the conditional mean of Y is of
interest and the variance is a nuisance. Whenever you want to draw inference
from the distribution of Y (significance test, confidence interval), OLS
would fail since it gives a wrong estimate of VAR[Y|X] ...

f******y
Posts: 2971

From thread: Statistics board - PCA and linear regression

Suppose we have two random variables, X and Y, whose means are both very small.
I can get the slope by linear regression, lm(Y~X);
I can also do PCA:
data = data.frame(X=X, Y=Y);
princomp(data);
I expected the slope of the first PC vector to be very close to the slope
given by linear regression, but when I tried it in R the results were very different.
Can anyone explain?

z*****n
Posts: 413

From thread: Statistics board - PCA and linear regression

I don't think there is a specific relationship between PCA and regression.
But the covariates of a regression can be replaced by PCs if the X matrix has
strong collinearity; PCA is also a good way to reduce the number of covariates.
I doubt the meaning of what the OP did. For y = mu + b * x, b can be treated
as a scale of the x vector in n-space. The first component of x, y should
be the vector x+y, and this is nonsense if you haven't standardized your data
before PCA.
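A likely explanation for the mismatch: OLS minimises vertical distances, while the first principal component minimises perpendicular distances (it is the orthogonal or total-least-squares line), so the two slopes agree only when the points lie almost exactly on a line; with noise the OLS slope is smaller in magnitude than the PC slope. The PC direction also changes when x and y are rescaled. A small sketch computing both from the 2x2 covariance matrix on made-up data.

```python
import math


def ols_and_pc1_slopes(x, y):
    """Slope of the OLS line of y on x vs. slope of the first principal axis."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    ols = sxy / sxx  # minimises vertical squared distances
    # largest eigenvalue of the covariance matrix [[sxx, sxy], [sxy, syy]]
    lam1 = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    pc1 = (lam1 - sxx) / sxy  # eigenvector (sxy, lam1 - sxx); requires sxy != 0
    return ols, pc1


print(ols_and_pc1_slopes([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # (0.8, 1.0)
```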

i*****c
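Posts: 1322

From thread: Statistics board - Question: fitness in categorical regression

In categorical regression, when testing whether the model is better than the null model, if the p-value is large, how do I justify using this model? For example, I want to look at the relationship between the outcome of a disease (1 if present, 0 if not) and age, duration, and measurement X. In the regression model I got from R, measurement X and the outcome are significantly associated (p-value < 0.05). But when I use R's indices of fit (null deviance, residual deviance) to measure the fitness of the model, I get a p-value of 0.7. Yet both the t-test and the OR show that measurement X differs significantly between group 1 and group 0. How should I interpret this?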

b******s
Posts: 325

From thread: Statistics board - how to interpret these regression coefficients?

Thanks for your confirmation.
What does "additive effects" mean? I am not very clear on what you meant
there. Here is my original guess.
"In the interpretation of the case above, we are saying the effects for a
diversion case in VA would be such… Note, to do that interpretation, we
are not saying VA has any diversion case; we instead are implicitly saying,
should VA have a diversion, the effect would have been such … My bold
guess is the variation of VA’s population is memorized/used by ...

a*****k
Posts: 704

From thread: Statistics board - regression sample size

Hi, when one does time series regression, is there any way to optimally
determine how long the regression time window should be?
Thanks,
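There is no single optimal rule. One common data-driven heuristic (not from the thread) is to compare candidate window lengths by their out-of-sample, one-step-ahead forecast error and pick the smallest: shorter windows adapt faster to structural change but give noisier estimates, and longer windows do the opposite. A pure-Python sketch on a made-up series whose slope changes partway through; the function names and candidate windows are hypothetical.

```python
def rolling_forecast_mse(x, y, window, start):
    """Mean squared one-step-ahead error when y[t] is predicted from an OLS fit
    of y on x over the previous `window` observations (for t >= start)."""
    errors = []
    for t in range(start, len(y)):
        xs, ys = x[t - window:t], y[t - window:t]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
        b0 = my - b1 * mx
        errors.append((y[t] - (b0 + b1 * x[t])) ** 2)
    return sum(errors) / len(errors)


def choose_window(x, y, candidates):
    start = max(candidates)  # evaluate every window on the same periods
    scores = {w: rolling_forecast_mse(x, y, w, start) for w in candidates}
    return min(scores, key=scores.get), scores


# Made-up series whose slope changes from 1 to 3 at t = 30
x = [t % 5 + 1 for t in range(40)]
y = [xi if t < 30 else 3 * xi for t, xi in enumerate(x)]
best, scores = choose_window(x, y, [5, 20])
print(best, scores)  # the short window adapts to the break faster here
```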
