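y*********4 Posts: 76 | 1 What requirements does logistic regression place on the data? Does the data have to be normally distributed? If it isn't, what should I do?
Does ordinary regression have the same kind of requirement? |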
|
y*********4 Posts: 76 | 2 OK. Is a lack-of-fit test generally useful only for (linear) regression? For logistic regression, do we look at the area under the ROC curve instead? |
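For a logistic model, the area under the ROC curve (AUC, reported as the "c" statistic in SAS PROC LOGISTIC output) measures discrimination: the probability that a randomly chosen event receives a higher predicted probability than a randomly chosen non-event. It says nothing about calibration, so it complements a goodness-of-fit check rather than replacing it. Below is a minimal sketch of the pairwise (Mann-Whitney) computation, using made-up labels and scores:

```python
def auc(y_true, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical observed outcomes and predicted probabilities
y = [0, 0, 1, 1]
p_hat = [0.10, 0.40, 0.35, 0.80]
print(auc(y, p_hat))  # 0.75
```

This double loop is O(n_pos * n_neg) and is meant only to show the definition. For millions of rows, a rank-based formula gives the same number much faster.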
|
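h******e Posts: 1791 | 3 I used stepwise selection and ended up with two variables, A and B, both continuous. Their
interaction was excluded. The results:
Parameter  DF  Estimate      SE    Wald  Pr>ChiSq
A           1    0.1297  0.0435  8.9076    0.0028
B           1   -0.3586  0.1732  4.2843    0.0385
So the two effects go in opposite directions.
But if I fit a logistic model with A alone or B alone, I get:
an estimate of 0.0565 for A and 0.1665 for B, so the two effects go in the same direction. If I run a linear regression
of one on the other, they turn out to be positively correlated.
How should I interpret the logistic regression results? Thanks. |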
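A sign flip like this is the usual behavior of correlated predictors. In the joint model, each coefficient is the effect of one variable with the other held fixed, and that conditional effect need not share the sign of the marginal one. The sketch below is a linear-regression analogue, not the poster's logistic model. It uses a tiny constructed data set in which A and B are positively correlated, y = 2A - B holds exactly, and both marginal slopes are positive:

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    m = [row + [v] for row, v in zip(xtx, xty)]            # augmented matrix
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]

A = [2, -2, 2, -2]
B = [3, -1, 1, -3]     # B = A + noise, so corr(A, B) > 0
y = [1, -3, 3, -1]     # hypothetical outcome, y = 2A - B exactly

print(ols([[1, a] for a in A], y))                   # A alone: slope 1.0
print(ols([[1, b] for b in B], y))                   # B alone: slope 0.6 (positive)
print(ols([[1, a, b] for a, b in zip(A, B)], y))     # jointly: [0, 2, -1], B turns negative
```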
|
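c****o Posts: 69 | 4 When doing regression, does the dependent variable need to be normally distributed? Do the independent variables?
I've heard that people usually apply a TRANSFORMATION so that the dependent and independent variables become normally distributed, but why do
that?
If no transformation can make the data normal, can I still do regression? |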
|
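x**g Posts: 807 | 5 For ordinary multiple regression, does the dependent variable need to be normally distributed? Yes (strictly, it is the errors, i.e. Y conditional on X, that are assumed normal). Do the independent variables? No.
If no transformation makes the data normal, it depends on the specific case. There are specialized regression methods for data
with different distributional characteristics. |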
|
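c****o Posts: 69 | 6 Suppose I LOG-transform my dependent variable, run the regression, and get a REGRESSION MODEL.
When I check the MODEL by plotting the RESIDUALs, do I first need to transform the logged variable
back (exponentiate it)?
Another question: if I fit a NON-LINEAR REGRESSION MODEL, do the RESIDUALs still need to have
EQUAL VARIANCE? |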
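Residual diagnostics belong on the scale the model was fit on. If the model is log(Y) = b0 + b1*X + error, its assumptions (linearity, constant variance, normal errors) concern the log-scale residuals, so these can be plotted directly without back-transforming. Here is a minimal sketch with hypothetical data (the y values are made up, roughly exp(x)):

```python
import math

def fit_line(x, y):
    """Simple OLS fit; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [1, 2, 3, 4, 5]
y = [2.7, 7.4, 20.1, 54.6, 148.4]            # hypothetical response

log_y = [math.log(v) for v in y]             # the model is fit to log(Y)
b0, b1 = fit_line(x, log_y)
fitted_log = [b0 + b1 * a for a in x]
resid_log = [ly - f for ly, f in zip(log_y, fitted_log)]   # plot these against fitted_log

for f, r in zip(fitted_log, resid_log):
    print(f"{f:7.3f} {r:+.4f}")
```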
|
S******y Posts: 1123 | 7 I am using logistic regression to model a rare event, i.e.,
y=0 98.5%
y=1 1.5%
N= 11 million
I am thinking of over-sampling "y=1" observations to increase their
percentage from 1.5% to 10%. Then I will perform logistic regression.
Is this method valid? Will my estimates be biased?
Thanks. |
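Oversampling on the outcome is a form of case-control (choice-based) sampling. If the logistic model is correctly specified and the sampling depends only on y, the slope estimates stay consistent and only the intercept is shifted. The standard prior correction subtracts ln[((1 - tau) / tau) * (ybar / (1 - ybar))] from the fitted intercept, where tau is the population share of y=1 and ybar is the share in the resampled data. The sketch below uses the 1.5% and 10% figures from the post; the fitted intercept of -2.0 is a hypothetical value:

```python
import math

def prior_corrected_intercept(b0_sample, tau, ybar):
    """Undo the intercept shift caused by outcome-based (over)sampling.

    tau  : share of y=1 in the population (1.5% in the post)
    ybar : share of y=1 in the resampled data used for fitting (10% in the post)
    """
    return b0_sample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Hypothetical intercept estimated on the 10%-event sample
b0_pop = prior_corrected_intercept(-2.0, tau=0.015, ybar=0.10)
print(b0_pop)
```

The slopes from the resampled fit are used unchanged. Only the intercept, and therefore the predicted probabilities, need this adjustment.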
|
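i******a Posts: 27 | 8 I'm reviewing the statistics I learned before, and it seems the textbook we used back then was this one:
Applied Regression Including Computing and Graphics
by R. D. Cook and Sanford Weisberg
It's been a while, and looking at it again now, it feels long-winded and unclear and jumps around a lot; reading it gives me a headache.
Could people on this board recommend some better books on linear regression? It would be even better if an electronic version
could be downloaded somewhere.
Thanks a lot! |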
|
s*r Posts: 2757 | 9 I've seen some people recommend this one:
Harrell, F. E., Jr. (2001). Regression modeling strategies: With
applications to linear models, logistic regression, and survival analysis.
New York: Springer-Verlag. |
|
l**y Posts: 130 | 10 As the title says: I read a PAPER that uses a cross-sectional regression to fit the data. Which R
package can do this? I couldn't find one online. How does it differ from simple linear regression?
Thanks a lot |
|
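m**i Posts: 37 | 11 Ordinary linear regression fits a line to many points.
In my case, I have many pairs of points. The two points in each pair are joined into a line segment, which gives me many line segments.
Now I want to do a linear fit and get a single line that is the regression of these line segments.
What is this called? Which books cover the theory for this?
Thanks! |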
|
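c****s Posts: 63 | 12 I have finished fitting the logistic regression model on the model-building data set. How do I use the
validation data to verify that my results are good? After all, the predictions from logistic regression
are all 1 or 0.
Any advice from the experts here would be much appreciated. |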
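A fitted logistic model actually returns predicted probabilities. The 0/1 labels only appear after a cutoff is chosen. One way to validate, then, is to score the validation set with the model-building coefficients and compare the probabilities with the observed outcomes. Useful summaries include a confusion matrix at a cutoff and the Brier score, which is the mean squared error of the probabilities. The sketch below uses a hypothetical one-predictor model and data, and the 0.5 cutoff is an arbitrary assumption:

```python
import math

def predict_proba(intercept, coefs, row):
    """Predicted P(y=1) from a fitted logistic model."""
    eta = intercept + sum(b * x for b, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-eta))

def validate(probs, y, cutoff=0.5):
    """Confusion-matrix counts at a cutoff, plus the Brier score."""
    pred = [1 if p >= cutoff else 0 for p in probs]
    tp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 1)
    brier = sum((p - t) ** 2 for p, t in zip(probs, y)) / len(y)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(y), "brier": brier}

# Hypothetical coefficients from the model-building data, and a tiny validation set
intercept, coefs = 0.0, [1.0]
validation_x = [[-2.0], [-1.0], [1.0], [2.0]]
validation_y = [0, 1, 0, 1]
probs = [predict_proba(intercept, coefs, row) for row in validation_x]
print(validate(probs, validation_y))
```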
|
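p*****o Posts: 543 | 13 After running a LOGISTIC REGRESSION with STEPWISE, I saved the coefficients in a DATA SET called PAREST, but
some VARIABLEs in it are MISSING VALUEs, because STEPWISE did not select them into the final MODEL.
My question: when I use this PAREST DATA SET to PREDICT the values of another DATA SET, VALIDATION,
for example: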
SELECT
1/(1+exp(-(intercept + V.VAR1*E.VAR1 + V.VAR2*E.VAR2))) AS P1
FROM
VALIDATION AS V, PAREST AS E
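E.VAR2 is a MISSING VALUE, though. Is there a way to turn the MISSING VALUE into 0 at this step?
In fact I have many VARs, not just two, and I don't know which ones will end up selected into the MODEL.
(Inside one big MACRO I run two REGRESSIONs, STEPWISE and the ENTER METHOD,
and use both to PREDICT the VALIDATION DATA SET. So I want to write one unified formula, like the
formula above; if |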
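The rule being asked for is that a coefficient stepwise did not select contributes 0 to the linear predictor. In PROC SQL, COALESCE(E.VAR2, 0) expresses that for a single term. Below is the same scoring rule as a Python sketch. The variable names and values are hypothetical, and None stands in for a SAS missing value:

```python
import math

def score(row, coefs):
    """P(y=1) for one observation; unselected (missing) coefficients count as 0."""
    eta = coefs.get("Intercept") or 0.0
    for name, value in row.items():
        b = coefs.get(name)                   # None if the variable was not selected
        eta += (b if b is not None else 0.0) * value
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical PAREST-style coefficients: VAR2 was dropped by stepwise
parest = {"Intercept": 0.5, "VAR1": 1.0, "VAR2": None}
validation_row = {"VAR1": -0.5, "VAR2": 3.0}
print(score(validation_row, parest))
```

Because the loop runs over whatever variables the row contains, the same function works for both the stepwise and the enter-method coefficient sets.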
|
o****o Posts: 8077 | 14 Regress your X2 on the residuals from Y = b0 + b1*X1,
then use the projected X2 from that auxiliary regression in your original model.
Besides, I don't think the ordering matters in OLS; check the linear algebra.
impression is probably yes, because OLS coeffs and confidence intervals are
computed sequentially, but I am not sure about it at all. |
|
l******o Posts: 162 | 15 I ran a regression to test the relation between Z and A (after controlling for some
other variables); Z is the dependent variable, and A is the independent variable.
The coefficient on A is positive and significant.
Then I ran another regression to test the relation between Z and B (after
controlling for some other variables); again Z is the dependent variable, and B is the independent variable.
The coefficient on B is negative and significant.
==========================================
Based on the above results, can we safely conclude anything about the relationship between A
and B? If we can, the re |
|
c*********d Posts: 218 | 16 I've decided to use negative binomial regression for my dependent variable. Right now I have only one
continuous independent variable. Can I still use negative binomial regression? |
|
s***r Posts: 1121 | 17 Thanks. I sent you 4 baozi (20 dollars). One more question:
I also need to run the regression like this:
b1 = e1
b1 = r1
b1 = f1
b1 = e2
b1 = r2
b1 = f2
b2 = e1
b2 = r1
b2 = f1
...
...
That is, I also need to run univariate regressions. Can you help me with the
macro? Many thanks. |
|
s********9 Posts: 74 | 18 Outcome: A
Independent variables: B C D E F G H
Univariable logistic regression: B, C, D, E, F, G, and H each have a significant
influence on A.
Multivariable logistic regression: only B has a significant influence on A.
Should B be considered the only factor that influences A? |
|
s**********y Posts: 38 | 19 Questions:
1. How do I detect collinearity among categorical variables, such as race (white, black,
latino), sex (male, female), education (college graduate or not), age (<20, 20-40, >40), income (below or above a level), etc.?
2. When using logistic regression, I tested the Deviance and Pearson goodness-of-fit
statistics: the p-value for Deviance is 0.0037, and the p-value for Pearson is
0.0061. Does this mean the logistic regression does not fit the data well? What should I do next?
Thanks. |
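For question 1, a common first check is pairwise. Cross-tabulate two categorical predictors and compute Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))), which runs from 0 (no association) to 1 (one variable determines the other). This is only a screening step, not a complete collinearity diagnostic. Below is a sketch with made-up counts:

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # assumes no empty row or column
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical counts: rows = sex (male, female), columns = college graduate (yes, no)
print(cramers_v([[10, 0], [0, 10]]))   # 1.0: one variable determines the other
print(cramers_v([[5, 5], [5, 5]]))     # 0.0: no association
```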
|
D*D Posts: 236 | 20 The actual proportion is between 0 and 1, but linear regression can give
predictions outside [0, 1].
This is how it was done before logistic regression came into use. |
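Here is a small illustration of that point with hypothetical data. Ordinary least squares fit to a 0/1 outcome (a linear probability model) produces a straight line that leaves [0, 1] at the edges. The logistic function 1/(1 + exp(-eta)), by contrast, stays strictly between 0 and 1 for any linear predictor eta:

```python
import math

def fit_line(x, y):
    """Simple OLS fit; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def logistic(eta):
    """Inverse logit: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

x = [0, 1, 2, 3]
y = [0, 0, 1, 1]                  # hypothetical binary outcome
b0, b1 = fit_line(x, y)           # linear probability model: intercept -0.1, slope 0.4

for x_new in (0, 4):
    p_linear = b0 + b1 * x_new    # -0.1 at x = 0 and 1.5 at x = 4: not valid probabilities
    print(x_new, round(p_linear, 3))

# By contrast, the logistic curve stays strictly inside (0, 1)
print([round(logistic(eta), 3) for eta in (-5, 0, 5)])
```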
|
f**********t Posts: 1001 | 21 Oops, sorry...
I wasn't asking everyone to answer that, haha; the topic I posted was ordinary linear regression.
But the earlier discussion about simple linear regression was great. |
|
M****e Posts: 178 | 22 Could anyone recommend a good textbook on quantile regression?
I need to use quantile regression to fit heterogeneous and dependent data, and
have trouble with inference. A couple of papers out there deal with similar
topic, but there are big leaps in their explanation.
I'd like to read a book that explains QR kind of step by step and includes
all the basics and relatively advanced applications.
Thanks a lot. |
|
g********s Posts: 69 | 23 Sorry, I cannot type Chinese.
In logistic regression, if we want to compare coefficients across models, we
need to standardize by dividing the coefficients by
sqrt[(variance of predicted Y scores) + pi^2/3]
But how do we standardize if the logistic regression model is a mixed-effects
model? Any thoughts? Thanks much! |
|
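p******r Posts: 1279 | 24 When doing regression, suppose the independent variables include a categorical variable. For example, in
salary = experience + edu + error, edu is categorical with values 1, 2, 3: 1 means high school,
2 means college, and 3 means graduate school.
Option 1: treat it as the numbers 1, 2, 3 and run the regression directly, which gives one beta value.
Option 2: turn it into several dummy variables and do a one-way ANOVA, which gives several fixed-effect coefficients.
What is the essential difference between these two approaches? Apart from the mechanics, it doesn't seem like there's much difference,
whether for prediction or for measuring the effect of edu on salary.
Also, when coding in SAS, if edu is defined as categorical from the start, does that mean
PROC GLM doesn't need the dummy variables to be created beforehand?
Please advise!! |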
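The practical difference is in what the model allows. Numeric coding uses a single slope, so the high school to college step is forced to equal the college to graduate school step (1 degree of freedom). Dummy coding gives each level its own mean (2 degrees of freedom for 3 levels), so the steps can differ. In SAS, listing edu in the CLASS statement of PROC GLM makes the procedure build the indicator columns itself. Below is a sketch with hypothetical salaries:

```python
def fit_line(x, y):
    """Simple OLS fit; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

edu = [1, 1, 2, 2, 3, 3]
salary = [28, 32, 48, 52, 53, 57]        # hypothetical; group means are 30, 50, 55

# Numeric coding: one slope, so the 1->2 and 2->3 steps are forced to be equal
b0, b1 = fit_line(edu, salary)
numeric_pred = {level: b0 + b1 * level for level in (1, 2, 3)}

# Dummy coding: the fitted value for each level is that level's own mean
dummy_pred = {level: sum(s for e, s in zip(edu, salary) if e == level) / edu.count(level)
              for level in (1, 2, 3)}

print(numeric_pred)  # {1: 32.5, 2: 45.0, 3: 57.5}
print(dummy_pred)    # {1: 30.0, 2: 50.0, 3: 55.0}
```

The two approaches agree only when the level means happen to be equally spaced. Otherwise they give different predictions and different estimates of the edu effect.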
|
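p******r Posts: 1279 | 25 A question for everyone. My dependent variable is a categorical variable with 10 levels (0, 1, 2, 3 ... 9),
and I want to run an ordinal regression on it.
The problem is that the dependent variable is severely skewed: most of the 1,200-plus obs are concentrated at 0, 1, and 2, and at
9 there is only a single obs. Can I still do ordinal regression in this situation? If not,
what should I do? |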
|
e****t Posts: 766 | 26 Yeah, I will use Poisson regression or negative binomial regression with
zero inflation. |
|
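p******r Posts: 1279 | 27 I don't quite get it. Isn't ordinal regression based on MLE?
Or are you suggesting I simply treat the response variable as continuous and then fit it with OLS
regression?
If people are unwilling to admit they are highly depressed, then what model would be better? Thanks! |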
|
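a****y Posts: 1035 | 28 I have a regression question for the experts here.
I have a data set where y is a continuous response and x is a categorical variable (taking the three values a, b,
and c).
In a random sample of 1000 observations, 20 are a, 100 are b, and 880 are c. I want to run a linear
regression to see whether x has an effect on y.
My question: since the counts of a, b, and c differ a lot, do I need to take this difference into account? If so,
how do I build this sample-size effect into the analysis?
I'm not sure whether I've explained the question clearly. Thanks in advance to everyone who reads and replies!! |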
|
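a****y Posts: 1035 | 29 Thank you. Could you explain in more detail? I still don't quite understand.
My understanding is that one-way ANOVA and linear regression are the same thing... I'm not sure whether I've
misunderstood...
My main goal is to test whether x has an effect on y. Can linear regression alone settle that?
If I want to remove the influence of the difference in sample sizes, what should I do?
Thanks!! |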
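One-way ANOVA and a linear regression of y on dummy variables for x are the same model and give the same F test. The usual formulas already handle unequal group sizes, because each group mean enters the between-group sum of squares weighted by its n. With sizes as unequal as 20/100/880, the F test is more sensitive to unequal group variances, so that assumption is worth checking. Below is a minimal sketch with a small, hypothetical unbalanced example; the p-value would come from an F(k - 1, n - k) distribution:

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA; the groups may have different sizes."""
    all_y = [v for g in groups for v in g]
    n, k = len(all_y), len(groups)
    grand = sum(all_y) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

a = [1, 3]             # hypothetical group a, n = 2
b = [4, 5, 6]          # hypothetical group b, n = 3
c = [8, 10, 8, 10]     # hypothetical group c, n = 4
print(round(one_way_anova_f([a, b, c]), 4))
```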
|
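s*y Posts: 37 | 30 Then I asked the question the wrong way.
What I want to ask is: how do you build a risk assessment model from scratch? Regression is only one step
in that.
If you already have data for the dependent variable, basically anyone can run a regression; the difference
is just how good the model itself is.
My feeling is that at the very start of building the model, expert opinion on the expected results is indispensable.
Take credit scores as an example: anyone can compute a score from income, education, and so on. The question is
how to evaluate whether that score is correct. Is there a statistical method for this? The only thing I can think of is
to have some credit card experts review each customer's profile and then roughly set a segmentation, e.g. people with
income below $10,000 shouldn't get too high a score.
This is just my personal guess. I feel there should be a more systematic method. |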
|
l*********s Posts: 5409 | 31 Prior knowledge is always good to have. You can't lose by having too much
information.
"Basically anyone can run a regression; the difference is just how good the model itself is."
Isn't this difference big enough? Knowing which models to choose, and how to validate and present the results, takes years of training. Running a regression is not even scratching the surface. |
|
F****n Posts: 3271 | 32 This is a technique normally used to handle collinearity.
If X1 and X2 are highly correlated, the regression coefficients will be
messed up in a model that uses them both, i.e. you cannot tell whether the
coefficient on X1 or on X2 is the "true" effect.
On the other hand, in your example, you assume (by theory) that X1 is always
the primary effect and will exclusively explain as much of the original variance
as possible. X2 will only explain the leftovers. |
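Here is a sketch of that idea on a tiny constructed data set (the values are hypothetical), using a generic least-squares helper. The steps are: regress X2 on X1, keep the residual, then refit Y on X1 and that residual. X1's coefficient then equals its simple-regression slope, so X1 gets all the variance it can claim. The residual's coefficient equals X2's coefficient in the ordinary joint model, which is the Frisch-Waugh-Lovell result:

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    m = [row + [v] for row, v in zip(xtx, xty)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]

x1 = [2, -2, 2, -2]
x2 = [3, -1, 1, -3]     # correlated with x1
y = [1, -3, 3, -1]

g0, g1 = ols([[1, a] for a in x1], x2)                  # auxiliary regression of X2 on X1
x2_resid = [b - (g0 + g1 * a) for a, b in zip(x1, x2)]  # the part of X2 not explained by X1

marginal = ols([[1, a] for a in x1], y)                 # Y on X1 alone
joint = ols([[1, a, b] for a, b in zip(x1, x2)], y)     # Y on X1 and X2
orth = ols([[1, a, r] for a, r in zip(x1, x2_resid)], y)  # Y on X1 and residualized X2

print(marginal, joint, orth)
```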
|
T*******I Posts: 5138 | 33 I would like to recommend my paper, in which you will definitely find
a method that may help you reach your goal. The method is for testing
the differences among threshold models. The first step is to pool all sample
points into one sample, then fit a single model and get a matrix of regression
coefficients. Since you already have the sub-sample models, if both
sub-sample sizes are sufficiently large, you can construct an
empirical chi-square statistic to infer whether the diff... |
|
g****n Posts: 7494 | 34 OLS: ordinary least squares
LAD: least absolute deviations
Question: for OLS, can I just use
proc reg data=QQ;
model f= x y z;
run;
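and that's it, or do I need to add some option?
Also, how can I run a LAD regression?
I can't find any SAS LAD regression examples or documentation anywhere online.
I only have two baozi, so they go to the first two people who answer carefully. Thanks |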
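LAD regression minimizes the sum of absolute residuals instead of the sum of squares. It is the same as median (0.5-quantile) regression, which SAS/STAT fits with PROC QUANTREG. For a single predictor, some optimal LAD line always passes through at least two data points. That means a tiny brute-force sketch can simply try every pair. This is O(n^3) and meant only for illustration; real software solves a linear program. The data below are hypothetical:

```python
def lad_line(x, y):
    """Exact LAD fit of y = a + b*x by checking every line through two data points."""
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue                          # vertical line: not a function of x
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            loss = sum(abs(yk - a - b * xk) for xk, yk in zip(x, y))
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]

def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x, for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 40]              # hypothetical: y = x plus one gross outlier
print("LAD:", lad_line(x, y))     # intercept 0, slope 1: the outlier is ignored
print("OLS:", ols_line(x, y))     # slope pulled up to about 8.2 by the outlier
```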
|
s*i Posts: 388 | 35 [Reposted from the CS board]
From: sci (ence), Board: CS
Subject: How do I do logistic regression with discrete values?
Posted at: BBS 未名空间站 (Thu May 12 01:01:01 2011, US Eastern)
The data look like this:
X = (store, zipcode), Y = popularity.
e.g.
(walmart, 10010), popular.
(safeway, 90100), not popular.
(walmart, 10600), popular.
....
etc
I am trying to build a logistic regression model on this dataset. |
|
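n*****n Posts: 3123 | 36 I didn't understand what you were asking earlier.
If you have studied generalized linear models, you will know what is going on.
Using log(p/(1-p)) as the link is called logistic regression.
You can also use other monotone functions; that is still a GLM, but it is not called logistic regression. |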
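In GLM terms, logistic regression is the binomial GLM with the logit link log(p/(1-p)). Swapping in another monotone link, such as the probit or the complementary log-log, still gives a binomial GLM, but the result is no longer called logistic regression. The three links and their inverses are shown side by side below, using only the Python standard library (statistics.NormalDist requires Python 3.8 or later):

```python
import math
from statistics import NormalDist

# Link functions g(p) = eta for a binary outcome; only the first gives "logistic" regression
def logit(p):
    return math.log(p / (1 - p))

def probit(p):
    return NormalDist().inv_cdf(p)

def cloglog(p):
    return math.log(-math.log(1 - p))

# Inverse links map the linear predictor eta back to a probability
def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

def inv_probit(eta):
    return NormalDist().cdf(eta)

def inv_cloglog(eta):
    return 1 - math.exp(-math.exp(eta))

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 4), round(probit(p), 4), round(cloglog(p), 4))
```

Logit and probit are symmetric around p = 0.5, while the complementary log-log is not. This asymmetry is one reason it is sometimes preferred when events are rare.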
|
b*******l Posts: 4 | 37 If the dependent variable is continuous, what other kinds of regression besides linear regression can be used to
build a model? |
|
f***a Posts: 329 | 38 Data mining also has many regression models,
such as splines, GAMs, regression trees, and so on. |
|
s*****n Posts: 3416 | 39 What do you mean by "splitting both X and Y into two corresponding blocks"?
I don't see how you can do it, unless this is not simple regression but
multivariate regression. |
|
A*******s Posts: 3942 | 40 Is it equivalent to the following problem?
Say we have old data and already have OLS regression estimates. Now new
data (more observations) come in, but we don't want to do an SVD or compute the inverse
of the whole X'X matrix. Instead, we would like to use some fast matrix-updating
algorithm to update the regression coefficient estimates.
Am I correct? |
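The updating problem described here is what recursive least squares does. The idea is to keep P = (X'X)^-1 and apply the Sherman-Morrison rank-one update for each new row, so the coefficients change without re-inverting X'X. Below is a sketch for the two-parameter model y = b0 + b1*x with hypothetical data, checked against a batch refit:

```python
def batch_fit(xs, ys):
    """Batch OLS for y = b0 + b1*x; returns beta and P = (X'X)^-1."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    P = [[sxx / det, -sx / det], [-sx / det, n / det]]    # inverse of [[n, sx], [sx, sxx]]
    beta = [P[0][0] * sy + P[0][1] * sxy, P[1][0] * sy + P[1][1] * sxy]
    return beta, P

def rls_update(beta, P, x, y):
    """Fold one new observation into beta and P (Sherman-Morrison rank-one update)."""
    z = [1.0, x]                                           # new regressor row, with intercept
    Pz = [P[0][0] * z[0] + P[0][1] * z[1],
          P[1][0] * z[0] + P[1][1] * z[1]]
    k = [v / (1.0 + z[0] * Pz[0] + z[1] * Pz[1]) for v in Pz]   # gain vector
    err = y - (beta[0] * z[0] + beta[1] * z[1])            # prediction error on the new point
    beta = [beta[0] + k[0] * err, beta[1] + k[1] * err]
    P = [[P[i][j] - k[i] * Pz[j] for j in range(2)] for i in range(2)]  # P - Pz z'P / (1 + z'Pz)
    return beta, P

old_x, old_y = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]   # hypothetical "old" data
new_x, new_y = [3.0, 4.0], [5.0, 4.0]             # hypothetical new observations

beta, P = batch_fit(old_x, old_y)
for x, y in zip(new_x, new_y):
    beta, P = rls_update(beta, P, x, y)

refit, _ = batch_fit(old_x + new_x, old_y + new_y)
print(beta, refit)   # should agree up to rounding
```

Each update costs O(p^2) for p coefficients, compared with O(p^3) for a fresh inversion. Over very long streams, rounding error can accumulate in P, so occasional batch refits are a common safeguard.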
|
s****y Posts: 297 | 41 In my dataset, I have several variables (Y, A, B, C, ...) and one factor.
I'm investigating the regression using the command
lm(Y ~ A+ B + C +...)
but now I want to know the regression coefficients of A, B, C, ... for
every factor level.
The factor have more than 20 levels...
Is there any good way to do this in R?
Thanks in advance!! |
|
J*****n Posts: 4859 | 42 [Reposted from the Economics board]
From: Jadeson (Jadeson), Board: Economics
Subject: question about regression in plm
Posted at: BBS 未名空间站 (Fri Jul 15 08:38:24 2011, US Eastern)
I am running a panel regression with plm and get the following error:
Error in plm.fit(formula, data, model, effect, random.method, inst.method) :
empty model
What does this error mean? |
|
k*******g Posts: 13 | 43 To predict y(t), we use two candidate variables, x1(t) and x2(t), separately and get
two linear regression models:
M1: y(t)=b1*x1(t)+b0
M2: y(t)=b1*x2(t)+b0
The coefficients of determination (R-squared) for M1 and M2 are 0.01 and 0.02.
If we run a new regression model with both x1 and x2,
M: y(t)=b1*x1(t)+b2*x2(t)+b0
Q: What are the lower and upper bounds of M's R-squared? What problem does each of these two extreme cases
have?
Thanks for any advice! |
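With an intercept, the R-squared of each simple model is a squared correlation (r1^2 = 0.01, r2^2 = 0.02). The two-predictor R-squared depends on those values plus r12 = corr(x1, x2):
R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2).
Adding a predictor never lowers R-squared, so the floor is 0.02. It is reached when x1 adds nothing beyond x2, i.e. r12 = r1/r2. Suppression can push R-squared all the way to 1 as r12 approaches the value at which the correlation matrix of (y, x1, x2) becomes singular. The sketch below evaluates the formula and assumes both simple correlations are positive; the R-squared values alone do not reveal their signs:

```python
import math

def r2_two_predictors(r1, r2, r12):
    """R^2 of y ~ x1 + x2 from corr(y,x1)=r1, corr(y,x2)=r2, corr(x1,x2)=r12."""
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

r1, r2 = math.sqrt(0.01), math.sqrt(0.02)    # from the two simple models (signs assumed +)

print(r2_two_predictors(r1, r2, 0.0))        # uncorrelated predictors: the two R^2 add, 0.03
print(r2_two_predictors(r1, r2, r1 / r2))    # x1 redundant given x2: the floor, 0.02
# Value of r12 at which the correlation matrix of (y, x1, x2) becomes singular
r12_edge = r1 * r2 - math.sqrt((1 - r1 ** 2) * (1 - r2 ** 2))
print(r2_two_predictors(r1, r2, r12_edge))   # suppression: R^2 reaches 1
```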
|
A*******s Posts: 3942 | 44 Looking only at the distribution of Y is useless; we only care about the conditional distribution of Y.
A GLM can handle this. If your p is (# of events)/(# of trials), this is still the most basic
logistic (Bernoulli/binomial) regression. If p is a rate/proportion, you can also
use beta regression. |
|
A*******s Posts: 3942 | 45 Google "bounded outcome regression". As far as I know, a latent variable model
or beta regression can be applied.
I don't think it is about a lack of curvature. I bet the OP would see the same
pattern even after adding quadratic terms.
It is OK to use OLS if only the conditional mean of Y is of
interest and the variance is a nuisance. Whenever you want to draw inferences
from the distribution of Y (significance tests, confidence intervals), OLS
would fail, since it gives the wrong estimate of VAR[Y|X] ... |
|
f******y Posts: 2971 | 46 Suppose two random variables, X and Y, both have very small means.
I can get the slope by linear regression lm(Y~X);
I can also do PCA,
data = data.frame(X=X, Y=Y);
princomp(data);
I expected the slope of the first PC vector to be very close to the slope
given by linear regression. I tried it in R, and the results are very different.
Can anyone explain? |
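The two slopes answer different questions. lm(Y ~ X) minimizes vertical squared distances and gives Sxy/Sxx. The first principal component minimizes perpendicular distances (total least squares) and gives the slope of the leading eigenvector of the 2x2 covariance matrix. Apart from degenerate cases, the two coincide only when the points lie exactly on a line. The PC slope also changes if X or Y is rescaled. Below is a sketch with four hypothetical points:

```python
import math

def ols_and_pc1_slopes(x, y):
    """Return (OLS slope of y on x, slope of the first principal component); assumes Sxy != 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ols = sxy / sxx
    # Leading eigenvector of [[sxx, sxy], [sxy, syy]] has slope (lambda_max - sxx) / sxy
    pc1 = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return ols, pc1

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(ols_and_pc1_slopes(x, y))   # (0.8, 1.0)
```

Here the PC slope (1.0) lies between the OLS slope of y on x (0.8) and the reciprocal of the slope of x on y (1.25). This is the general pattern for positively correlated data.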
|
z*****n Posts: 413 | 47 I don't think there is a specific relationship between PCA and regression.
But the covariates of a regression can be replaced by PCs if the X matrix has
strong collinearity, and PCA is a good way to reduce the number of covariates.
I doubt that what the OP did is meaningful. For y = mu + b * x, b can be treated
as a scale of the x vector in n-space. The first component of (x, y) should
be the vector x + y, and this is nonsense if you haven't standardized your data
before PCA. |
|
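i*****c Posts: 1322 | 48 In categorical regression, when testing whether the model is better than the null
model, how can I justify using this model if the p-value is large? For example, I want to look at the
outcome of a disease (1 if present, 0 if absent) in relation to age, duration, and measurement X. In the
regression model I fit in R, measurement X and the outcome are significantly associated (p-value
< 0.05). But when I use R's fit indices (null deviance, residual deviance) to measure the fit of
the model, I get a p-value of 0.7. Yet both a t-test and the OR show that measurement X differs significantly between
group 1 and group 0. How should I interpret this? |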
|
b******s Posts: 325 | 49 Thanks for your confirmation.
What does "additive effects" mean? I am not very clear on what you meant
there. Here is my original guess.
"In the interpretation of the case above, we are saying the effects for a
diversion case in VA would be such… Note, to do that interpretation, we
are not saying VA has any diversion case; we instead are implicitly saying,
should VA have a diversion, the effect would have been such … My bold
guess is the variation of VA’s population is memorized/used by ... |
|
a*****k Posts: 704 | 50 Hi, when one does time series regression, is there any way to optimally
determine how long the regression time window should be?
Thanks, |
|