Statistics board - bagging for logistic regression because of unbalanced data
x**********0
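Posts: 163
1
A weekend question about logistic regression.
Y is a binary variable, but it is unbalanced: Y/N is about 1:8, sample size = 45000.
How does everyone handle unbalanced data, especially in logistic regression?
I would like to use bagging. I searched for some papers but did not find much useful information.
Bagging is mostly used on unstable models such as decision trees, right? I am not sure whether it
would significantly improve accuracy for logistic regression.
Thanks, everyone.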
c**d
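Posts: 104
2
I don't know why you call that unbalanced. It is your event rate, and ~10% is
normal.
Do you want to resample because you are afraid the event rate in your
sample data is too low?

[Quoting x**********0's post]
: A weekend question about logistic regression.
: Y is a binary variable, but it is unbalanced: Y/N is about 1:8, sample size = 45000.
: How does everyone handle unbalanced data, especially in logistic regression?
: I would like to use bagging. I searched for some papers but did not find much useful information.
: Bagging is mostly used on unstable models such as decision trees, right? I am not sure whether it
: would significantly improve accuracy for logistic regression.
: Thanks, everyone.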

x**********0
Posts: 163
3
Yes, I want to use bagging as a resampling method.
I don't know if it is appropriate.
What do you think?
x**********0
Posts: 163
4
Yes, I want to use bagging as a resampling method.
I just wonder if it is appropriate for logistic regression.
What do you think?
I*****a
Posts: 5425
5
Bagging usually aims to reduce prediction variance, which is why it has been
widely used with unstable methods like decision trees. If by "accuracy" you
meant unbiasedness, then it won't work. If you meant some criterion like mean
squared error, then it might work.
Further, do you think your logistic regression model will be an unstable one?
I.e., does it overfit a lot? Not likely, since you have so many observations,
unless you have a very big model.

[Quoting x**********0's post]
: A weekend question about logistic regression.
: Y is a binary variable, but it is unbalanced: Y/N is about 1:8, sample size = 45000.
: How does everyone handle unbalanced data, especially in logistic regression?
: I would like to use bagging. I searched for some papers but did not find much useful information.
: Bagging is mostly used on unstable models such as decision trees, right? I am not sure whether it
: would significantly improve accuracy for logistic regression.
: Thanks, everyone.
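
The variance argument in post 5 can be checked with a small simulation. The following is a minimal sketch under stated assumptions, not anyone's actual analysis: the data are simulated with an event ratio of roughly 1:8, the logistic fit is a hand-written Newton-Raphson routine, and the bagged prediction is the average of predicted probabilities over bootstrap refits. All names (`fit_logistic`, `bagged_predict`, `simulate`) and all parameter values are made up for illustration. The Brier score is used as the "mean squared error" criterion.

```python
import numpy as np

def predict_proba(X, beta):
    """Logistic mean function; X must already contain an intercept column."""
    return 1.0 / (1.0 + np.exp(-X @ beta))

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = predict_proba(X, beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # The tiny ridge term only guards against a singular Hessian.
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def bagged_predict(X_train, y_train, X_test, n_bags=30, rng=None):
    """Average the predicted probabilities of models refit on bootstrap samples."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y_train)
    total = np.zeros(len(X_test))
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        total += predict_proba(X_test, fit_logistic(X_train[idx], y_train[idx]))
    return total / n_bags

def simulate(n, true_beta, rng):
    """Draw standard-normal covariates and Bernoulli outcomes from a logistic model."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(true_beta) - 1))])
    y = (rng.random(n) < predict_proba(X, true_beta)).astype(float)
    return X, y

rng = np.random.default_rng(0)
true_beta = np.array([-2.5, 1.0, -0.5, 0.25])  # assumed values, roughly a 1:8 event ratio
X_train, y_train = simulate(5000, true_beta, rng)
X_test, y_test = simulate(5000, true_beta, rng)

p_single = predict_proba(X_test, fit_logistic(X_train, y_train))
p_bagged = bagged_predict(X_train, y_train, X_test, n_bags=30, rng=rng)

brier_single = np.mean((p_single - y_test) ** 2)
brier_bagged = np.mean((p_bagged - y_test) ** 2)
max_gap = np.max(np.abs(p_single - p_bagged))

print(f"training event rate: {y_train.mean():.3f}")
print(f"Brier score, single fit: {brier_single:.4f}")
print(f"Brier score, bagged fit: {brier_bagged:.4f}")
print(f"largest gap between the two predictions: {max_gap:.4f}")
```

With a small, correctly specified model and thousands of observations, the single fit is already stable, so the two sets of predictions should come out nearly identical. That is consistent with the point that bagging mainly helps unstable estimators.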

x**********0
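Posts: 163
6
That is actually what I thought too: bagging is mostly used for unstable classifiers like decision trees.
So if the data for a logistic regression are seriously unbalanced, how do people usually handle it?
Undersampling, oversampling?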
s*********e
Posts: 1051
7
First of all, do not misuse the term "unbalanced".
Secondly, if the event rate is too low, then logistic regression won't
work anyway, with or without bagging. But in your case, 1/8 is not bad
at all.
Thirdly, in the case of an extremely rare event, you should consider non-parametric
models such as tree-based / rule-based / nnet, either directly or indirectly
with a 2-stage approach.
x**********0
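Posts: 163
8
I more or less understand that 1:8 is still fine. What I meant to ask is: if the data really are seriously unbalanced,
can bagging be used with logistic regression? I think I basically get your point:
averaging 10000 bad predictions still gives a bad result,
right? Thanks.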
i*********e
Posts: 783
17
Does bagging mean we use the average of the coefficients of the n models?
My current data is about 1:20. What should I do?
Does logistic regression not work?
Then which method will work? Case-based reasoning? Neural network? Support
vector machine? When we use these models, should we use all the data, or
make the number of negative records the same as the number of positive
records?
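
On the last question in post 17 (balancing negatives against positives), here is a minimal sketch on simulated data with roughly a 1:20 event ratio. It assumes one common recipe that the thread itself does not spell out: undersample the negatives to 1:1, fit the logistic regression, then shift the intercept back by log[((1 - tau) / tau) * (ybar / (1 - ybar))], where tau is the event rate in the full data and ybar the event rate in the balanced sample. The function name `fit_logistic` and all parameter values are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

rng = np.random.default_rng(1)
n = 40000
true_beta = np.array([-3.6, 1.0, -0.5])  # assumed values, roughly a 1:20 event ratio
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
tau = y.mean()  # event rate in the full data

# Keep every positive and an equally sized random subset of the negatives.
pos = np.flatnonzero(y == 1)
neg = np.flatnonzero(y == 0)
neg_kept = rng.choice(neg, size=len(pos), replace=False)
idx = np.concatenate([pos, neg_kept])

beta_balanced = fit_logistic(X[idx], y[idx])

# Undersampling negatives inflates the odds by a constant factor, which shows up
# in the intercept only; subtract the log of that factor to undo it.
ybar = y[idx].mean()  # 0.5 by construction
offset = np.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))
beta_corrected = beta_balanced.copy()
beta_corrected[0] -= offset

beta_full = fit_logistic(X, y)  # reference fit on all the data

print(f"event rate in full data: {tau:.4f}")
print("full-data fit:          ", np.round(beta_full, 3))
print("balanced, uncorrected:  ", np.round(beta_balanced, 3))
print("balanced, corrected:    ", np.round(beta_corrected, 3))
```

In this setup the slopes from the balanced sample should be close to the full-data slopes, and only the intercept needs the correction. The balanced fit uses fewer rows, so its estimates are noisier than the full-data fit. When the full data fit in memory, the sketch gives no reason to prefer undersampling for a logistic model.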
A*******s
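Posts: 3942
18
This paper explains the distribution of the AUC for a given misclassification rate:
http://books.nips.cc/papers/files/nips16/NIPS2003_AA40.pdf
You can see that the AUC's std becomes larger with rare events.
I don't know whether there is a similar paper discussing the Bernoulli likelihood and AUC, but intuitively
the Bernoulli likelihood is just a continuous and differentiable
approximation of the error rate, so the relationship may be similar. This also matches my
(very limited) experience that weight adjustment or adjusting the cost function generally helps improve AUC.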
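
The remark that the AUC's std grows with rare events can be illustrated by simulation. This is a rough sketch and not the derivation in the linked paper: it assumes scores of positives are N(1, 1) and scores of negatives are N(0, 1), holds the total sample size fixed, and changes only the event rate. The names `auc_from_scores` and `auc_std` are made up for this example.

```python
import numpy as np

def auc_from_scores(pos_scores, neg_scores):
    """AUC as the Mann-Whitney statistic: P(positive score > negative score)."""
    scores = np.concatenate([pos_scores, neg_scores])
    # Ranks 1..n in ascending order; ties are not handled (scores are continuous).
    ranks = np.argsort(np.argsort(scores)) + 1
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def auc_std(event_rate, n_total=2000, n_reps=200, shift=1.0, rng=None):
    """Mean and standard deviation of the AUC over repeated simulated samples."""
    rng = np.random.default_rng() if rng is None else rng
    n_pos = max(1, int(round(event_rate * n_total)))
    n_neg = n_total - n_pos
    aucs = [
        auc_from_scores(rng.normal(shift, 1.0, n_pos), rng.normal(0.0, 1.0, n_neg))
        for _ in range(n_reps)
    ]
    return float(np.mean(aucs)), float(np.std(aucs, ddof=1))

rng = np.random.default_rng(2)
results = {rate: auc_std(rate, rng=rng) for rate in (0.5, 0.1, 0.01)}
for rate, (mean_auc, sd_auc) in results.items():
    print(f"event rate {rate:>4}: mean AUC = {mean_auc:.3f}, std = {sd_auc:.4f}")
```

The mean AUC should stay about the same across event rates because the score distributions do not change. The std should rise as positives become scarce, since the estimate then rests on far fewer positive-negative comparisons that are effectively independent.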