|
l***a 发帖数: 12410 | 2 I'll have to read it when I get back home; my office computer cannot access this
link :(
btw, is my case what's called "imbalanced data"? I thought that term was about the balance
of treatments, in other words about the independent variables. My case is about a very rare
event, which concerns the dependent variable. Maybe a silly question |
|
D******n 发帖数: 2836 | 3 yes, it is called imbalanced data, with a very scarce amount of 1s and a lot of 0s, or vice versa. |
|
R******d 发帖数: 1436 | 4 The reason I balanced the data: I tried different ratios, and balancing gave the best performance.
I deliberately want very high specificity; different sensitivity/specificity values can be obtained by setting
different prediction score thresholds. Usually people report the values at the point where the sum of the two
is maximized, right? The AUC metric is unaffected by whether the data are balanced or imbalanced. My AUC is
around 0.88.
My goal is just to find some cases, not to find all of them, so I set specificity very high. With my approach,
should PPV be calculated on the balanced data or on the imbalanced data?
How is profit lift used? |
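A minimal sketch of how the threshold/PPV question could be checked numerically, assuming scikit-learn is available; y_true, y_score and metrics_at_threshold are hypothetical names. Since PPV depends on class prevalence, it is computed here on a held-out set that keeps the original imbalanced class ratio rather than on the artificially balanced training set:

import numpy as np
from sklearn.metrics import confusion_matrix

def metrics_at_threshold(y_true, y_score, threshold=0.5):
    # y_true: 0/1 labels of a held-out set with the original class ratio
    # y_score: the model's prediction scores for the positive class
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn) if (tp + fn) else float('nan')
    specificity = tn / (tn + fp) if (tn + fp) else float('nan')
    ppv = tp / (tp + fp) if (tp + fp) else float('nan')   # depends on class prevalence
    return sensitivity, specificity, ppv

# e.g. sweep thresholds and keep the one that meets the specificity target:
# for t in np.linspace(0.05, 0.95, 19):
#     print(t, metrics_at_threshold(y_true, y_score, t))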
|
d***2 发帖数: 341 | 5 For phase 1 studies, replacement is often done for drop-outs. When a
subject drops out, the next enrolled subject gets the same randomization
number, etc., as long as this is well defined in the protocol.
In phase II+ trials, an additional portion of subjects is added to the
study based on the estimated drop-out rate (such as 15-20%) to maintain the
power. Imbalanced drop-out is another story; you will have to look into the
discontinuation mechanism by using pattern mixture models, etc.
|
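A tiny worked version of the inflation mentioned above, assuming the common convention of dividing the required sample size by (1 - expected drop-out rate); the numbers are made up:

import math

n_required = 200        # completers needed for the planned power (hypothetical)
dropout_rate = 0.15     # expected drop-out fraction (hypothetical)
n_enrolled = math.ceil(n_required / (1 - dropout_rate))
print(n_enrolled)       # 236 subjects to enrol to keep ~200 completers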
|
|
c*****l 发帖数: 1493 | 7 Do some down sampling to make it relatively balanced (or use a two-stage classification).
You can also weight the two kinds of errors when computing the misclassification rate |
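A minimal numpy-based sketch of the two ideas above (down-sampling the majority class, or weighting the two error types); X, y, the ratio argument and the 10:1 cost are all hypothetical:

import numpy as np

rng = np.random.default_rng(0)

def downsample_majority(X, y, ratio=1.0):
    """Keep all minority (label 1) rows and a random subset of majority (label 0)
    rows, so that #majority is roughly ratio * #minority."""
    idx_min = np.flatnonzero(y == 1)
    idx_maj = np.flatnonzero(y == 0)
    keep_maj = rng.choice(idx_maj, size=int(ratio * len(idx_min)), replace=False)
    keep = np.concatenate([idx_min, keep_maj])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Alternatively, weight the two error types instead of resampling, e.g. a false
# negative costing 10x a false positive:
# weighted_error = (10 * n_false_neg + n_false_pos) / (10 * n_pos + n_neg)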
|
|
Y****a 发帖数: 243 | 9 Generally speaking, a model's built-in evaluation score measures overall performance, i.e. false
negatives + false positives. Some models let you set different penalties for different kinds of
classification error; I'm not sure how that is handled in the rf algorithm you're using. But since your
main concern is false negatives, don't rely too heavily on the acc value the model reports.
Random forest should give each observation a score for each class; the default is to assign the
observation to the class with the higher score. You can also tune this threshold: for example, even if
the negative score is 0.4 and the positive score is 0.6, still classify it as negative, although the
positive score is higher. See whether that helps. |
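A minimal scikit-learn sketch of the two points above (per-class weighting and moving the score threshold), assuming RandomForestClassifier rather than whatever rf implementation the OP uses; the synthetic data and the 0.7 cutoff are hypothetical:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the OP's data: ~2.5% positives, many features.
X, y = make_classification(n_samples=20000, n_features=50, weights=[0.975],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, class_weight='balanced', random_state=0)
rf.fit(X_train, y_train)

# Per-class scores; the column order follows rf.classes_ (here [0, 1]).
proba_pos = rf.predict_proba(X_test)[:, list(rf.classes_).index(1)]

# Move the decision threshold away from the default 0.5, as suggested above.
threshold = 0.7                    # e.g. only call "positive" when the score is high
y_pred = (proba_pos >= threshold).astype(int)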
|
j****k 发帖数: 46 | 10 Chapter 16 of this book addresses this issue in particular, including most of the
ideas you mentioned; hope this helps: http://appliedpredictivemodeling.com/
I have a worse problem than yours: 300 "1"s versus 0.8M "0"s. Hopefully it
works for me as well :D |
|
|
y**3 发帖数: 267 | 12 Can't you just use Firth penalized (logistic) regression? Some argue it works better than OVERSAMPLING |
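A minimal numpy sketch of Firth's penalized-likelihood iteration, included only to illustrate the idea mentioned above (the score equation is adjusted by the hat-matrix leverages); in practice one would more likely use an existing implementation such as the logistf package in R. X, y and the convergence settings below are hypothetical:

import numpy as np

def firth_logistic(X, y, n_iter=100, tol=1e-8):
    """Firth-penalized logistic regression via the modified-score Newton iteration.
    X is an (n, p) design matrix that already includes an intercept column; y is 0/1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = pi * (1.0 - pi)                              # IRLS weights
        XtWX_inv = np.linalg.inv(X.T @ (X * W[:, None]))
        A = X * np.sqrt(W)[:, None]
        h = np.einsum('ij,jk,ik->i', A, XtWX_inv, A)     # hat-matrix leverages
        U = X.T @ (y - pi + h * (0.5 - pi))              # Firth-adjusted score
        step = XtWX_inv @ U
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy usage on a tiny, very imbalanced sample (purely illustrative):
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (rng.random(200) < 0.03).astype(float)
print(firth_logistic(X, y))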
|
A*****n 发帖数: 243 | 13 In this situation comparing ACC probably isn't very meaningful. Take a look at the ROC curves
for your four cases; they may well look very similar. |
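A minimal sketch of overlaying the ROC curves for the different sampling schemes, assuming scikit-learn and matplotlib; scores_by_case and plot_roc_curves are hypothetical names:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

def plot_roc_curves(scores_by_case):
    # scores_by_case maps a label to (y_true, y_score) for each sampling scheme,
    # e.g. {"1:1": (y_true, s1), "1:3": (y_true, s2), ...}
    for name, (y_true, y_score) in scores_by_case.items():
        fpr, tpr, _ = roc_curve(y_true, y_score)
        plt.plot(fpr, tpr, label=f"{name} (AUC={roc_auc_score(y_true, y_score):.3f})")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey")   # chance line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()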
|
y**3 发帖数: 267 | 14 What is acc? Is it AIC or AICc?
If we shouldn't compare ACC, what should we compare instead? Please advise |
|
y**3 发帖数: 267 | 15 Just figured it out: ACC should be accuracy |
|
w*****a 发帖数: 218 | 16 This is the right answer.
If necessary, put the X-axis on a LOG scale, or the Y-axis too |
|
m******r 发帖数: 1033 | 17 Let me throw out a rough idea.
Given this 2.5% vs 97.5% split, couldn't you do imbalanced sampling?
Also, how come there are so many features? Some features are obviously useless at a glance; just
garbage-collect them directly. |
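One simple way to drop the "obviously useless" features mentioned above is a variance filter; this is a swapped-in illustration using scikit-learn's VarianceThreshold, with a made-up feature matrix and a hypothetical cutoff:

import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=1000),            # normal-looking feature
                     np.zeros(1000),                   # constant: obviously useless
                     rng.normal(size=1000) * 1e-6])    # nearly constant

selector = VarianceThreshold(threshold=1e-4)   # drop (near-)constant columns
X_reduced = selector.fit_transform(X)
print(X.shape, "->", X_reduced.shape)          # (1000, 3) -> (1000, 1)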
|
|
发帖数: 1 | 19 Over-sampling / under-sampling techniques. From the link you provided, these only
apply to cases where the sampling is biased relative to the population and you know that
beforehand. A confusion matrix and a classification report may be one tool,
together with purposely adjusting the class probability and using the F-score as a measure.
The feature set is big; you probably need to do something with it first. I feel you need to
reduce the dimensions first instead of only shrinking it.
I'm a rookie, please feel free to comment.
: Combat Imbalanced Classes
<... [full post truncated] |
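A minimal scikit-learn sketch of the tools named above (dimension reduction, then a classification report with per-class F-scores); the synthetic data, the choice of PCA with 50 components, and the logistic regression stand-in model are all hypothetical:

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: many features, few positives.
X, y = make_classification(n_samples=10000, n_features=200, weights=[0.975],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(PCA(n_components=50),
                      LogisticRegression(class_weight='balanced', max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred, digits=3))   # per-class precision/recall/F1
print("F1 (positive class):", f1_score(y_test, y_pred))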
|
t******g 发帖数: 2253 | 20 This thread is asking how to handle imbalanced samples, and then how to build a model in that situation |
|
i********r 发帖数: 1153 | 21 then wouldn't that make your shoving range and 3bet range imbalanced? |
|