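a****g Posts: 8131 | 1 I have some survey data and want to build some predictive models.
One important feature is highly imbalanced:
for example, out of 1000 cases, 950 are A and only 50 are B. How should I do the sampling in this case?
I remember seeing somewhere that you can use the bootstrap for this,
i.e. resample B and A with replacement so that the training data has roughly
the same proportions of A and B?
Any ideas are welcome. Thanks. |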
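The bootstrap idea in the question (resampling each class with replacement until A and B have about the same count in the training data) can be sketched as below. This is a minimal illustration, assuming A/B is the label being predicted and that only the training split is resampled. The function name `balanced_bootstrap` and the toy data are hypothetical.

```python
import random
from collections import Counter

def balanced_bootstrap(rows, labels, n_per_class=None, seed=0):
    """Draw every class with replacement until each has n_per_class rows.

    Hypothetical helper. Apply it to the training split only; the test split
    should keep the original 950/50 proportions so evaluation stays honest.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    if n_per_class is None:
        # Default: the average class size, so the total stays near the original size.
        n_per_class = len(rows) // len(by_class)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for _ in range(n_per_class):
            out_rows.append(rng.choice(members))  # sampling with replacement
            out_labels.append(label)
    return out_rows, out_labels

# Toy data mirroring the post: 950 "A" cases and 50 "B" cases.
rows = list(range(1000))
labels = ["A"] * 950 + ["B"] * 50
new_rows, new_labels = balanced_bootstrap(rows, labels)
print(Counter(new_labels))  # Counter({'A': 500, 'B': 500})
```

Note that this both undersamples A (950 down to 500) and oversamples B (50 up to 500), so each B case appears about 10 times on average.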
A****n Posts: 241 | 2 Maybe you can try oversampling. |
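Plain random oversampling keeps every original row and only adds extra copies of minority rows (drawn with replacement) until the classes are the same size. The sketch below assumes "B" is the minority label. `random_oversample` is a hypothetical name; the imbalanced-learn package offers the same idea as `imblearn.over_sampling.RandomOverSampler`.

```python
import random

def random_oversample(rows, labels, minority, seed=0):
    """Keep all original rows, then append minority rows drawn with replacement
    until the minority class is as large as the largest other class."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_size = max(labels.count(y) for y in set(labels) if y != minority)
    n_extra = max(majority_size - len(minority_rows), 0)
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority] * n_extra

rows = list(range(1000))
labels = ["A"] * 950 + ["B"] * 50
X, y = random_oversample(rows, labels, minority="B")
print(len(X), y.count("A"), y.count("B"))  # 1900 950 950
```

Because the extra B rows are exact duplicates, a flexible model can overfit to them; that is the motivation for SMOTE.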
A****n Posts: 241 | 3 It's called SMOTE. |
a****g Posts: 8131 | 4 Thanks a lot!
【In A****n's post:】 : It's called SMOTE.
|
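SMOTE (Synthetic Minority Over-sampling Technique) creates new minority points instead of copying existing ones: it picks a minority point, picks one of its k nearest minority neighbours, and places a synthetic point at a random position on the line segment between them. The sketch below is a simplified version that assumes all features are numeric; the function name and toy data are hypothetical. In practice imbalanced-learn's `SMOTE` class does this, and `SMOTENC` is its variant for mixed numeric/categorical features, which is common in survey data.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of x within the minority class, by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        z = rng.choice(neighbours)
        u = rng.random()  # random position along the segment from x to z
        synthetic.append(tuple(xi + u * (zi - xi) for xi, zi in zip(x, z)))
    return synthetic

# Toy minority class: 50 two-feature "B" cases.
rng = random.Random(1)
minority_B = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(50)]
new_points = smote(minority_B, n_new=900)  # 50 real + 900 synthetic = 950, matching A
print(len(new_points))  # 900
```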
g****e Posts: 1829 | 5 Bagging is an option. You can also simply use oversampling.
|
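One common way to combine bagging with class balancing is to give each bag the same number of rows from every class (here, the minority class size, drawn with replacement), fit one model per bag, and take a majority vote. This is a minimal sketch under stated assumptions: the nearest-centroid base learner is a hypothetical stand-in for a real model, and all names and toy data are illustrative. Ready-made versions exist as scikit-learn's `BaggingClassifier` and imbalanced-learn's `BalancedBaggingClassifier`.

```python
import math
import random
from collections import Counter

def fit_centroids(X, y):
    """Toy base learner (hypothetical stand-in): mean feature vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroids(model, x):
    return min(model, key=lambda label: math.dist(x, model[label]))

def balanced_bagging(X, y, n_bags=25, seed=0):
    """Fit one base model per bag; each bag draws the same number of rows
    (with replacement) from every class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    n = min(len(members) for members in by_class.values())
    models = []
    for _ in range(n_bags):
        bag_X, bag_y = [], []
        for label, members in by_class.items():
            for _ in range(n):
                bag_X.append(rng.choice(members))
                bag_y.append(label)
        models.append(fit_centroids(bag_X, bag_y))
    return models

def predict_bagged(models, x):
    """Majority vote over the bagged models."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

rng = random.Random(2)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(950)] + \
    [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(50)]
y = ["A"] * 950 + ["B"] * 50
models = balanced_bagging(X, y)
print(predict_bagged(models, (0.0, 0.0)), predict_bagged(models, (3.0, 3.0)))  # A B
```

Each bag sees a different random subset of the 950 A cases, so across bags most of the majority data still gets used, unlike a single undersampled training set.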