l******0 Posts: 2 | 1 Say I have a relatively big dataset with a categorical variable that has many
possible values/levels, for example, country.
If I do one-hot encoding as suggested by scikit-learn, I get an "out of memory"
error. But when I load the data into R, treat the variable as a
normal factor, and call some R machine learning library or H2O, everything
works fine: at least there is no error message and the results are acceptable. So I'm
wondering how R or H2O treats it differently, and what the correct way
to handle this kind of problem is.
| W***o Posts: 6519 | 2 Could it be done like this:
1 united states
0 other
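Loop over all the countries this way, coding the current country as 1.
That way each pass of the loop has only two levels (the current country and "other").
| m******r Posts: 1033 | |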
|
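A minimal sketch of the two ideas raised in this thread, not taken from any of the posts. It assumes Python with NumPy and SciPy, and the toy `countries` array and the `one_vs_rest` helper are made up for illustration. The first part stores the column as integer codes plus a table of levels (similar in spirit to how an R factor is stored) and builds the one-hot matrix in sparse form, so memory grows with the number of rows rather than rows times levels. The second part is the one-country-at-a-time 0/1 coding suggested above.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data standing in for a large "country" column.
countries = np.array(["US", "CN", "US", "DE", "CN", "US"])

# Integer codes plus a sorted table of distinct levels.
levels, codes = np.unique(countries, return_inverse=True)

# Sparse one-hot matrix: exactly one stored value per row,
# instead of one dense cell per (row, level) pair.
n_rows, n_levels = len(countries), len(levels)
one_hot = sparse.csr_matrix(
    (np.ones(n_rows, dtype=np.int8), (np.arange(n_rows), codes)),
    shape=(n_rows, n_levels),
)


def one_vs_rest(codes, levels):
    """Yield (level, indicator) pairs, one level at a time.

    The indicator is 1 where the row belongs to the current level
    and 0 for every other level, so only one column is held in memory.
    """
    for k, level in enumerate(levels):
        yield level, (codes == k).astype(np.int8)


if __name__ == "__main__":
    print(levels.tolist(), codes.tolist())
    print(one_hot.shape, one_hot.nnz)
    for level, indicator in one_vs_rest(codes, levels):
        print(level, indicator.tolist())
```

Many scikit-learn estimators accept a SciPy sparse matrix directly, so the dense expansion can often be avoided. If the out-of-memory error appears after encoding, one thing worth checking is whether the matrix is being converted to a dense array somewhere along the way (for example through `.toarray()`).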