S******y (posts: 1123) | #1 Has anybody used a decision tree in Python or C++ (or written their own decision tree implementation in Python or C++)? My goal is to train a decision tree on 8 million observations as the training set and score 7 million in the test set.
I am testing the 'rpart' package in a 64-bit Linux + 64-bit R environment, but rpart seems either unstable or runs out of memory very quickly. (Is it because R passes everything as a copy instead of as an object reference?)
(PS. I would love to use SAS EM, but no licen |
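For anyone considering writing their own, here is a minimal sketch of a CART-style classification tree in pure Python: greedy top-down splits on numeric features chosen by weighted Gini impurity, a depth limit, and no pruning. All names (`Node`, `build`, `best_split`, `score_stream`) are hypothetical, and pure Python would be far too slow for 8 million rows; it only illustrates the algorithm. The one design point relevant to the memory question is that scoring is a generator, so the 7 million test rows can be streamed instead of loaded all at once.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Sequence


class Node:
    """Binary tree node; internal nodes send rows with x[feature] <= threshold left."""

    def __init__(self, prediction, feature: Optional[int] = None,
                 threshold: Optional[float] = None,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.prediction = prediction  # majority class among training rows at this node
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right


def gini(counts: Counter, n: int) -> float:
    """Gini impurity computed from class counts."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def best_split(X: List[Sequence[float]], y: List):
    """Try every feature and every midpoint between distinct sorted values.

    Returns (feature, threshold, weighted_gini), or None if no split exists.
    """
    n, best = len(y), None
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left, right = Counter(), Counter(y)
        for pos in range(n - 1):
            i = order[pos]
            left[y[i]] += 1   # move row i from the right side to the left side
            right[y[i]] -= 1
            a, b = X[i][f], X[order[pos + 1]][f]
            if a == b:
                continue      # cannot split between equal values
            nl, nr = pos + 1, n - pos - 1
            score = (nl * gini(left, nl) + nr * gini(right, nr)) / n
            if best is None or score < best[2]:
                best = (f, (a + b) / 2, score)
    return best


def build(X, y, depth: int = 0, max_depth: int = 5, min_samples: int = 2) -> Node:
    """Grow the tree recursively (greedy, top-down, no pruning)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return Node(majority)
    split = best_split(X, y)
    if split is None or split[2] >= gini(Counter(y), len(y)):
        return Node(majority)  # no split reduces impurity
    f, t, _ = split

    def child(idx):
        return build([X[i] for i in idx], [y[i] for i in idx],
                     depth + 1, max_depth, min_samples)

    left_idx = [i for i in range(len(y)) if X[i][f] <= t]
    right_idx = [i for i in range(len(y)) if X[i][f] > t]
    return Node(majority, f, t, child(left_idx), child(right_idx))


def predict_one(node: Node, row: Sequence[float]):
    """Follow splits from the root down to a leaf."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction


def score_stream(tree: Node, rows: Iterable[Sequence[float]]) -> Iterator:
    """Score rows lazily so a large test set never has to sit in memory at once."""
    for row in rows:
        yield predict_one(tree, row)


tree = build([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]], [0, 0, 0, 1, 1, 1])
print(list(score_stream(tree, [[0.0], [5.0], [7.0], [20.0]])))  # [0, 0, 1, 1]
```

In practice a compiled library (for example scikit-learn's `DecisionTreeClassifier`) does the same split search in native code, which is what would make the 8-million-row case feasible.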
l*********s (posts: 5409) | #2 R is notorious for poor memory management; Python is much better, and coding it in C++ is too much of a headache. |
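A back-of-the-envelope estimate shows why extra copies of the data hurt at this scale. Assume, hypothetically since the post does not say, 50 numeric columns, each stored as an 8-byte double (which is how R stores numeric vectors):

```latex
8 \times 10^{6}\ \text{rows} \times 50\ \text{columns} \times 8\ \tfrac{\text{bytes}}{\text{value}}
  = 3.2 \times 10^{9}\ \text{bytes} \approx 3.2\ \text{GB}
```

Under that assumption, every additional full copy of the training data costs roughly another 3.2 GB, before counting any working memory the tree-fitting code itself needs.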
S******y (posts: 1123) | #3 Thanks. I wouldn't mind rewriting it in C++ if it were 10X faster and managed resources better than R. |
A*******s (posts: 3942) | #4 I know there is a SAS macro for CHAID. Not sure whether it can handle large datasets.
|
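CHAID (chi-squared automatic interaction detection) chooses splits with chi-square tests between a categorical predictor and the target. The core computation is sketched below: the Pearson chi-square statistic and degrees of freedom for one predictor's contingency table. This is not the SAS macro's code, and the function name is hypothetical. Full CHAID also merges predictor categories that do not differ significantly and ranks predictors by Bonferroni-adjusted p-values; both steps are omitted here. A p-value could be obtained from the statistic with, e.g., `scipy.stats.chi2.sf(stat, dof)`.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def chi_square(x: Sequence[Hashable], y: Sequence[Hashable]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for the
    contingency table of a categorical predictor x against target y."""
    n = len(x)
    cells = Counter(zip(x, y))  # observed count for each (x-category, y-class) cell
    row_tot = Counter(x)        # count of each predictor category
    col_tot = Counter(y)        # count of each target class
    stat = 0.0
    for r, r_n in row_tot.items():
        for c, c_n in col_tot.items():
            expected = r_n * c_n / n  # expected count if x and y were independent
            stat += (cells[(r, c)] - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof


print(chi_square(["a", "a", "b", "b"], [1, 1, 0, 0]))  # (4.0, 1)
```

A larger statistic (for the same degrees of freedom) means the predictor's categories separate the target classes more strongly, which is what makes it a better split candidate.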
d*******o (posts: 493) | #5 SAS is a good choice for large-scale data. 64-bit SAS has excellent memory management and can handle datasets of basically any size.
SAS 9.2 licenses PROC ARBORETUM, which is the foundation of the SAS EM Decision Tree node. It supports AID, CHAID, XAID and CRT, as well as code generation. It can be used to train on the data and produce rules. |
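To illustrate what "producing rules" from a tree means, here is a small Python sketch that walks every root-to-leaf path of a fitted tree and emits one IF ... THEN ... rule per leaf. The nested-dict tree format and the feature names `age` and `income` are hypothetical; this is not PROC ARBORETUM's syntax or output format.

```python
from typing import Any, Dict, List, Optional


def extract_rules(node: Dict[str, Any], conditions: Optional[List[str]] = None) -> List[str]:
    """Emit one IF ... THEN ... rule per leaf by walking every root-to-leaf path.

    Hypothetical node format: leaves are {"predict": value}; internal nodes are
    {"feature": name, "threshold": t, "left": node, "right": node}, where "left"
    is taken when feature <= t.
    """
    conditions = conditions or []
    if "predict" in node:
        lhs = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {lhs} THEN predict = {node['predict']}"]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + [f"{f} <= {t}"])
            + extract_rules(node["right"], conditions + [f"{f} > {t}"]))


tree = {
    "feature": "age", "threshold": 30,
    "left": {"predict": "no"},
    "right": {"feature": "income", "threshold": 50000,
              "left": {"predict": "no"}, "right": {"predict": "yes"}},
}
for rule in extract_rules(tree):
    print(rule)
```

Because each rule is a plain conjunction of threshold tests, the same walk could emit scoring code in another language (for example a SAS DATA step or SQL `CASE` expression) instead of text.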