A*******s Posts: 3942 | 1 I wrote a tree macro that needs to split a dataset many times based on
different conditions, but it runs quite slowly. For a 700-row dataset and
500 conditions, it takes 3~5 minutes to complete one loop... Is there any
general way to improve the efficiency? I can only come up with two ways:
1. Create an index on the variables used in the IF conditions?
2. Multithreaded/parallel programming? I just read oloolo's blog about this
part, and I wish I could figure out how to do that.
Any other ideas? |
D******n Posts: 2836 | 2 I don't know how you split your dataset, so I don't know how to improve it on that
side. What is a tree macro? BTW the kid is cute... |
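d*******o Posts: 493 | 3 A data set as small as 700 rows runs that slowly? Are you looping too many times?
If all of your input and output data sets together are no more than 1 GB, you can put the library you use into memory, which
saves more than half of the run time.
libname mylib "c:\temp" memlib; |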
A*******s Posts: 3942 | 4 Thanks! Like father, like son... HAHAHA!!!
The tree is a CART decision tree. It splits the parent node into two child
nodes based on the information gain.
[Quoting D******n's post] : dont know how u dissect your dataset, so dont know how to improve it on that : side. what is tree macro? btw the kid is cute...
|
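For reference, the information gain of one candidate split can be written out as below. This is only a sketch: the thread does not say which impurity measure the macro uses, so entropy is assumed here (CART implementations often use the Gini index instead), and the symbols $P$, $L$, $R$, $n$ and $p_k$ are introduced purely for illustration.

```latex
% Entropy of a node t, where p_k(t) is the share of class k in t
H(t) = -\sum_{k} p_k(t)\,\log_2 p_k(t)

% Gain from splitting parent P (n_P rows) into
% left child L (n_L rows) and right child R (n_R rows)
\mathrm{Gain} = H(P) - \frac{n_L}{n_P}\,H(L) - \frac{n_R}{n_P}\,H(R)
```

Only the class counts in each child are needed to evaluate this, so the children themselves never have to exist as separate datasets.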
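A*******s Posts: 3942 | 5 That's a good suggestion, I'll try it when I get back. Thanks!
I think it may be an I/O problem. 500 loops means splitting the parent node 500 times and generating 1000
child nodes. I'm thinking I could add a flag instead of doing a physical split.
[Quoting d*******o's post] : A data set as small as 700 rows runs that slowly? Are you looping too many times? : If all of your input and output data sets together are no more than 1 GB, you can put the library you use into memory, which saves : more than half of the run time. : libname mylib "c:\temp" memlib;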
|
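The flag idea above might look roughly like the following. It is a minimal sketch, not the poster's macro: the dataset work.parent, the predictor x, the target y and the cutoff 10 are all hypothetical names and values chosen for illustration.

```sas
/* Hypothetical names: work.parent (the parent node), x (a numeric
   predictor), y (the target) and the cutoff 10 are all assumptions. */

/* A DATA step view adds the flag on the fly, so no child dataset
   is written to disk. */
data work.flagged / view=work.flagged;
    set work.parent;
    /* 1 = left child, 0 = right child. A missing x compares lower
       than any number, so missing values fall into the left child. */
    left = (x <= 10);
run;

/* One pass over the parent yields the class counts of both children. */
proc freq data=work.flagged noprint;
    tables left*y / out=work.split_counts;
run;
```

The counts in work.split_counts are enough to score the candidate split, so each condition costs one read of the parent node rather than one read plus two writes.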
s*r Posts: 2757 | 6 Check the logical dependence among the conditions. |
A*******s Posts: 3942 | 7 Any references? Many thanks...
[Quoting s*r's post] : check the logic dependence among conditions
|