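p****e Posts: 165 | 1 Can anyone explain how to find the biggest contributing factor among a pile of variables? This comes up constantly in real work and applies to every business domain, for example:
Sales: how can we find which factors most affect product sales?
Inventory: how can we find which factors impact product availability?
Engineering: how can we identify root causes behind manufacturing defects?
Human resources: how can we identify what causes high performers to leave?
In general, a set of candidate variables may all act on one target variable, and the goal is to pick out the few most important ones for reporting or for building a model. Here are some candidate methods. Which of them are commonly used in practice, and what tools implement them?
1. correlation matrix: compute each variable's correlation with the target, then keep those with a correlation above 0.5 or so;
2. regression model: look at the p-values;
3. decision tree: I don't know which tool can do this. It doesn't seem possible in R?
4. random forest, neural network, etc.: can these be done in R?
Experts, please advise!
Thanks! |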
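A minimal sketch of option 1 from the post above (rank variables by their correlation with the target and keep those past a cutoff). It assumes Python rather than the R the poster asks about; the data, the variable names, and the helper names `pearson` and `rank_by_correlation` are all made up for illustration. The 0.5 cutoff is the one mentioned in the post and is applied to the absolute value so that strong negative relationships are kept too.

```python
from math import sqrt

# Hypothetical toy data: each list is one candidate variable, aligned row by row.
features = {
    "price":      [9.0, 8.5, 8.0, 7.5, 7.0, 6.5, 6.0, 5.5],
    "ad_spend":   [1.0, 3.0, 2.0, 4.0, 3.5, 5.0, 4.5, 6.0],
    "store_temp": [20.0, 23.0, 19.0, 22.0, 21.0, 18.0, 24.0, 20.0],
}
sales = [10.0, 12.0, 13.0, 16.0, 17.0, 20.0, 21.0, 24.0]  # target variable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_by_correlation(candidates, target, threshold=0.5):
    """Return (name, r) pairs with |r| >= threshold, strongest first."""
    scored = [(name, pearson(values, target)) for name, values in candidates.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


ranking = rank_by_correlation(features, sales)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

One caveat with this approach: a pairwise correlation only captures a linear, one-variable-at-a-time relationship, so a variable that matters only in combination with another can be filtered out.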
j******4 Posts: 6090 | |
p****e Posts: 165 | 3 Does anyone know how decision trees are used in real work? Is this the kind of business use case they are meant for? |
k*******a Posts: 772 | 4 With random forest you can compute variable importance. |
v******y Posts: 4134 | |
s*****n Posts: 169 | 6 Information gain. |
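For readers unfamiliar with the term, here is a rough sketch of ranking categorical variables by information gain (the reduction in the target's entropy once you split on a variable). It is written in Python as an assumption, not something the thread specifies; the HR-style data and the function names `entropy` and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: did each of eight high performers leave?
left = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
candidates = {
    "got_raise":  ["no", "no", "no", "no", "yes", "yes", "yes", "yes"],
    "overtime":   ["high", "high", "high", "low", "low", "low", "low", "high"],
    "department": ["a", "b", "a", "b", "a", "b", "a", "b"],
}

gains = {name: information_gain(values, left) for name, values in candidates.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
for name in ranked:
    print(f"{name}: {gains[name]:.3f} bits")
```

This is the same quantity that entropy-based decision trees use to choose a split, which is why a tree's top splits are often read as a rough ranking of factors. Continuous variables would need to be binned first, which this sketch does not do.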
c********g Posts: 449 | 7 There are lots of these in data mining, e.g. fuzzy ranking, etc. |
q***m Posts: 9 | 8 Model-independent options include ROC and the like;
model-dependent options include the importance measures in random forest and SVM. |
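One way to read the model-independent ROC suggestion: score each numeric variable on its own by the area under the ROC curve it achieves against a binary target, with no model fitted. The sketch below assumes Python, a 0/1 target, and invented manufacturing data; `auc` and `separability` are hypothetical helper names.

```python
def auc(values, labels):
    """Area under the ROC curve when the raw feature value is used as the score.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example has the larger value; ties count as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def separability(values, labels):
    """Direction-free score in [0.5, 1.0]: an AUC of 0.1 is as useful as 0.9."""
    a = auc(values, labels)
    return max(a, 1.0 - a)


# Hypothetical toy data: 1 = defective part, 0 = good part.
defect = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = {
    "temperature": [90, 85, 88, 70, 72, 68, 65, 75],
    "humidity":    [40, 55, 45, 60, 50, 42, 58, 47],
    "line_speed":  [10, 12, 11, 13, 20, 22, 21, 19],
}

scores = {name: separability(values, defect) for name, values in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: AUC = {auc(candidates[name], defect):.3f}")
```

An AUC near 0.5 means the variable alone does not separate the two classes; values near 0 or 1 both indicate strong separation, just in opposite directions.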
s***e Posts: 5242 | 9 PCA? |
m****9 Posts: 492 | |
p*******i Posts: 1181 | 11 Random forest is a good way to do this. I think there is an R package for it written by the RF author himself. |