Can someone explain how to actually find the biggest contributing factor among a pile of variables? - Statistics board - 未名 archive

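This page is an excerpt and archive of the corresponding 未名空间 thread. Posts less than a week old are shown up to 50 characters; posts older than a week are shown up to 500 characters.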

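p****e (posts: 165) #1:
Can someone explain how to actually find the biggest contributing factor among a pile of variables? This comes up constantly in real work and applies to every business domain, for example:
Sales: how can we find which factors most affect product sales?
Inventory: how can we find which factors impact product availability?
Engineering: how can we identify root causes behind manufacturing defects?
Human resources: how can we identify what causes high performers to leave?
In short, a set of candidate variables may all act on one target variable, and the goal is to pick out the few most important ones for reporting or for building a model. Here are some candidate methods. Which of them are commonly used in practice, and what tools implement them?
1. Correlation matrix: compute each variable's correlation with the target, then keep the ones with correlation above 0.5 or so.
2. Regression model: look at the p-values.
3. Decision tree: I don't know what tool can do this. It seems R can't, right?
4. Random forest, neural network, and so on: can these be done in R?
Experts, please advise! Thanks!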
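To make method 1 concrete, here is a minimal sketch of correlation screening in plain Python. The column names and numbers are hypothetical toy data, and the 0.5 cutoff is only the example value from the post, not a recommended threshold. Note that this ranks by absolute correlation, so strongly negative relationships are kept as well.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(columns, target, threshold=0.5):
    """Keep variables with |r| >= threshold, strongest first."""
    scored = [(name, pearson(vals, target)) for name, vals in columns.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical toy data: three candidate variables and a sales target.
columns = {
    "price": [10, 12, 14, 16, 18, 20],
    "ad_spend": [1, 3, 2, 5, 4, 6],
    "weekday": [1, 2, 3, 1, 2, 3],
}
sales = [100, 92, 81, 69, 62, 50]

ranked = rank_by_correlation(columns, sales)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

In this toy data `price` and `ad_spend` are themselves correlated, so both pass the screen even though they may carry overlapping information. Pairwise correlation only measures linear association one variable at a time, which is one reason the replies suggest other approaches.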
j******4 (posts: 6090) #2: Partial Sum of Squares?
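For context, one common reading of this suggestion: in a linear regression, the partial sum of squares for a predictor is the increase in the residual sum of squares when that predictor is dropped from the full model, and it feeds the partial F-test. The notation below is an assumption for illustration, not something from the thread: SSE is the residual sum of squares, q_j is the number of parameters tied to x_j, n is the number of observations, and p is the number of parameters in the full model including the intercept.

```latex
\mathrm{SS}_{\text{partial}}(x_j)
  = \mathrm{SSE}(\text{all predictors except } x_j) - \mathrm{SSE}(\text{all predictors}),
\qquad
F_j = \frac{\mathrm{SS}_{\text{partial}}(x_j) / q_j}{\mathrm{SSE}(\text{all predictors}) / (n - p)}
```

A larger partial sum of squares means the predictor explains more variation that the other predictors cannot account for.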
p****e (posts: 165) #3: Does anyone know how decision trees are actually used in real work? Is this the kind of business use case they fit? (in reply to p****e's post #1)
k*******a (posts: 772) #4: With random forest you can compute variable importance.
v******y (posts: 4134) #5: This depends on expert opinion.
s*****n (posts: 169) #6: Information gain. (in reply to p****e's post #1)
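As a rough illustration of the information gain idea, the sketch below scores a categorical variable by how much it reduces the entropy of the target. The HR-style data and column names are hypothetical. Continuous variables would need to be binned first, which this sketch does not do.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: did the employee leave?
left = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
features = {
    "overtime": ["high", "high", "high", "low", "low", "low", "low", "high"],
    "office": ["east", "west", "east", "west", "east", "west", "east", "west"],
}

gains = {name: information_gain(col, left) for name, col in features.items()}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gain:.3f} bits")
```

Information gain tends to favor variables with many distinct values, so an ID-like column can score high without being useful. This is the same criterion that some decision tree algorithms use to choose splits.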
c********g (posts: 449) #7: There are plenty of methods in data mining, e.g. fuzzy ranking, etc.
q***m (posts: 9) #8: Model-independent options include ROC and the like; model-dependent options include the importance measures in random forest and SVM. (in reply to p****e's post #1)
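One way to read the model-independent ROC suggestion is to treat each variable on its own as a score for a binary target and compute its area under the ROC curve (AUC). The sketch below does that in plain Python using the pairwise definition of AUC. The manufacturing data and column names are hypothetical, and ranking by distance from 0.5 is an assumption made here so that inversely related variables also rank high.

```python
def auc(scores, labels):
    """AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_by_auc(columns, labels):
    """Rank variables by how far their single-variable AUC is from 0.5."""
    scored = {name: auc(vals, labels) for name, vals in columns.items()}
    return sorted(scored.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)

# Hypothetical toy data: 1 means the unit was defective.
defect = [0, 0, 0, 1, 1, 1]
columns = {
    "temperature": [70, 72, 71, 80, 85, 78],
    "humidity": [40, 55, 45, 50, 42, 58],
    "shift": [1, 2, 1, 1, 2, 1],
}

ranked = rank_by_auc(columns, defect)
for name, value in ranked:
    print(f"{name}: AUC = {value:.3f}")
```

An AUC near 0.5 means the variable alone does not separate the two classes. Like the correlation screen, this looks at one variable at a time, so it cannot see interactions between variables.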
s***e (posts: 5242) #9: PCA? (in reply to p****e's post #1)
m****9 (posts: 492) #10: mark
p*******i (posts: 1181) #11: Random forest is a good way to do this. R has a package for it that, I think, was written by the RF author himself.
