Question about a frequency optimization problem (correlation?) - Statistics board - 未名存档 (MITBBS archive)

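This page is an excerpt and archive of the corresponding thread on MITBBS (未名空间). Posts less than a week old show at most 50 characters; older posts show at most 500.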

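A**H (posts: 4797)	#1

I need to optimize a workflow, and one of the problems looks like this. Suppose that on a given day the operations are performed the following number of times:

A 333
B 515
C 325
D 602
...
Z 585

Because B, D and Z have the highest counts, we would consider running those three together and putting the rest into a second group. The numbers above are for one particular day, though, and B, D and Z do not have the highest counts every day. On another day the counts might be:

A 600
B 515
C 325
D 302
...
Z 585

and then A, B and Z would be the highest.

The question is how to use the existing daily data to arrive at a grouping that is statistically justified. The number of items per group is not fixed: putting A, B, D and Z together is fine, so is B, D, Z, and so is B, D, as long as the statistics support it.

Where do I start? What I can think of is to first compute the frequency correlation between days and see whether it supports grouping at all. If the correlation between days is low, no grouping is meaningful. If there is correlation, I would continue, but I do not know how. Any advice is appreciated, thanks!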
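The day-to-day correlation check proposed in post #1 can be sketched as follows. This is a minimal illustration in Python rather than R, and it uses only the five counts quoted in the post for each day (the elided procedures are left out, not guessed). The names `pearson`, `day1` and `day2` are made up for this sketch.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Only the counts quoted in the post; procedures hidden behind "..." are omitted.
day1 = {"A": 333, "B": 515, "C": 325, "D": 602, "Z": 585}
day2 = {"A": 600, "B": 515, "C": 325, "D": 302, "Z": 585}

procedures = sorted(day1)  # same procedure order for both days
r = pearson([day1[p] for p in procedures], [day2[p] for p in procedures])
print(f"day1 vs day2 correlation: {r:.3f}")
```

For these two example days the correlation comes out slightly negative (about -0.05), which is the "low correlation" case the poster worries about. With real data the same function would be applied to every pair of days.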
g******2 (posts: 234)	#2

1. Clustering: cluster the procedures using the historical data, maybe into 2 clusters in your case.
2. ANOVA: test whether the two groups are really different.

Or, if you have data for many dates, you could fit a saturated linear model (each procedure would be one dummy variable) and then run a model selection.
A**H (posts: 4797)	#3

I understand ANOVA, but why clustering? Does it mean that if I cluster into, say, 3 clusters, the procedures should be arranged into 3 groups? And should the data from many days be clustered together?

I do have data for many dates, and there are 100 items (the A~Z in the original post).

I will look into the saturated linear model. Could you point me to a suitable link to learn from? Thanks!
g******2 (posts: 234)	#4

You need to know how many groups there are before you can use ANOVA, which is why you need to cluster the data first. Yes, use the data from many days to cluster the procedures.

I take back what I said about the saturated model, which may not help in your case. Just do clustering + ANOVA.
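Clustering the procedures (rather than the days) means each procedure has to be one row, with one column per day. A minimal sketch of that reshaping, assuming the raw data arrives as (day, procedure, count) records; the record layout and the function name `procedures_by_day` are assumptions for illustration, and the counts are the ones quoted in post #1:

```python
def procedures_by_day(records):
    """Turn (day, procedure, count) records into a procedure-by-day matrix.

    Returns (procedures, days, matrix), where matrix[i][j] is the count of
    procedures[i] on days[j]. Missing combinations are filled with 0.
    """
    days = sorted({day for day, _, _ in records})
    procedures = sorted({proc for _, proc, _ in records})
    counts = {(proc, day): count for day, proc, count in records}
    matrix = [[counts.get((proc, day), 0) for day in days] for proc in procedures]
    return procedures, days, matrix

# Hypothetical record layout; values are the counts quoted in post #1.
records = [
    ("day1", "A", 333), ("day1", "B", 515), ("day1", "C", 325),
    ("day1", "D", 602), ("day1", "Z", 585),
    ("day2", "A", 600), ("day2", "B", 515), ("day2", "C", 325),
    ("day2", "D", 302), ("day2", "Z", 585),
]

procs, days, matrix = procedures_by_day(records)
for proc, row in zip(procs, matrix):
    print(proc, row)
```

Each row of `matrix` is then one point for the clustering step, so procedures whose counts move together across days end up close to each other.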
A**H (posts: 4797)	#5

Thanks. I did the clustering following the method here:
http://www.statmethods.net/advstats/cluster.html

Using the Partitioning section, I got a plot of "Within groups sum of squares" versus "Number of Clusters". From that plot I chose clusters = 5 and then ran:

    fit <- kmeans(mydata, 5) # 5 cluster solution
    # get cluster means
    aggregate(mydata, by=list(fit$cluster), FUN=mean)
    # append cluster assignment
    mydata <- data.frame(mydata, fit$cluster)

This tells me which item belongs to which cluster. At this point it feels as if the job is already done: I know which items should be grouped together.

Then I did the ANOVA following this:
http://www.stat.columbia.edu/~martin/W2024/R3.pdf

Nothing there makes use of the clustering. What I ran was (数值 = value, 项目 = item):

    results = aov(数值 ~ 项目, data=mydf)

The p-value is extremely small, so H0 is rejected. Next I ran:

    pairwise.t.test(mydf$数值, mydf$项目, p.adjust.method = "bonferroni")

but here I did not use the 5 clusters either.

My feeling is that I should split all the items into sub-groups according to the clustering result, that is, into 5 classes, and then run ANOVA and pairwise.t.test within each class. I would expect the ANOVA within each sub-class not to reject H0, meaning the item means within each sub-class do not differ significantly, which would show that the clustering is right.

Is my understanding correct? Thanks again.
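The "within groups sum of squares versus number of clusters" step from post #5 can be sketched in plain Python. This is not R's `kmeans` (which uses the Hartigan-Wong algorithm by default); it is a small Lloyd-style k-means written for illustration, and the six procedures `P1`..`P6` with their three-day counts are made-up data, not the poster's.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd-style k-means. Returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct rows as starting centers
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each center to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss

# Hypothetical data: one row per procedure, one column per day.
names = ["P1", "P2", "P3", "P4", "P5", "P6"]
points = [
    [333, 340, 320],
    [515, 530, 520],
    [325, 310, 330],
    [602, 590, 610],
    [300, 315, 305],
    [585, 600, 575],
]

# Elbow table: within-cluster sum of squares for k = 1..4.
for k in range(1, 5):
    _, wss = kmeans(points, k)
    print(k, round(wss, 1))

labels1, wss1 = kmeans(points, 1)
labels2, wss2 = kmeans(points, 2)
print(dict(zip(names, labels2)))
```

The number of clusters is read off where the within-cluster sum of squares stops dropping sharply. Because the starting centers are random, a real analysis would normally try several seeds per `k` and keep the best run.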
g******2 (posts: 234)	#6

Use the cluster label to run the ANOVA. The pairwise t-test is not necessary: because you do not know the optimal number of clusters, some clusters may contain very different procedures.
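One reading of "use the cluster label to run the ANOVA" is a one-way ANOVA in which every daily count is an observation and the cluster label of its procedure is the factor. The sketch below computes only the F statistic and its degrees of freedom in plain Python; the p-value would come from the F distribution, for example via R's `aov` or SciPy's `scipy.stats.f_oneway`. The data and the cluster assignment in `cluster_of` are hypothetical. Note that clusters chosen from the same counts will tend to look well separated in this test, so the result is best treated as descriptive.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers.

    Returns (F, df_between, df_within).
    """
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical daily counts per procedure and a hypothetical cluster label each.
counts = {
    "P1": [333, 340, 320], "P2": [515, 530, 520], "P3": [325, 310, 330],
    "P4": [602, 590, 610], "P5": [300, 315, 305], "P6": [585, 600, 575],
}
cluster_of = {"P1": 1, "P2": 2, "P3": 1, "P4": 2, "P5": 1, "P6": 2}

# Pool the daily counts of all procedures that share a cluster label.
groups_by_label = {}
for proc, values in counts.items():
    groups_by_label.setdefault(cluster_of[proc], []).extend(values)

f_stat, df_b, df_w = one_way_anova_f(list(groups_by_label.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F says the cluster means differ relative to the spread inside the clusters, which is the sense in which post #4 asks whether the groups are "really different".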
A**H (posts: 4797)	#7

Thanks. Will try that.
