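s***c (posts: 1926)
I just looked it up: the author used rxKmeans, which ships with Revolution R's RevoScaleR package (it is not part of base R), and ran it on 123 million rows of 7-dimensional data on a 2011 laptop. It took only 6 minutes, with no special tricks at all.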
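For a sense of scale, here is a back-of-envelope estimate of the raw data size. It assumes each of the 7 variables is stored as an 8-byte double and rounds "123 million plus" down to 123 million. The post does not say how the data were actually stored, so treat this as a rough lower bound on an in-memory copy, not a measured figure.

```python
# Rough size of the airlines matrix under stated assumptions:
# every value is an 8-byte IEEE-754 double, and the row count is
# rounded down to 123 million (the blog says "123 million plus").
ROWS = 123_000_000
COLS = 7               # departure time, arrival time, air time, arrival delay,
                       # departure delay, taxi-in time, taxi-out time
BYTES_PER_VALUE = 8

total_bytes = ROWS * COLS * BYTES_PER_VALUE
print(f"{total_bytes:,} bytes = {total_bytes / 1e9:.2f} GB = {total_bytes / 2**30:.2f} GiB")
# prints: 6,888,000,000 bytes = 6.89 GB = 6.41 GiB
```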
http://blog.revolutionanalytics.com/2011/06/kmeans-big-data.html
Finally, just for fun, I ran the rxKmeans on the 123 million plus row
airlines data set that I have described in a previous blog post looking for
2 clusters in a 7 dimensional space described by departure time, arrival
time, air time, arrival delay, departure delay, taxi in time and taxi out
time. I have no idea how to interpret the results, but the calculation ra...
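One reason k-means can scale to this many rows is that each Lloyd iteration only needs per-cluster running sums and counts, which can be accumulated one chunk of data at a time. The sketch below illustrates that idea in pure Python; it is not rxKmeans's actual implementation, and the names `chunked_kmeans` and `make_chunks` are hypothetical. It assumes the caller supplies initial centers and a factory that re-reads the data chunk by chunk on every pass (here a list stands in for a file on disk).

```python
import random
from typing import Callable, Iterable, List, Sequence

Point = Sequence[float]

def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p: Point, centers: List[List[float]]) -> int:
    # Index of the center closest to p (squared Euclidean distance).
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def chunked_kmeans(chunks_factory: Callable[[], Iterable[Iterable[Point]]],
                   k: int, init_centers: List[Point],
                   max_iter: int = 20, tol: float = 1e-9) -> List[List[float]]:
    """Lloyd's k-means that streams over the data once per iteration.

    Only k*d sums and k counts are kept between chunks, so memory does not
    grow with the number of rows. chunks_factory() must return a fresh
    iterable of chunks on every call (hypothetical interface).
    """
    centers = [list(c) for c in init_centers]
    d = len(centers[0])
    for _ in range(max_iter):
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for chunk in chunks_factory():
            for p in chunk:
                j = nearest(p, centers)
                counts[j] += 1
                for i in range(d):
                    sums[j][i] += p[i]
        # New center = mean of assigned points; an empty cluster keeps its old center.
        new_centers = [
            [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
            for j in range(k)
        ]
        # Stop when the largest squared center movement is below tol.
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers

def make_chunks(data: List[List[float]], size: int):
    # Simulates reading a big file block by block; each call restarts from the top.
    return lambda: (data[i:i + size] for i in range(0, len(data), size))

# Usage: two synthetic 7-dimensional blobs, clustered into 2 groups.
random.seed(0)
data = ([[random.gauss(0, 1) for _ in range(7)] for _ in range(500)]
        + [[random.gauss(10, 1) for _ in range(7)] for _ in range(500)])
centers = chunked_kmeans(make_chunks(data, 100), k=2,
                         init_centers=[data[0], data[-1]])
```

The chunk size only affects how much data is held at once, not the result: every iteration still visits every row exactly once.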