Quant board - data clustering by vector correlation distance (repost)
l******9
Posts: 579
1
[The following text is reposted from the Statistics board]
From: light009 (light009), Board: Statistics
Subject: data clustering by vector correlation distance
Posted at: BBS 未名空间站 (Wed Feb 26 11:17:21 2014, US Eastern)
I am working on a data analysis problem.
I am given a group of data vectors, all of the same dimension. Each
element of a vector is a floating-point number.
V1 [ , , , … ]
V2 [ , , , … ]
...
Vn [ , , , … ]
Suppose that each vector has M numbers. M can be 10000 and
n can be 200.
I need to find out how to partition the n vectors into subgroups such that
each vector in a subgroup can be represented by a basic vector in that
subgroup.
For example,
W = { V1, V2, V3, … , Vn }
Find subgroups Gi, Gj, … , Gt:
Gi = [ V1, V6, V3, V5, … , Vx ]
Gj = [ V22, V11, V56, V45, … , Vy ]
…
Gt = [ V78, V90, V9, V12, … , Vz ]
Such that:
The union of Gi, Gj, … , Gt is equal to W, and Gi, Gj, … , Gt are
pairwise disjoint (no vector appears in more than one subgroup).
Also, each subgroup has a basic vector that is strongly correlated with
every other vector in the subgroup. For example, in Gi, we may have
vector Vx as the basic vector, such that all other vectors in Gi have a
strong (linear) correlation with Vx.
Moreover, we need to minimize the number of subgroups, t. That is, given
200 vectors (n = 200), we want subgroups G1, G2, …, Gt with t as small as
possible. For example, we prefer t = 5 over t = 6. If t is more than 10,
the result may not be useful.
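One way to read the requirement above: link two vectors when their Pearson correlation is strong, then choose as few basic vectors as possible so that every vector is linked to a chosen one. Finding the exact minimum is a minimum dominating set problem, which is NP-hard in general, so the sketch below uses a greedy heuristic instead. It is only an illustration in standard-library Python. The names pearson and greedy_subgroups, the 0.9 threshold, and the choice to count strong negative correlation as "strong" are all assumptions, not part of the original post.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector; treat as 0
    return sxy / math.sqrt(sxx * syy)


def greedy_subgroups(vectors, threshold=0.9):
    """Partition vector indices into subgroups, each with one basic vector.

    Greedy heuristic (assumed, not guaranteed to minimise t): repeatedly pick
    the unassigned vector whose |r| >= threshold with the most unassigned
    vectors, and make those vectors its subgroup.
    Returns a list of (basic_index, member_indices) pairs.
    """
    n = len(vectors)
    # strong[i][j] is True when vector j can be represented by vector i.
    strong = [[i == j or abs(pearson(vectors[i], vectors[j])) >= threshold
               for j in range(n)] for i in range(n)]
    unassigned = set(range(n))
    groups = []
    while unassigned:
        # Candidate covering the most still-unassigned vectors; lowest index wins ties.
        best = max(sorted(unassigned),
                   key=lambda i: sum(strong[i][j] for j in unassigned))
        members = sorted(j for j in unassigned if strong[best][j])
        groups.append((best, members))
        unassigned -= set(members)
    return groups


# Tiny example: vectors 0 and 1 are nearly proportional, as are vectors 2 and 3.
vectors = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 1.0, 3.0, 2.0],
    [8.0, 2.0, 6.0, 4.1],
]
for basic, members in greedy_subgroups(vectors, threshold=0.9):
    print("basic vector V%d represents %s" % (basic + 1, [m + 1 for m in members]))
```

With M = 10000 and n = 200 the pairwise correlations are better computed in one call with a numerical library (for example numpy.corrcoef); the loop above is kept dependency-free for readability. Lowering the threshold reduces t at the cost of weaker representation.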
My questions: What knowledge domain does this problem belong to?
Is it cluster analysis? But in cluster analysis one data point is a
number, whereas here one data point is a vector.
Are there statistical models or algorithms that can be used for this kind
of analysis? Are there software tools or packages that solve this
problem?
If my questions are not a good fit for this forum, please tell me where I
should post it.
R packages do clustering on data points, not on data vectors by
correlation.
Any help would be appreciated.
A**********i
Posts: 780
2
Google "community detection". Use the igraph package in R.
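To make the suggestion concrete: community detection treats each vector as a node, links two nodes when their correlation is strong, and then looks for densely connected groups. igraph ships ready-made routines for this. The sketch below does not use igraph; it is a standard-library Python illustration of one simple community detection method, label propagation, applied to a precomputed correlation matrix. The function names, the 0.9 threshold, and the smallest-label tie-break are assumptions made for the example.

```python
def correlation_graph(corr, threshold=0.9):
    """Adjacency lists: i and j are linked when |corr[i][j]| >= threshold."""
    n = len(corr)
    return [[j for j in range(n) if j != i and abs(corr[i][j]) >= threshold]
            for i in range(n)]


def label_propagation(adj, max_rounds=100):
    """Asynchronous label propagation with deterministic tie-breaking.

    Every node starts in its own community and repeatedly adopts the most
    common label among its neighbours (the smallest label wins ties).
    Returns one community label per node.
    """
    labels = list(range(len(adj)))
    for _ in range(max_rounds):
        changed = False
        for node, neighbours in enumerate(adj):
            if not neighbours:
                continue  # an isolated node keeps its own label
            counts = {}
            for nb in neighbours:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            # Highest count first, then smallest label.
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # labels are stable
    return labels


# Hypothetical 6 x 6 correlation matrix with two strongly correlated blocks.
corr = [
    [1.00, 0.95, 0.92, 0.10, 0.00, 0.05],
    [0.95, 1.00, 0.97, 0.02, 0.10, 0.00],
    [0.92, 0.97, 1.00, 0.00, 0.03, 0.08],
    [0.10, 0.02, 0.00, 1.00, -0.96, 0.93],
    [0.00, 0.10, 0.03, -0.96, 1.00, -0.94],
    [0.05, 0.00, 0.08, 0.93, -0.94, 1.00],
]
labels = label_propagation(correlation_graph(corr, threshold=0.9))
print(labels)
```

Note that a community found this way is not guaranteed to contain a single basic vector that correlates strongly with every other member; that condition would need a separate check on each community.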