l******9 Posts: 579 | 1 [The following text is reposted from the Statistics board]
From: light009 (light009), Board: Statistics
Subject: data clustering by vector correlation distance
Posted at: BBS 未名空间站 (Wed Feb 26 11:17:21 2014, US Eastern)
I am working on a data analysis problem.
I have a group of data vectors, all of the same dimension. Each
element of a vector is a floating-point number.
V1 [ , , , … ]
V2 [ , , , … ]
...
Vn [ , , , … ]
Each vector has M numbers. M can be 10000.
n can be 200.
I need to partition the n vectors into subgroups such that every vector
in a subgroup can be represented by one basic vector of that subgroup.
For example,
W = union of V1, V2, V3, … , Vn
Find subgroups Gi, Gj, … , Gt:
Gi = [ V1, V6, V3, V5, … , Vx ]
Gj = [ V22, V11, V56, V45, … , Vy ]
…
Gt = [ V78, V90, V9, V12, … , Vz ]
Such that:
The union of Gi, Gj, … , Gt is equal to W, and no two of the subgroups
Gi, Gj, … , Gt overlap.
Also, each subgroup has a basic vector that is strongly correlated with
every other vector in the subgroup. For example, in Gi, Vx may be the
basic vector, so that all other vectors in Gi have a strong (linear)
correlation with Vx.
Moreover, we need to minimize the number of subgroups, t. Given 200
vectors (n = 200), we prefer a partition G1, G2, … , Gt with t as small
as possible. For example, we prefer t = 5 over t = 6. If t is more than
10, the result may not be useful.
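One way to read the requirement above: call two vectors "strongly correlated" when the absolute Pearson correlation |r| is at least some cutoff, then look for the fewest basic vectors that together cover all n vectors. That resembles a minimum dominating set problem, which is hard to solve exactly in general, so the following is only a greedy sketch, not a guaranteed-minimal answer. The cutoff of 0.9, the function names, and the toy data are all assumptions made for illustration; none of them come from the post.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def greedy_basic_vector_groups(vectors, threshold=0.9):
    """Partition vector indices into (basic_index, member_indices) groups.

    Greedy heuristic: repeatedly pick the unassigned vector that is strongly
    correlated (|r| >= threshold, an assumed cutoff) with the most unassigned
    vectors, and make it the basic vector of a new group.
    """
    n = len(vectors)
    # strong[i] holds every j with |r(i, j)| >= threshold, including i itself.
    strong = [{i} for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(vectors[i], vectors[j])) >= threshold:
                strong[i].add(j)
                strong[j].add(i)

    unassigned = set(range(n))
    groups = []
    while unassigned:
        # Ties are broken in favor of the lowest index.
        basic = max(sorted(unassigned), key=lambda i: len(strong[i] & unassigned))
        members = strong[basic] & unassigned
        groups.append((basic, sorted(members)))
        unassigned -= members
    return groups

# Toy data (hypothetical): two unrelated base signals plus linear transforms.
base_a = [math.sin(0.3 * t) for t in range(100)]
base_b = [math.cos(0.3 * t) for t in range(100)]
vectors = [
    base_a,
    [2.0 * v + 1.0 for v in base_a],
    [-0.5 * v for v in base_a],   # negative slope still counts as linear
    base_b,
    [3.0 * v - 2.0 for v in base_b],
]
groups = greedy_basic_vector_groups(vectors)
print(groups)
```

With n = 200 and M = 10000 the pure-Python double loop would be slow; computing the whole correlation matrix in one call with something like `numpy.corrcoef` is the usual choice at that size. Raising or lowering the cutoff trades the number of groups t against how tightly each group follows its basic vector.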
My questions: What knowledge domain does this problem belong to?
Is it cluster analysis? In cluster analysis, though, one data point is a
number, whereas here one data point is a vector.
Are there statistical models or algorithms that can be used for this kind
of analysis? Are there software tools or packages that solve this
problem?
If my questions are not a good fit for this forum, please tell me where I
should post them.
R packages seem to cluster data points, not data vectors by
correlation.
Any help would be appreciated. | A**********i Posts: 780 | 2 Google "community detection". Use the igraph package in R. |
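To make the community-detection suggestion concrete: the vectors become nodes of a graph, and two nodes are joined by an edge when their correlation is strong. A community-detection algorithm is then run on that graph. The sketch below only builds the graph and takes its connected components, which is a much cruder stand-in than the algorithms igraph provides (for example `cluster_louvain` in the R package). The 0.9 cutoff, the helper names, and the toy data are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def correlation_graph(vectors, threshold=0.9):
    """Adjacency sets with an edge i-j whenever |r(i, j)| >= threshold."""
    n = len(vectors)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(vectors[i], vectors[j])) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def connected_components(adj):
    """Connected components via depth-first search (NOT real community detection)."""
    seen = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components

# Toy data (hypothetical): two families of linearly related vectors, interleaved.
base_a = [math.sin(0.3 * t) for t in range(100)]
base_b = [math.cos(0.3 * t) for t in range(100)]
vectors = [
    base_a,
    base_b,
    [2.0 * v + 1.0 for v in base_a],
    [-1.0 * v for v in base_b],
    [0.5 * v - 3.0 for v in base_a],
]
components = connected_components(correlation_graph(vectors))
print(components)
```

Note that a connected component can chain together vectors that are not all strongly correlated with a single member, so it does not by itself guarantee a basic vector for each group. A proper community-detection method on the same graph would be the next step.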
|