Mathematics board - data clustering by vector correlation distance (repost)
[ The following text is reposted from the Statistics board ]
From: light009 (light009), Board: Statistics
Subject: data clustering by vector correlation distance
Posted at: BBS 未名空间站 (Wed Feb 26 11:17:21 2014, US Eastern)
I am working on a data analysis problem.
I have a group of data vectors, all of the same dimension. Each element of a
vector is a floating-point number.
V1 = [ , , , … ]
V2 = [ , , , … ]
...
Vn = [ , , , … ]
Each vector has M elements. M can be 10000 and n can be 200.
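A minimal sketch of how this setup might be held in memory, assuming Python with NumPy: the n vectors become the rows of an n x M array, and all pairwise Pearson correlations fit in an n x n matrix. The function name `correlation_distance`, the distance 1 - |r|, and the random placeholder data are assumptions made for illustration, not part of the original question.

```python
import numpy as np

def correlation_distance(V):
    """Return (C, D): Pearson correlations between the rows of V, and 1 - |C|."""
    V = np.asarray(V, dtype=float)
    C = np.corrcoef(V)        # each row is treated as one variable, so C is n x n
    D = 1.0 - np.abs(C)       # 0 = perfectly (anti)correlated, 1 = uncorrelated
    np.fill_diagonal(D, 0.0)  # remove round-off on the diagonal
    return C, D

# Placeholder data only: n = 200 random vectors with M = 10000 elements each.
rng = np.random.default_rng(0)
V = rng.standard_normal((200, 10000))
C, D = correlation_distance(V)
print(C.shape, D.shape)
```

With n = 200 the correlation matrix has only 200 x 200 entries, so the large M affects the cost of computing it but not the size of what the grouping step has to work with.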
I need to partition the n vectors into subgroups such that every vector in a
subgroup can be represented by one basic vector of that subgroup.
For example, let
W = the set of all vectors V1, V2, V3, …, Vn
Find subgroups Gi, Gj, …, Gt:
Gi = [ V1, V6, V3, V5, … , Vx ]
Gj = [ V22, V11, V56, V45, … , Vy ]
…
Gt = [ V78, V90, V9, V12, … , Vz ]
such that the union of Gi, Gj, …, Gt is equal to W and no two of the
subgroups overlap.
Also, each subgroup has a basic vector that is strongly correlated with every
other vector in the subgroup. For example, Gi may have Vx as its basic vector,
so that every other vector in Gi has a strong (linear) correlation with Vx.
Moreover, the number of subgroups, t, should be minimized. Given 200 vectors
(n = 200), we prefer a partition G1, G2, …, Gt with the smallest possible t.
For example, t = 5 is preferred over t = 6. If t is more than 10, the result
may not be useful.
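One way to read the requirements above is as a covering problem: choose as few basic vectors as possible so that every vector is strongly correlated with the basic vector of its subgroup. The sketch below is a greedy heuristic for that reading, not an exact minimizer of t. The threshold `r_min = 0.8` for "strong" correlation, the use of |r| so that negative correlation also counts, and the function name `greedy_subgroups` are all assumptions for illustration.

```python
import numpy as np

def greedy_subgroups(V, r_min=0.8):
    """Greedy partition: returns {basic_vector_index: [member indices]}.

    Repeatedly picks the unassigned vector that is strongly correlated
    (|r| >= r_min, an assumed threshold) with the most unassigned vectors,
    makes it the basic vector of a new subgroup, and assigns those vectors.
    """
    C = np.abs(np.corrcoef(V))
    strong = C >= r_min
    np.fill_diagonal(strong, True)          # every vector covers itself
    unassigned = np.ones(len(C), dtype=bool)
    groups = {}
    while unassigned.any():
        # For each candidate, count the unassigned vectors it would cover.
        cover = (strong & unassigned).sum(axis=1)
        cover[~unassigned] = -1             # a basic vector must be unassigned
        basic = int(np.argmax(cover))
        members = np.flatnonzero(strong[basic] & unassigned)
        groups[basic] = members.tolist()
        unassigned[members] = False
    return groups

# Placeholder data only: 2 hidden signals, 3 noisy rescaled copies of each.
rng = np.random.default_rng(0)
signals = rng.standard_normal((2, 1000))
demo = np.array([s * b + 0.1 * rng.standard_normal(1000)
                 for b in signals for s in (1.0, -2.0, 3.0)])
print(greedy_subgroups(demo))
```

The subgroups are disjoint and cover all vectors by construction, and each member meets the threshold against its basic vector. The number of subgroups depends on `r_min`: lowering it yields fewer, looser subgroups, so the threshold would need tuning if t has to stay at 10 or below.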
My questions: What knowledge domain does this problem belong to?
Is it cluster analysis? In cluster analysis, though, one data point is a
number, whereas here one data point is a vector.
Are there statistical models or algorithms that can be used for this kind of
analysis? Are there software tools or packages that solve this problem?
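For what it is worth, standard cluster analysis routinely takes each data point to be a vector, and hierarchical clustering accepts any precomputed pairwise distance. A hedged sketch, assuming Python with NumPy and SciPy, using 1 - |r| as the distance: the function name `cluster_by_correlation`, the average linkage method, and the cap `max_groups` are illustrative choices, not something the question specifies.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_by_correlation(V, max_groups=10):
    """Label each row of V with a cluster id in 1..k, where k <= max_groups."""
    D = 1.0 - np.abs(np.corrcoef(V))          # correlation-based distance
    D = (D + D.T) / 2.0                       # enforce exact symmetry
    np.fill_diagonal(D, 0.0)
    condensed = squareform(D, checks=False)   # square matrix -> condensed form
    Z = linkage(condensed, method="average")  # agglomerative clustering
    return fcluster(Z, t=max_groups, criterion="maxclust")

# Placeholder data only: 200 random vectors with 10000 elements each.
rng = np.random.default_rng(0)
labels = cluster_by_correlation(rng.standard_normal((200, 10000)), max_groups=5)
print(np.bincount(labels)[1:])  # sizes of the resulting clusters
```

Average linkage groups vectors that are close on average. It does not by itself guarantee that one member is strongly correlated with all the others, so a basic vector would still have to be chosen and checked within each cluster afterwards.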
If my questions are not a good fit for this forum, please tell me where I
should post them.
R packages do clustering for data points, not for data vectors by
correlation.
Any help would be appreciated.