Statistics board - [question] sample estimation of eigenvalues
A*******s
Posts: 3942
1
I have seen conclusions like the following many times:
sample estimates of the larger eigenvalues (of a covariance matrix) are
biased high, and those of the smaller ones are biased low.
Could any big bull tell me why? Thanks in advance; baozi will be sent later.
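The claim is easy to check numerically. Below is a minimal Monte Carlo sketch; the setup (identity population covariance, n = 30, p = 10) is an illustrative assumption, not something from the thread. With an identity covariance every true eigenvalue is 1, yet the sorted sample eigenvalues spread out: the top ones average well above 1, the bottom ones well below.

```python
# Monte Carlo sketch of the bias in sorted sample eigenvalues.
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 10, 30, 2000
mean_eigs = np.zeros(p)
for _ in range(reps):
    X = rng.standard_normal((n, p))              # n draws from N(0, I_p)
    S = np.cov(X, rowvar=False)                  # sample covariance matrix
    mean_eigs += np.sort(np.linalg.eigvalsh(S))[::-1]
mean_eigs /= reps
print("true eigenvalues:      ", np.ones(p))
print("mean sorted estimates: ", np.round(mean_eigs, 2))
```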
a********s
Posts: 188
2
I am not a big bull, but I think the covariance matrix $\Sigma$ can be
eigendecomposed as
$\Sigma = F D F^{\top}$,
where $D$ is the diagonal matrix of eigenvalues. The larger an eigenvalue,
the greater the variation of the covariance matrix along the corresponding
eigenvector.
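To spell out the variance interpretation (standard linear algebra, not stated in the original post): each eigenvector carries variance equal to its eigenvalue, and the eigenvalues sum to the total variance.

```latex
\Sigma = F D F^{\top}, \qquad
D = \operatorname{diag}(\lambda_1,\ldots,\lambda_p), \qquad
\operatorname{Var}\!\bigl(f_j^{\top} X\bigr) = \lambda_j, \qquad
\operatorname{tr}(\Sigma) = \sum_{j=1}^{p} \lambda_j,
```

where $f_j$ is the $j$-th column of $F$ and $X$ is a random vector with covariance $\Sigma$.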
A*******s
Posts: 3942
3
Sorry, I didn't state my question clearly.
Say we want to estimate the population eigenvectors and eigenvalues based on
an observed sample, and we rank our estimated eigenvalues, the lambda_hats,
from largest to smallest. Then the larger lambda_hats, say lambda_hat_1, are
biased high; in other words, E[lambda_hat_1] > lambda_1. The smaller
lambda_hats are biased low, say E[lambda_hat_n] < lambda_n.
Why? Or is it just my misunderstanding?


Quoting a********s:
: I am not a big bull, but I think the covariance matrix $\Sigma$ can be
: eigendecomposed as $\Sigma = F D F^{\top}$, where $D$ is the diagonal
: matrix of eigenvalues. The larger an eigenvalue, the greater the variation
: along the corresponding eigenvector.

a********s
Posts: 188
4
I am sorry that I do not know the answer. Looking forward to a big bull ...


Quoting A*******s:
: Sorry, I didn't state my question clearly.
: Say we want to estimate the population eigenvectors and eigenvalues based
: on an observed sample, and we rank our estimated eigenvalues, the
: lambda_hats, from largest to smallest. Then the larger lambda_hats, say
: lambda_hat_1, are biased high; in other words, E[lambda_hat_1] > lambda_1.
: The smaller lambda_hats are biased low, say E[lambda_hat_n] < lambda_n.
: Why? Or is it just my misunderstanding?

A*******s
Posts: 3942
5
Thanks anyway for helping me clarify the problem :)


Quoting a********s:
: I am sorry that I do not know the answer. Looking forward to a big bull ...

l*********s
Posts: 5409
6
I looked up the original article; it does not mention why, it only states
that this is a well-known observation.

Quoting A*******s:
: Thanks anyway for helping me clarify the problem :)

I*****a
Posts: 5425
7
Is this caused by the sparsity of data in high dimensions?


Quoting A*******s:
: Sorry, I didn't state my question clearly.
: Say we want to estimate the population eigenvectors and eigenvalues based
: on an observed sample, and we rank our estimated eigenvalues, the
: lambda_hats, from largest to smallest. Then the larger lambda_hats, say
: lambda_hat_1, are biased high; in other words, E[lambda_hat_1] > lambda_1.
: The smaller lambda_hats are biased low, say E[lambda_hat_n] < lambda_n.
: Why? Or is it just my misunderstanding?

A*******s
Posts: 3942
8
Which article? Thanks.


Quoting l*********s:
: I looked up the original article; it does not mention why, it only states
: that this is a well-known observation.

A*******s
Posts: 3942
9
More details?

Quoting I*****a:
: Is this caused by the sparsity of data in high dimensions?

I*****a
Posts: 5425
10
I don't know. I was just guessing ...
Did the paper you mentioned say anything about sample size, or asymptotic
results on this?

Quoting A*******s:
: More details?
A*******s
Posts: 3942
11
No. I saw it in Friedman's paper "Regularized Discriminant Analysis". Right
after formula (18) you can see his description of this problem:
"This shrinkage has the effect of decreasing the larger eigenvalues and
increasing the smaller ones, thereby counteracting the biasing inherent in
sample based estimation of eigenvalues."
I recall seeing similar statements before, but I don't know where they are
from.


Quoting I*****a:
: I don't know. I was just guessing ...
: Did the paper you mentioned say anything about sample size, or asymptotic
: results on this?
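For context on the shrinkage the Friedman quote refers to, here is a sketch of an identity-target blend in that spirit (this is a generic version, not a transcription of the paper's formula (18)). Because the target (tr(S)/p) * I shares the eigenvectors of S, each sample eigenvalue lam_j becomes (1 - gamma) * lam_j + gamma * mean(lam): large eigenvalues shrink down, small ones are pulled up, and the trace is unchanged.

```python
# Generic identity-target covariance shrinkage (assumption: this mirrors the
# qualitative effect described in the quote, not the paper's exact estimator).
import numpy as np

def shrink_toward_identity(S: np.ndarray, gamma: float) -> np.ndarray:
    """Blend a sample covariance S with a scaled identity target."""
    p = S.shape[0]
    return (1.0 - gamma) * S + gamma * (np.trace(S) / p) * np.eye(p)
```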

l*********s
Posts: 5409
12
"Regularized Discriminant Analysis" by Friedman.

Quoting A*******s:
: Which article? Thanks.

l*********s
Posts: 5409
13
I found the statement in a Google-scanned book titled something like
"Principal Component Analysis", which referenced Friedman's paper. :-)

Quoting A*******s:
: No. I saw it in Friedman's paper "Regularized Discriminant Analysis".
: Right after formula (18) you can see his description of this problem:
: "This shrinkage has the effect of decreasing the larger eigenvalues and
: increasing the smaller ones, thereby counteracting the biasing inherent in
: sample based estimation of eigenvalues."
: I recall seeing similar statements before, but I don't know where they are
: from.

A*******s
Posts: 3942
14
All roads lead to the same place. Old Friedman gives no reference, treating
it as common sense, and that one sentence left a slow learner like me
puzzling for ages without figuring out why.

Quoting l*********s:
: I found the statement in a Google-scanned book titled something like
: "Principal Component Analysis", which referenced Friedman's paper. :-)

s*****9
Posts: 108
15
There are quite a few papers on this topic. Here is my thinking:
say you run PCA when the sample size n is not much larger than the dimension
p of the parameter space. Then the variation represented by the first few
estimated PCs will be somewhat larger than its true value (co-linearity and
other factors), so the largest eigenvalues get over-estimated. But the
estimate of the total variation is not off by much, so the variation left
for the last few PCs gets squeezed very small. The smallest eigenvalue is
often shrunk to 0, making the covariance matrix non-invertible. If n >> p,
this problem should not arise.
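A rough numerical check of these points (the setup is my own, not from the post): the sample covariance has about the right trace, so inflated top eigenvalues must be paid for by deflated bottom ones, and with p > n the matrix is outright singular (smallest eigenvalue 0 up to round-off).

```python
# Compare sample eigenvalue spread for n >> p, n close to p, and n < p.
import numpy as np

rng = np.random.default_rng(1)
p = 20
for n in (1000, 25, 10):                      # n >> p, n close to p, n < p
    X = rng.standard_normal((n, p))           # population covariance is I_p
    eigs = np.linalg.eigvalsh(np.cov(X, rowvar=False))   # ascending order
    print(f"n={n:4d}  trace={eigs.sum():5.1f} (true 20.0)  "
          f"largest={eigs[-1]:5.2f}  smallest={eigs[0]:7.4f}")
```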
A*******s
Posts: 3942
16
Thanks! Baozi to follow. Now I see why overestimating the first few
eigenvalues --> underestimating the last few eigenvalues. But I still don't
fully understand why co-linearity -> overestimating the first few
eigenvalues. Can you recommend a few papers?

Quoting s*****9:
: There are quite a few papers on this topic. Here is my thinking:
: say you run PCA when the sample size n is not much larger than the
: dimension p of the parameter space. Then the variation represented by the
: first few estimated PCs will be somewhat larger than its true value
: (co-linearity and other factors), so the largest eigenvalues get
: over-estimated. But the estimate of the total variation is not off by
: much, so the variation left for the last few PCs gets squeezed very small.
: The smallest eigenvalue is often shrunk to 0, making the covariance matrix
: non-invertible. If n >> p, this problem should not arise.

I*****a
Posts: 5425
17
Yes, this is intuitively understandable. I think when the sample size is so
large that the data are not sparse in any dimension, the estimation is not
problematic.
I just checked Friedman's paper, and he said the biases are most severe
when the eigenvalues are close to each other. I don't quite get that. Do
you know the reason for this?

Quoting s*****9:
: There are quite a few papers on this topic. Here is my thinking:
: say you run PCA when the sample size n is not much larger than the
: dimension p of the parameter space. Then the variation represented by the
: first few estimated PCs will be somewhat larger than its true value
: (co-linearity and other factors), so the largest eigenvalues get
: over-estimated. But the estimate of the total variation is not off by
: much, so the variation left for the last few PCs gets squeezed very small.
: The smallest eigenvalue is often shrunk to 0, making the covariance matrix
: non-invertible. If n >> p, this problem should not arise.

s*****9
Posts: 108
19
My guess: say some eigenvalue of a matrix is a repeated root. Then the
corresponding eigenvectors are just a basis of a subspace, and they all play
interchangeable roles, so when the sample is insufficient they are hard to
identify.


Quoting I*****a:
: Yes, this is intuitively understandable. I think when the sample size is
: so large that the data are not sparse in any dimension, the estimation is
: not problematic.
: I just checked Friedman's paper, and he said the biases are most severe
: when the eigenvalues are close to each other. I don't quite get that. Do
: you know the reason for this?
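A sketch of the repeated-eigenvalue point (the spectra below are my own illustrative choices, not from the paper): the upward bias of lambda_hat_1 is mild when the top eigenvalue is well separated, and much worse relative to the truth when all eigenvalues are equal.

```python
# Compare bias of the top sample eigenvalue for separated vs repeated spectra.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 2000
for spectrum in (np.array([8.0, 4.0, 2.0, 1.0, 1.0, 1.0]),  # separated top
                 np.full(6, 2.0)):                           # repeated root
    p = spectrum.size
    top = 0.0
    for _ in range(reps):
        X = rng.standard_normal((n, p)) * np.sqrt(spectrum)  # cov = diag(spectrum)
        top += np.linalg.eigvalsh(np.cov(X, rowvar=False))[-1]
    print(f"true lambda_1 = {spectrum[0]:.1f}, "
          f"mean lambda_hat_1 = {top / reps:.2f}")
```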

s*****9
Posts: 108
19
You're welcome. I googled this topic and found no theoretical formula
explaining the phenomenon, though many papers include relevant simulation
results.
The paper "A bootstrap approach to eigenvalue correction" mentions a few
papers at the end of its first page and the beginning of the second; I don't
know whether those help.
My guess is that one can define the eigenvalue as the solution of an
optimization problem and then write down an approximate expression for it;
that should yield some consistency and rate results.

Quoting A*******s:
: Thanks! Baozi to follow. Now I see why overestimating the first few
: eigenvalues --> underestimating the last few eigenvalues. But I still
: don't fully understand why co-linearity -> overestimating the first few
: eigenvalues. Can you recommend a few papers?
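A generic bootstrap bias-correction sketch (my own construction; the cited paper's actual procedure may differ): estimate the bias of each sorted sample eigenvalue by resampling rows of the data, then subtract it.

```python
# Bootstrap bias correction for sorted sample eigenvalues (generic sketch).
import numpy as np

def bootstrap_corrected_eigenvalues(X: np.ndarray, B: int = 500, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sample = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    boot_mean = np.zeros_like(sample)
    for _ in range(B):
        Xb = X[rng.integers(0, n, size=n)]    # resample observations
        boot_mean += np.sort(np.linalg.eigvalsh(np.cov(Xb, rowvar=False)))[::-1]
    boot_mean /= B
    bias = boot_mean - sample                 # bootstrap estimate of the bias
    return sample - bias                      # bias-corrected eigenvalues
```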

A*******s
Posts: 3942
20
Thanks. More baozi sent!

Quoting s*****9:
: You're welcome. I googled this topic and found no theoretical formula
: explaining the phenomenon, though many papers include relevant simulation
: results.
: The paper "A bootstrap approach to eigenvalue correction" mentions a few
: papers at the end of its first page and the beginning of the second; I
: don't know whether those help.
: My guess is that one can define the eigenvalue as the solution of an
: optimization problem and then write down an approximate expression for it;
: that should yield some consistency and rate results.
