s*****r (posts: 790) | 1 Your piecewise regression, or whatever it is, would come in handy right here.
From: shinder (suibian+shinder), Board: Statistics
Subject: Re: Master Chen, I'm curious
Site: BBS 未名空间站 (Wed May 18 09:59:15 2011, US Eastern)
Put simply, using your earlier example: you can't even be sure whether you
should include the covariate x in your model. You think you should, and you
get a significant estimate, but what if that is just by chance?
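To make the worry concrete, here is a minimal Python sketch (standard library only) of k-fold cross-validation deciding whether a covariate earns its place. The data are simulated and hypothetical: y is pure noise unrelated to x, so any apparent effect of x is chance. The helpers fit_line and kfold_mse are illustrative names, not from any library.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def kfold_mse(xs, ys, k=5, use_x=True, seed=0):
    """Average held-out squared error of the mean-only model (use_x=False)
    or the y ~ x model (use_x=True), using k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        if use_x:
            a, b = fit_line(tx, ty)
        else:
            a, b = statistics.fmean(ty), 0.0  # intercept-only model
        # Score only on the fold the model never saw.
        errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return statistics.fmean(errors)

# Hypothetical data: y does not depend on x at all.
rng = random.Random(42)
n = 100
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [rng.gauss(0, 1) for _ in range(n)]

mse_without = kfold_mse(xs, ys, use_x=False)
mse_with = kfold_mse(xs, ys, use_x=True)
print(f"CV MSE without x: {mse_without:.3f}")
print(f"CV MSE with x:    {mse_with:.3f}")
```

If adding x does not lower the held-out error relative to the mean-only model, a "significant" in-sample slope deserves suspicion. The exact numbers depend on the seed; the point is the procedure.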
Let me give you an example that fits your theory perfectly:
Suppose you are a NIU professor at a university, teaching a class of 200
students who graduated from two high schools, with roughly equal numbers of
male and female students, and roughly equal numbers from each school.
Now it is finals time and you give the exam. You have one good TA to grade
the exams for you. The TA is supposed to finish grading in two days, and you
will give half the students an A and half a B. Your department needs only
the cut-off grade, and needs it by the third day. Your TA finishes grading
100 exams on the first day but becomes very sick on the second day and can't
work for three days. Now you have 100 scores, but you need to give your
department a cut-off score soon.
Furthermore, there are 100 other similar classes waiting for your cut-off
score, but for those you can give it within one week, i.e., after you have
seen all 200 of your scores.
What do you do? I hope this problem helps you understand overfitting and
cross-validation.
Here is some information you have found:
1) these 100 exams were randomly selected
2) approximately 55 are from boys and 45 from girls
3) boys SEEM to score a little lower than girls
4) students from high school 1 SEEM better than those from high school 2
5) boys from high school 1 SEEM a little better than girls from high school 2
6) only 30 of these students graduated from high school 1
7) during the whole semester you did not feel there were BIG differences
between boys and girls or between the two high schools.
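Since half the class gets an A, the cut-off is the median of all 200 scores. One simple baseline, offered only as a sketch and not as the intended answer to the puzzle, is to use the median of the 100 randomly graded exams. The simulation below assumes hypothetical scores drawn from Normal(75, 10) with no real gender or school effect (in the spirit of item 7) and measures how far the resulting share of A grades drifts from one half. simulate_cutoff is a hypothetical helper.

```python
import random
import statistics

def simulate_cutoff(n_class=200, n_graded=100, trials=2000, seed=1):
    """Hypothetical simulation: scores ~ Normal(75, 10), no real group effect.
    Each trial grades a random subset, uses its median as the A/B cut-off,
    and records the share of the whole class at or above that cut-off."""
    rng = random.Random(seed)
    shares = []
    for _ in range(trials):
        scores = [rng.gauss(75, 10) for _ in range(n_class)]
        graded = rng.sample(scores, n_graded)  # the exams the TA finished
        cutoff = statistics.median(graded)
        shares.append(sum(s >= cutoff for s in scores) / n_class)
    return shares

shares = simulate_cutoff()
print(f"mean share of A grades: {statistics.fmean(shares):.3f}")
print(f"spread (stdev):         {statistics.stdev(shares):.3f}")
```

Under these assumptions the plain median is unbiased for the one-half target; any adjustment for the gender or school patterns that only SEEM to exist would have to beat this baseline on held-out data before it is worth using.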