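This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS). Posts less than a week old show at most 50 characters; older posts show at most 500 characters.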
Statistics board - An interview question
c******t
Posts: 8
1
You have dataset 1, with 100 explanatory variables, a response, and 100
observations. You want to construct a model for prediction, but 100
variables are too many. Penalized methods such as LASSO can be used.
The question is: you have dataset 2, with the same 100 explanatory variables,
but its sample size is 1000, it has no response, and the data come from a similar
population. How can you use dataset 2 to help construct a prediction model
for dataset 1?
Thanks
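For reference, the LASSO baseline mentioned in the question (dataset 1 only) can be sketched with cyclic coordinate descent. This is a minimal numpy sketch, not any particular library's implementation; the simulated data, the sparse "true" coefficients, and the penalty lam=0.2 are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty: shrink z toward 0 by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    Assumes centered y and centered columns of X, so no intercept is fit."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j' x_j for each column
    r = y.copy()                            # residual y - X b (b starts at 0)
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]             # drop variable j from the fit
            rho = X[:, j] @ r / n           # inner product with partial residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]             # add the updated variable j back
    return b

rng = np.random.default_rng(0)
n, p = 100, 100                              # dataset 1: 100 rows, 100 predictors
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize columns
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]       # hypothetical sparse truth
y = X @ beta + rng.standard_normal(n)
y = y - y.mean()

b_hat = lasso_cd(X, y, lam=0.2)
print("selected variables:", np.flatnonzero(b_hat))
```

Note that this baseline ignores dataset 2 entirely, which is exactly what the question asks how to improve on.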
B******5
Posts: 4676
2
semi-supervised learning?
R*******c
Posts: 249
3
Don't tell me that company is still hiring...

G*******s
Posts: 10605
4
Principal Component?

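One reason principal components are a natural use of dataset 2: with 100 rows and 100 variables, the sample covariance matrix from dataset 1 alone is singular (centering 100 rows leaves at most 99 dimensions), while pooling in the 1000 unlabeled rows gives a full-rank estimate to run PCA on. A small numpy check on hypothetical standard-normal data:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 100
X1 = rng.standard_normal((100, p))    # hypothetical stand-in for dataset 1 predictors
X2 = rng.standard_normal((1000, p))   # hypothetical stand-in for dataset 2 predictors

cov1 = np.cov(X1, rowvar=False)                          # dataset 1 only
cov_pooled = np.cov(np.vstack([X1, X2]), rowvar=False)   # dataset 1 + dataset 2

r1 = np.linalg.matrix_rank(cov1)            # at most 99: centering costs one dimension
r_pooled = np.linalg.matrix_rank(cov_pooled)
print(r1, r_pooled)
```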
L*****k
Posts: 327
5
this is a transfer learning problem, heh, there are many ways to do it.
The most straightforward way is to do unsupervised learning on data1 +
data2 together (explanatory variables only), e.g. dimension reduction.

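A minimal sketch of this suggestion, assuming a linear model and using principal component regression: PCA loadings are estimated from the pooled predictors of dataset 1 and dataset 2, and only the regression on the component scores uses dataset 1's response. The function name fit_pcr_with_unlabeled, the simulated 5-factor data, and k=5 are hypothetical; in practice k would be chosen, for example, by cross-validation on dataset 1.

```python
import numpy as np

def fit_pcr_with_unlabeled(X1, y1, X2, k):
    """Principal component regression whose loadings come from the pooled
    predictors of dataset 1 and the unlabeled dataset 2."""
    X_all = np.vstack([X1, X2])
    mu = X_all.mean(axis=0)
    # Rows of Vt are the principal directions of the centered pooled data
    _, _, Vt = np.linalg.svd(X_all - mu, full_matrices=False)
    W = Vt[:k].T                                              # p x k loadings
    Z1 = np.column_stack([np.ones(len(X1)), (X1 - mu) @ W])   # intercept + scores
    coef, *_ = np.linalg.lstsq(Z1, y1, rcond=None)            # response used only here

    def predict(X_new):
        return coef[0] + ((X_new - mu) @ W) @ coef[1:]
    return predict

rng = np.random.default_rng(2)
p, k = 100, 5
L = rng.standard_normal((k, p))                  # hypothetical factor loadings
gamma = np.array([1.0, -1.0, 0.5, 0.5, 1.0])     # hypothetical effects on y

def simulate(n):
    F = rng.standard_normal((n, k))              # latent factors
    X = F @ L + rng.standard_normal((n, p))
    y = F @ gamma + 0.5 * rng.standard_normal(n)
    return X, y

X1, y1 = simulate(100)          # labeled dataset 1
X2, _ = simulate(1000)          # dataset 2: response thrown away
X_test, y_test = simulate(500)

predict = fit_pcr_with_unlabeled(X1, y1, X2, k=k)
mse = np.mean((predict(X_test) - y_test) ** 2)
print(f"test MSE {mse:.3f} vs var(y_test) {np.var(y_test):.3f}")
```

The design choice is that the 1000 extra rows only improve the estimate of the predictor structure; the supervised step still has just 100 labeled rows but now only k+1 parameters.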
l******0
Posts: 73
6
I was asked this question in my interview too. Which company is this classic from?

【R*******c wrote:】
: Don't tell me that company is still hiring...
o****o
Posts: 8077
7
Does it mean using the information in dataset 2 to prevent overfitting in
dataset 1?

【L*****k wrote:】
: this is a transfer learning problem, heh, there are many ways to do it.
: The most straightforward way is to do unsupervised learning on data1 +
: data2 together (explanatory variables only), e.g. dimension reduction.

s******0
Posts: 1269
8
Could you give more details on how to answer it?
D******n
Posts: 2836
9
It goes by so many names.
Isn't it just like the reject-inference problem in risk modeling?

【B******5 wrote:】
: semi-supervised learning?
L*****k
Posts: 327
10
At the conceptual level, if you have data from dataset 2 and there is a link
between datasets 1 and 2, then the knowledge you learn from dataset 2 should be
helpful for the task on dataset 1.

【o****o wrote:】
: Does it mean using the information in dataset 2 to prevent overfitting in
: dataset 1?

L*****k
Posts: 327
11
This is not a typical semi-supervised learning problem, because semi-supervised
learning assumes the labeled and unlabeled explanatory data have the same distribution.

【D******n wrote:】
: It goes by so many names.
: Isn't it just like the reject-inference problem in risk modeling?
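Whether dataset 2 really comes from the same population as dataset 1 can be checked, at least roughly, before pooling them. One crude check, sketched here under the assumption that mean shifts are the main concern, is the per-variable standardized mean difference; it says nothing about differences in correlation structure. The function name and the simulated shift in variable 0 are hypothetical.

```python
import numpy as np

def standardized_mean_diff(X1, X2):
    """Per-variable (mean1 - mean2) / pooled SD; large |values| suggest the
    two datasets differ in that variable."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    pooled_sd = np.sqrt((X1.var(axis=0, ddof=1) + X2.var(axis=0, ddof=1)) / 2)
    return (m1 - m2) / pooled_sd

rng = np.random.default_rng(6)
X1 = rng.standard_normal((100, 100))    # hypothetical dataset 1 predictors
X2 = rng.standard_normal((1000, 100))   # hypothetical dataset 2 predictors
X2[:, 0] += 1.0                         # simulate a shift in variable 0 only

smd = standardized_mean_diff(X1, X2)
top = np.argsort(-np.abs(smd))[:3]      # variables with the largest |SMD|
print("largest |SMD|:", [(int(j), round(float(smd[j]), 2)) for j in top])
```

With only 100 rows in dataset 1, null variables alone produce |SMD| values around 0.1, so small differences should not be over-read.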

w********m
Posts: 1137
12
Situation: n = p -> high dimension/low power
Action: PCA -> dimension reduction
Result: avoid overfitting and heteroskedasticity.
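To illustrate the n = p situation: with 100 observations and 100 variables, ordinary least squares (fit here without an intercept) can reproduce the training data exactly and then predict poorly on new data. The simulated data below, where only 5 variables truly matter, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = p = 100
beta = np.zeros(p)
beta[:5] = 1.0                                   # hypothetical: only 5 variables matter
X_train = rng.standard_normal((n, p))
y_train = X_train @ beta + rng.standard_normal(n)
X_test = rng.standard_normal((1000, p))
y_test = X_test @ beta + rng.standard_normal(1000)

# Square, (almost surely) invertible design: least squares interpolates the training data
b_ols, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_mse = np.mean((X_train @ b_ols - y_train) ** 2)
test_mse = np.mean((X_test @ b_ols - y_test) ** 2)
print(f"train MSE {train_mse:.2e}, test MSE {test_mse:.2f}")
```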
S*x
Posts: 705
13
Run PCA or PROC VARCLUS on the 2nd dataset,
then use the selected (new) variables from the 2nd dataset to build the model
on the first dataset.

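A rough sketch of this two-step idea. PROC VARCLUS is a SAS procedure with its own iterative splitting algorithm, so the Python stand-in below is a much simpler, hypothetical greedy grouping by absolute correlation (the 0.7 threshold is arbitrary): the clusters and their representative variables are learned from dataset 2 only, then an ordinary least squares model is fit on dataset 1 using just the representatives. The simulated 10-block data are also hypothetical.

```python
import numpy as np

def greedy_correlation_clusters(X, threshold=0.7):
    """Group variables whose |correlation| with a cluster seed exceeds threshold.
    A simplified stand-in for variable clustering, NOT the PROC VARCLUS algorithm."""
    R = np.abs(np.corrcoef(X, rowvar=False))
    seeds, clusters = [], []
    for j in range(X.shape[1]):
        for c, s in enumerate(seeds):
            if R[j, s] > threshold:
                clusters[c].append(j)     # j joins the first matching cluster
                break
        else:
            seeds.append(j)               # no match: j starts and represents a new cluster
            clusters.append([j])
    return seeds, clusters

rng = np.random.default_rng(4)
n_groups, per_group = 10, 10

def simulate(n):
    G = rng.standard_normal((n, n_groups))          # hypothetical latent group factors
    X = np.repeat(G, per_group, axis=1) + 0.3 * rng.standard_normal((n, n_groups * per_group))
    y = G[:, 0] - G[:, 3] + 0.5 * rng.standard_normal(n)
    return X, y

X1, y1 = simulate(100)     # labeled dataset 1
X2, _ = simulate(1000)     # dataset 2: only its predictors are used

seeds, clusters = greedy_correlation_clusters(X2)    # step 1: learned from dataset 2
A = np.column_stack([np.ones(len(X1)), X1[:, seeds]])
coef, *_ = np.linalg.lstsq(A, y1, rcond=None)        # step 2: fit on dataset 1
print(len(seeds), "representative variables:", seeds)
print("coefficients:", np.round(coef, 2))
```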
A*******s
Posts: 3942
14
You need to ask for more details to see what is of interest behind the question. My
speculation is that, since it is a high-dimensional problem, the interviewers
may expect you to say something like PCA. And the task is a typical
semi-supervised learning problem, so maybe "first clustering, then EM" methods are
what they are looking for.
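The "first clustering" half of that idea can be sketched as cluster-then-label, assuming a binary response (the question does not say what kind of response it is): run k-means on the pooled predictors of both datasets, then label each cluster by majority vote of the dataset 1 responses that fall in it. The EM refinement step is not shown. The function names, the farthest-point initialization, and the simulated data are hypothetical choices for illustration.

```python
import numpy as np

def assign(X, C):
    # Index of the nearest center (squared Euclidean distance) for each row
    return np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    C = [X[rng.integers(len(X))]]                     # first center: a random row
    while len(C) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[np.argmax(d2)])                    # next center: farthest row
    C = np.array(C)
    for _ in range(n_iter):
        labels = assign(X, C)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C, assign(X, C)

def cluster_then_label(X1, y1, X2, k):
    """Cluster pooled predictors, then label clusters using dataset 1's 0/1 response."""
    C, labels = kmeans(np.vstack([X1, X2]), k)
    lab1 = labels[:len(X1)]                           # clusters of the labeled rows
    fallback = int(y1.mean() > 0.5)                   # overall majority class
    votes = np.array([int(y1[lab1 == j].mean() > 0.5) if np.any(lab1 == j) else fallback
                      for j in range(k)])             # majority vote; ties go to 0
    return lambda X_new: votes[assign(X_new, C)]

rng = np.random.default_rng(5)
p = 100

def simulate(n):
    y = rng.integers(0, 2, n)                            # hypothetical 0/1 response
    X = rng.standard_normal((n, p)) + 1.0 * y[:, None]   # class 1 shifted in every variable
    return X, y

X1, y1 = simulate(100)      # labeled dataset 1
X2, _ = simulate(1000)      # dataset 2: response thrown away
X_test, y_test = simulate(500)

predict = cluster_then_label(X1, y1, X2, k=2)
print("test accuracy:", np.mean(predict(X_test) == y_test))
```

This only works when the clusters line up with the response classes, which is the usual "cluster assumption" behind this family of semi-supervised methods.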