DataSciences board - Giving back to the board: questions from my recent interviews plus ones I collected
m******x
Posts: 35
1
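Some of these are from my own interviews and some were collected from various places. I'm not posting the questions from the red/green books. Some time has passed,
so I may have misremembered a few. Please
bear with me!
I'll update this from time to time as I remember more.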
---
explain EM algorithm, use EM algorithm to find SVD of a given matrix
---
Assume you are writing an online training model. Estimate how many
observations it takes before the parameters start to converge.
What's the convergence rate of an online training algorithm (e.g. stochastic
gradient descent)?
What is the convergence rate of gradient descent?
How many points are needed to converge, given the dimension of the feature space?
Derive the gradient descent formula.
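
For the derivation question, one common route (a sketch, not the only accepted answer) is to linearize f around x, f(x + d) ≈ f(x) + ∇f(x)·d, and note that for a fixed step length the decrease is largest when d points along -∇f(x), which gives the update x ← x - η∇f(x). The snippet below applies that update to a toy quadratic; the function names, the step size `lr` and the test function are illustrative assumptions.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly apply x <- x - lr * grad(x), starting from x0."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def grad_f(p):
    """Gradient of the toy objective f(x, y) = (x - 3)^2 + (y + 1)^2."""
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]


# The minimizer of the toy objective is (3, -1).
minimum = gradient_descent(grad_f, [0.0, 0.0])
```

On this quadratic each step multiplies the error by a constant factor (0.8 with these settings), which is the linear convergence rate usually quoted for gradient descent on strongly convex problems.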
---
reservoir sampling prob. Proof
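
A minimal sketch of the usual reservoir sampling algorithm (Algorithm R), assuming we want k items from a stream of unknown length. Proof idea: after n items, item i (with i > k) was selected with probability k/i and then survives each later step t with probability 1 - 1/t, so it stays with probability (k/i) · (i/(i+1)) · ... · ((n-1)/n) = k/n; the same telescoping product gives k/n for the first k items. The function name and the `rng` parameter are my own choices.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```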
---
How to select SVM kernel?
---
Use the Monte Carlo method to estimate pi. How can you ensure the first 6 digits are
accurate?
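
A rough sketch of one way to reason about this, assuming the standard "points in the unit quarter circle" estimator. The estimate is 4 times the mean of Bernoulli draws, so its standard deviation is at most 2/sqrt(n) (since p(1-p) ≤ 1/4); a normal-approximation confidence interval then gives a sample size for a target tolerance. The tolerance that corresponds to "6 digits" and the 95% level (z = 1.96) are assumptions, and accuracy can only be guaranteed with high probability, not with certainty.

```python
import math
import random


def estimate_pi(n, rng):
    """Estimate pi from n uniform points in the unit square."""
    inside = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n


def samples_needed(tol, z=1.96):
    """Samples so that z * sd <= tol, using the bound sd <= 2 / sqrt(n)."""
    return math.ceil((z * 2.0 / tol) ** 2)


pi_hat = estimate_pi(200_000, random.Random(1))
n_required = samples_needed(5e-7)  # error below half a unit in the 6th decimal
```

With these assumptions the required sample size is on the order of 10^13, which is why plain Monte Carlo is a poor way to get many digits.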
---
given many fair coins, how to construct an event with p = pi - 3
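
One possible construction, sketched under the assumption that the binary digits of p = pi - 3 are available: flip until the first head, say at flip k (probability 2^-k), and declare the event to have happened if the k-th binary digit of p is 1. Summing over k gives exactly p. The code below uses double-precision `math.pi`, so it is only accurate to about 50 binary digits; a real answer would need arbitrary-precision digits of pi. All names here are illustrative.

```python
import math
import random

P = math.pi - 3  # double-precision stand-in for the true value


def binary_digit(p, k):
    """k-th binary digit (k >= 1) after the point of p in [0, 1)."""
    return int(p * 2 ** k) % 2


def event_with_probability(p, flip=lambda: random.getrandbits(1), max_flips=50):
    """Flip fair coins (1 = heads) until the first head; succeed if that digit of p is 1."""
    for k in range(1, max_flips + 1):
        if flip() == 1:
            return binary_digit(p, k) == 1
    # No head in max_flips flips (probability 2**-max_flips): treat as failure.
    return False
```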
---
The elevator problem: assume N people are in an elevator and there are m floors.
Find E(# of stops of this elevator) and Var(# of stops of this elevator).
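
A sketch of the indicator-variable solution, assuming each person independently picks one of the m floors uniformly at random (the question does not state this, so it is an assumption). Let a = (1 - 1/m)^N be the probability that a given floor has no stop and b = (1 - 2/m)^N the probability that two given floors both have no stop. Then E = m(1 - a) and Var = m·a(1 - a) + m(m - 1)(b - a^2). The brute-force function is only there to check the formulas on tiny inputs.

```python
from itertools import product


def elevator_stats(n_people, m_floors):
    """Closed-form mean and variance of the number of stops."""
    a = (1 - 1 / m_floors) ** n_people  # P(a given floor is skipped)
    b = (1 - 2 / m_floors) ** n_people  # P(two given floors are both skipped)
    mean = m_floors * (1 - a)
    var = m_floors * a * (1 - a) + m_floors * (m_floors - 1) * (b - a * a)
    return mean, var


def elevator_stats_brute_force(n_people, m_floors):
    """Enumerate every assignment of people to floors (tiny inputs only)."""
    counts = [len(set(c)) for c in product(range(m_floors), repeat=n_people)]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return mean, var
```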
---
Shuffle an array in O(n) time and O(1) space
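
The usual answer here is the Fisher-Yates shuffle; a minimal in-place sketch follows. The function name and the `rng` parameter are my own choices.

```python
import random


def shuffle_in_place(items, rng=random):
    """Fisher-Yates shuffle: O(n) time, O(1) extra space, uniform over permutations."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # pick a position in items[0..i], inclusive
        items[i], items[j] = items[j], items[i]
```

Picking j from the whole array at every step instead of from items[0..i] is the classic bug: it no longer yields a uniform permutation.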
---
Prove that in regression R^2=cor^2(y,y^hat)
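
A sketch of the standard argument, assuming ordinary least squares with an intercept, so that the residuals e = y - ŷ sum to zero and are orthogonal to the fitted values:

```latex
\begin{aligned}
y &= \hat{y} + e, \qquad \sum_i e_i = 0, \qquad \sum_i e_i \hat{y}_i = 0
  \;\Rightarrow\; \operatorname{cov}(e, \hat{y}) = 0, \quad \bar{\hat{y}} = \bar{y}, \\
\operatorname{cov}(y, \hat{y}) &= \operatorname{var}(\hat{y}) + \operatorname{cov}(e, \hat{y})
  = \operatorname{var}(\hat{y}), \\
\operatorname{cor}^2(y, \hat{y}) &= \frac{\operatorname{var}(\hat{y})^2}{\operatorname{var}(y)\,\operatorname{var}(\hat{y})}
  = \frac{\operatorname{var}(\hat{y})}{\operatorname{var}(y)}
  = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = R^2 .
\end{aligned}
```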
---
Deepcopy a graph
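
A minimal sketch, assuming the graph is given as node objects with a neighbor list (the `Node` class is hypothetical). The key point is a map from original node to its copy, so cycles and shared neighbors are copied exactly once.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.neighbors = []


def clone_graph(start):
    """Deep-copy every node reachable from start, preserving cycles."""
    if start is None:
        return None
    copies = {start: Node(start.val)}  # original node -> its copy
    stack = [start]
    while stack:
        node = stack.pop()
        for nb in node.neighbors:
            if nb not in copies:
                copies[nb] = Node(nb.val)
                stack.append(nb)
            copies[node].neighbors.append(copies[nb])
    return copies[start]
```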
---
When to use linked list, when to use array?
---
Implement a hashmap that can be iterated in the order in which elements
were inserted
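
One common design, sketched below, pairs a hash index with a doubly linked list of entries, similar in spirit to Java's LinkedHashMap. Python's built-in dict already preserves insertion order, so the inner dict here only stands in for "a plain hash table"; the class and method names are my own.

```python
class _Entry:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class InsertionOrderedMap:
    def __init__(self):
        self._index = {}       # key -> entry, used only for O(1) lookup
        self._head = _Entry()  # sentinel before the first entry
        self._tail = _Entry()  # sentinel after the last entry
        self._head.next, self._tail.prev = self._tail, self._head

    def put(self, key, value):
        entry = self._index.get(key)
        if entry is not None:
            entry.value = value  # updating keeps the original position
            return
        entry = _Entry(key, value)
        last = self._tail.prev
        last.next, entry.prev = entry, last
        entry.next, self._tail.prev = self._tail, entry
        self._index[key] = entry

    def get(self, key):
        return self._index[key].value

    def remove(self, key):
        entry = self._index.pop(key)
        entry.prev.next, entry.next.prev = entry.next, entry.prev

    def __iter__(self):
        entry = self._head.next
        while entry is not self._tail:
            yield entry.key, entry.value
            entry = entry.next
```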
---
Implement hashmap using tree
---
deep iterator:
{{1,2,3},4,{{5,6},4}}
find lowest common ancestor given 2 tree nodes
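
Sketches for the two questions above. The deep iterator assumes the nested structure is given as Python lists/tuples; the lowest-common-ancestor function assumes each node has a parent pointer (the `TreeNode` class is hypothetical, and without parent pointers a recursive search from the root would be needed instead).

```python
def deep_iter(nested):
    """Lazily yield the leaves of an arbitrarily nested list, left to right."""
    stack = [iter(nested)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()
            continue
        if isinstance(item, (list, tuple)):
            stack.append(iter(item))
        else:
            yield item


class TreeNode:
    def __init__(self, val, parent=None):
        self.val = val
        self.parent = parent


def lowest_common_ancestor(a, b):
    """Record a's ancestors, then walk up from b until one of them is hit."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = a.parent
    while b is not None:
        if b in seen:
            return b
        b = b.parent
    return None


flattened = list(deep_iter([[1, 2, 3], 4, [[5, 6], 4]]))
```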
---
Define metrics to measure the success of a newsfeed-picking algorithm
at Facebook.
Click-through rate (if we maximize this rate, could it do some damage?)
---
Dialog-style SQL question. Assume a table with columns:
userid
appid
type: 'imp'/'click'
timestamp
Define a metric to measure the success of clicks over impressions:
intensity (given a certain time interval, count the clicks), volume,
click-through rate (write SQL to calculate this rate).
If for an appid the # of click rows > # of imp rows, what could be the reason?
Some action generates false click rows.
How do you calculate the real click-through rate based on this erroneous data?
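
A small sketch using Python's sqlite3 so the SQL can actually run. The table name `events` and the sample rows are made up. The second query is only one possible cleaning rule, and an assumption on my part: count a user as having clicked only if that user has an earlier (or simultaneous) impression for the same app, which turns the metric into a user-level rate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (userid INTEGER, appid INTEGER, type TEXT, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 10, "imp", 100), (1, 10, "click", 101),
        (2, 10, "imp", 102), (3, 10, "imp", 103),
        (4, 10, "click", 104),  # click with no impression: treated as false
        (1, 20, "imp", 105),
    ],
)

# Raw click-through rate per app: click rows divided by impression rows.
RAW_CTR = """
SELECT appid,
       1.0 * SUM(CASE WHEN type = 'click' THEN 1 ELSE 0 END)
           / NULLIF(SUM(CASE WHEN type = 'imp' THEN 1 ELSE 0 END), 0) AS ctr
FROM events
GROUP BY appid
ORDER BY appid
"""

# Cleaned, user-level rate: users who clicked after seeing an impression,
# divided by users who saw an impression.
CLEAN_CTR = """
SELECT i.appid,
       1.0 * COUNT(DISTINCT c.userid) / COUNT(DISTINCT i.userid) AS ctr
FROM events AS i
LEFT JOIN events AS c
  ON c.type = 'click' AND c.appid = i.appid
 AND c.userid = i.userid AND c.timestamp >= i.timestamp
WHERE i.type = 'imp'
GROUP BY i.appid
ORDER BY i.appid
"""

raw = conn.execute(RAW_CTR).fetchall()
clean = conn.execute(CLEAN_CTR).fetchall()
```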
---
write the sqrt(x) function
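
A minimal sketch using Newton's method on g(t) = t^2 - x; binary search on t is an equally acceptable answer. The function name and the relative tolerance are my own choices.

```python
def my_sqrt(x, tol=1e-12):
    """Square root of a non-negative float via Newton's iteration."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0  # start at or above the true root
    while True:
        nxt = 0.5 * (guess + x / guess)
        if abs(nxt - guess) <= tol * nxt:
            return nxt
        guess = nxt
```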
---
given an array of integers, find the median, faster than sort
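
A sketch using quickselect, which runs in expected O(n) time (worst case O(n^2); median-of-medians gives a worst-case linear bound). For even lengths this version returns the average of the two middle values, which is an assumption about what "median" should mean here.

```python
import random


def quickselect(values, k, rng=random):
    """Return the k-th smallest element (0-indexed) in expected O(n) time."""
    items = list(values)
    while True:
        pivot = rng.choice(items)
        lows = [v for v in items if v < pivot]
        highs = [v for v in items if v > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs


def median(values, rng=random):
    n = len(values)
    if n == 0:
        raise ValueError("median of empty input")
    if n % 2 == 1:
        return quickselect(values, n // 2, rng)
    lower = quickselect(values, n // 2 - 1, rng)
    upper = quickselect(values, n // 2, rng)
    return (lower + upper) / 2
```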
---
assume a graph stored as
src|friend
a|b
b|c
..
need to find the friends of friends who are not currently friends of mine
i.e. c to a
how to do this in hadoop platform?
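
A plain-Python simulation of a two-job MapReduce approach, not actual Hadoop code. Assumptions: friendship is symmetric and each edge is stored once as `src|friend`. Job 1 groups each user's friend list; job 2 emits, for every pair of friends of a common user, a "candidate" record, plus "direct" records for existing friendships, and then subtracts the direct friends per user. All function names are illustrative.

```python
from collections import defaultdict
from itertools import permutations


def map_edges(lines):
    """Job 1 map: emit each friendship in both directions."""
    for line in lines:
        src, friend = line.strip().split("|")
        yield src, friend
        yield friend, src


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def emit_candidates(adjacency):
    """Job 1 reduce: any two friends of the same user are friend-of-friend candidates."""
    for user, friends in adjacency.items():
        unique = set(friends)
        for f in unique:
            yield user, ("direct", f)
        for p, q in permutations(unique, 2):
            yield p, ("candidate", q)


def reduce_suggestions(groups):
    """Job 2 reduce: keep candidates that are not already direct friends."""
    result = {}
    for user, tagged in groups.items():
        direct = {v for tag, v in tagged if tag == "direct"}
        candidates = {v for tag, v in tagged if tag == "candidate"}
        result[user] = sorted(candidates - direct)
    return result


def friends_of_friends(lines):
    adjacency = shuffle(map_edges(lines))
    return reduce_suggestions(shuffle(emit_candidates(adjacency)))
```

For the two example edges a|b and b|c, this suggests c to a and a to c, and nothing to b.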
---
regression spline
Ridge regression: is the shrinkage of the parameters proportional across all
parameters, or does it differ for individual parameters?
Prove this.
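
A sketch of the usual argument via the singular value decomposition X = U D V^T, assuming centered inputs and no penalty on the intercept. In general each principal direction is shrunk by its own factor d_j^2 / (d_j^2 + λ), so low-variance directions shrink more; only in the orthonormal-design case are all coefficients shrunk by the same proportion.

```latex
\begin{aligned}
\hat{\beta}^{\text{ridge}} &= (X^\top X + \lambda I)^{-1} X^\top y, \\
X\hat{\beta}^{\text{ridge}} &= \sum_j u_j \, \frac{d_j^2}{d_j^2 + \lambda} \, u_j^\top y
  \qquad \text{(using } X = U D V^\top \text{)}, \\
\hat{\beta}^{\text{ridge}} &= \frac{1}{1 + \lambda} \, \hat{\beta}^{\text{OLS}}
  \qquad \text{(special case } X^\top X = I \text{)}.
\end{aligned}
```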
---
If I have more data points, how will the bias and variance change?
For ridge regression and lasso, how do the bias and variance change compared to
linear regression?
---
CART: why are the splits all binary? Why don't we use multiway splits at
each node?
What is the stopping rule for splitting?
How do you prune the tree?
---
assume we have groups and CM data, how to suggest groups to CM?
how to pick a good metric of distance if we use kNN?
if for each group, build a classification model to estimate the prob that
this CM is interested in, what is the potential pitfall of this?
groups with too few members?
what is the distribution of groups over # of group members?
---
when do you know your model is done?
---
Assume we have 3 data sets: 1. user_id, ads_id, click_or_not; 2. user_id, user
attributes; 3. ads_id, ads attributes.
How do you estimate P{click|user, ads}?
If we have users who click a lot of ads and users who only click a small number
of ads, how do you build models that can deal with both kinds of users?
(not undersampling or cost-sensitive modeling, i.e. TFIDF)
x***4
Posts: 1815
2
Thanks! What are the red/green books?

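[Quoting m******x's post]
: Some of these are from my own interviews and some were collected from various places. I'm not posting the questions from the red/green books. Some time has passed,
: so I may have misremembered a few. Please
: bear with me!
: I'll update this from time to time as I remember more.
: ---
: explain EM algorithm, use EM algorithm to find SVD of a given matrix
: ---
: Assume if you write an online training model, estimate how many obs the
: parameters start to converge
: Whats the convergence rate of online training algorithm (e.g. stochastic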

t******5
Posts: 47
3
What position were you mainly interviewing for?
m******x
Posts: 35
4
data scientist

[Quoting t******5's post]
: What position were you mainly interviewing for?