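This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS). Posts less than a week old are shown up to 50 characters; posts older than a week are shown up to 500 characters.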
Statistics board - How to parallelize logistic regression estimation?
S******y
Posts: 1123
1
I have finally got Hadoop working on my Linux box. Next I would like to see
whether I can do parallel model estimation for some commonly used models,
such as logistic regression.
My question now is: how do I parallelize gradient descent for logistic model
estimation on a really large data set?
Any thoughts would be greatly appreciated. Thanks in advance!
PS. See the R code below. If needed, I could rewrite it in Java
or Python. But the question is how to decompose the following estimation
method in a map/reduce fashion:
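# logistic function (was used but not defined in the original snippet)
sigmoid <- function(z) 1 / (1 + exp(-z))

# One pass of stochastic gradient ascent on the logistic log-likelihood.
# par is assumed to hold the starting values for theta (e.g. glm estimates),
# with length ncol(X) + 1 because an intercept column is added below.
my.logistic <- function(par, X, y, alpha, plot = FALSE)
{
  X <- cbind(1, X)                 # add the intercept column
  m <- nrow(X)
  theta <- par                     # starting values
  ll <- rep(NA, m)
  theta_all <- matrix(NA, length(theta), m)
  for (i in 1:m)
  {
    hx <- sigmoid(X %*% theta)     # matrix product: fitted probabilities
    theta <- theta + alpha * (y - hx)[i] * X[i, ]   # update from row i only
    logl <- sum(y * log(hx) + (1 - y) * log(1 - hx))
    ll[i] <- logl
    theta_all[, i] <- theta        # keep the parameter path for plotting
  }
  if (plot) {
    graphics::par(mfrow = c(4, 2))
    plot(ll)
    lines(ll)
    for (j in seq_len(min(6, length(theta))))
    {
      plot(theta_all[j, ])
      lines(theta_all[j, ])
    }
  }
  return(list(par = theta, loglik = logl))
}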
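One way to approach the map/reduce question: the per-row update in my.logistic is sequential, but the full-batch gradient of the log-likelihood is a plain sum over rows, so each input split can compute its partial gradient (map) and the partial gradients can be added (reduce). The following is only a sketch in Python under that assumption, not Hadoop code; the function names (map_gradient, reduce_gradients, fit) and the toy data are made up for illustration, and it swaps the stochastic update for batch gradient ascent.

```python
import math
from functools import reduce


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_gradient(chunk, theta):
    """Map step: log-likelihood gradient over one chunk of (x, y) rows.

    Each x already contains the leading 1.0 for the intercept.
    """
    grad = [0.0] * len(theta)
    for x, y in chunk:
        resid = y - sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        for j, xj in enumerate(x):
            grad[j] += resid * xj
    return grad


def reduce_gradients(g1, g2):
    """Reduce step: partial gradients simply add."""
    return [a + b for a, b in zip(g1, g2)]


def fit(chunks, n_params, alpha=0.1, n_iter=200):
    """Batch gradient ascent: one map/reduce round per iteration."""
    n_rows = sum(len(c) for c in chunks)
    theta = [0.0] * n_params
    for _ in range(n_iter):
        partials = [map_gradient(c, theta) for c in chunks]   # map
        grad = reduce(reduce_gradients, partials)             # reduce
        theta = [t + alpha * g / n_rows for t, g in zip(theta, grad)]
    return theta


# Toy data, made up for illustration: y tends to be 1 when x is large.
rows = [([1.0, x / 10.0], 1 if x > 4 else 0) for x in range(10)]
rows[3] = ([1.0, 0.3], 1)   # two flipped labels so the data are not separable
rows[6] = ([1.0, 0.6], 0)
chunks = [rows[0:4], rows[4:7], rows[7:10]]   # three "input splits"
theta = fit(chunks, n_params=2)
```

Each iteration needs one full pass over the data, so on Hadoop this would mean one job per iteration with theta shipped to every mapper; whether that is fast enough in practice is a separate question.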
d******e
Posts: 7844
2
A quick web search turns up plenty on this.
R definitely can't handle it, Python is too slow, and I don't know how fast Java is at numerical computation.
For this kind of problem the first choice is definitely C/C++ or Fortran.

[Quoted from S******y's post:]
: I have finally got Hadoop working on my Linux box. Next I would like to see
: whether I can do parallel model estimation for some commonly used models,
: such as logistic regression.
: My question now is: how do I parallelize gradient descent for logistic model
: estimation on a really large data set?
: Any thoughts would be greatly appreciated. Thanks in advance!
: PS. See the R code below. If needed, I could rewrite it in Java
: or Python. But the question is how to decompose the following estimation
: method in a map/reduce fashion:
: my.logistic<-function(par, X,y, alpha, plot=FALSE)

S******y
Posts: 1123
3
Thanks for the reply!
Found a good paper on this topic:
www.cs.toronto.edu/~amnih/cifar/talks/delalleau_talk.pdf
Parallel Stochastic Gradient Descent. Olivier Delalleau and Yoshua Bengio.
University of Montreal. August 11th, 2007. CIAR Summer School - Toronto ...
S******y
Posts: 1123
4
Has anybody done that with Revo R on Hadoop?
o****o
Posts: 8077
5
No real experience with MapReduce, but my thinking is: can you do that for
OLS? If so, then you can do that for the OLS part.
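One possible reading of this suggestion (an assumption, since the post does not spell it out): logistic regression can be fitted by iteratively reweighted least squares, where every iteration is a weighted OLS solve. The weighted normal-equation pieces X'WX and X'Wz are sums over rows, so they can be accumulated chunk by chunk and then added. The sketch below is in Python with made-up names (chunk_stats, add_stats, solve, irls) and toy data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def chunk_stats(chunk, theta):
    """Per-chunk pieces of the weighted normal equations: X'WX and X'Wz."""
    p = len(theta)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwz = [0.0] * p
    for x, y in chunk:
        eta = sum(t * xj for t, xj in zip(theta, x))
        mu = sigmoid(eta)
        w = mu * (1.0 - mu)            # IRLS weight
        wz = w * eta + (y - mu)        # w times the working response
        for j in range(p):
            xtwz[j] += x[j] * wz
            for k in range(p):
                xtwx[j][k] += w * x[j] * x[k]
    return xtwx, xtwz


def add_stats(a, b):
    """Combine two chunks' statistics by element-wise addition."""
    xtwx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xtwz = [u + v for u, v in zip(a[1], b[1])]
    return xtwx, xtwz


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def irls(chunks, n_params, n_iter=25):
    """Each iteration: per-chunk statistics (parallelizable), sum, solve."""
    theta = [0.0] * n_params
    for _ in range(n_iter):
        stats = [chunk_stats(c, theta) for c in chunks]
        total = stats[0]
        for s in stats[1:]:
            total = add_stats(total, s)
        theta = solve(total[0], total[1])
    return theta


# Toy data, made up for illustration (not separable, so the MLE is finite).
rows = [([1.0, x / 10.0], 1 if x > 4 else 0) for x in range(10)]
rows[3] = ([1.0, 0.3], 1)
rows[6] = ([1.0, 0.6], 0)
theta = irls([rows[0:4], rows[4:7], rows[7:10]], n_params=2)
```

Only the small p-by-p matrix and p-vector travel between workers, which is what makes the "OLS part" cheap to combine when the number of parameters is modest.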

[Quoted from S******y's post:]
: I have finally got Hadoop working on my Linux box. Next I would like to see
: whether I can do parallel model estimation for some commonly used models,
: such as logistic regression.
: My question now is: how do I parallelize gradient descent for logistic model
: estimation on a really large data set?
: Any thoughts would be greatly appreciated. Thanks in advance!
: PS. See the R code below. If needed, I could rewrite it in Java
: or Python. But the question is how to decompose the following estimation
: method in a map/reduce fashion:
: my.logistic<-function(par, X,y, alpha, plot=FALSE)

S******y
Posts: 1123
6
Thanks, oloolo.
The paper I found says: "Split data into c chunks (each of the c CPUs sees
one chunk of the data), and perform mini-batch stochastic gradient
descent with parameters stored in shared memory."
It seems that the trick is always to split the data into chunks, just like
Revo R's XDF file chunks.
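The quoted scheme can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not the method from the talk: threads stand in for the c CPUs, a plain list is the shared parameter vector, and a lock around each mini-batch step is a simplification added here to keep updates consistent (CPython threads would not give a real speedup anyway). The names worker, parallel_sgd, and loglik and the simulated data are all made up.

```python
import math
import random
import threading


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def worker(chunk, theta, lock, alpha, batch_size, n_epochs):
    """One worker: mini-batch SGD over its own chunk, updating shared theta."""
    for _ in range(n_epochs):
        for start in range(0, len(chunk), batch_size):
            batch = chunk[start:start + batch_size]
            with lock:   # one consistent read-update-write step per batch
                grad = [0.0] * len(theta)
                for x, y in batch:
                    eta = sum(t * xj for t, xj in zip(theta, x))
                    resid = y - sigmoid(eta)
                    for j, xj in enumerate(x):
                        grad[j] += resid * xj
                for j in range(len(theta)):
                    theta[j] += alpha * grad[j] / len(batch)


def parallel_sgd(rows, n_workers=4, alpha=0.5, batch_size=10, n_epochs=20):
    """Split rows into n_workers chunks and run one worker thread per chunk."""
    theta = [0.0] * len(rows[0][0])          # shared parameter vector
    lock = threading.Lock()
    chunks = [rows[k::n_workers] for k in range(n_workers)]
    threads = [
        threading.Thread(
            target=worker,
            args=(c, theta, lock, alpha, batch_size, n_epochs),
        )
        for c in chunks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return theta


def loglik(theta, rows):
    """Logistic log-likelihood, written to avoid log(0)."""
    total = 0.0
    for x, y in rows:
        z = sum(t * xj for t, xj in zip(theta, x))
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


# Simulated data, made up for illustration: true slope 1.5, no intercept.
random.seed(0)
rows = []
for _ in range(200):
    x = random.uniform(-2.0, 2.0)
    y = 1 if random.random() < sigmoid(1.5 * x) else 0
    rows.append(([1.0, x], y))
theta = parallel_sgd(rows)
```

Because the thread schedule decides the order of the updates, the fitted theta can differ slightly from run to run; only the data split and the update rule are fixed.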
s*********e
Posts: 1051
7
Agree with oloolo.
Regression-type models are not good candidates for parallel processing.

[Quoted from o****o's post:]
: No real experience with MapReduce, but my thinking is: can you do that for
: OLS? If so, then you can do that for the OLS part.

t****a
Posts: 1212
8
Take a look at the doMC, doMPI, and foreach packages.
S******y
Posts: 1123
9
Thanks, everybody, for the replies!
So what would be good candidates for parallel processing?
Decision trees? KNN? Ensembles?
Happy Holidays :-)
d******e
Posts: 7844
10
You're behind the times.
The parallel algorithms we work on now can run regression on tens to hundreds of GB of data on a cluster.

[Quoted from s*********e's post:]
: Agree with oloolo.
: Regression-type models are not good candidates for parallel processing.

D******n
Posts: 2836
11
Benchmarking.

[Quoted from S******y's post:]
: Thanks, everybody, for the replies!
: So what would be good candidates for parallel processing?
: Decision trees? KNN? Ensembles?
: Happy Holidays :-)

z******n
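Posts: 397
12
Which parallel algorithm? Is it published, or something you built in-house?

[Quoted from d******e's post:]
: You're behind the times.
: The parallel algorithms we work on now can run regression on tens to hundreds of GB of data on a cluster.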

d******e
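Posts: 7844
13
The algorithm already exists, of course; we improved on it ourselves. Solving a regression is just a small case.
Plenty of people are working on large-scale parallel and distributed optimization these days; search for it yourself and you will find a lot.

[Quoted from z******n's post:]
: Which parallel algorithm? Is it published, or something you built in-house?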