How to parallelize logistic regression estimation? - Statistics board - MITBBS (未名空间) archive

S******y (posts: 1123)	1
I have finally got Hadoop working on my Linux box. Next I would like to see whether I could do parallel model estimation for some commonly used models, such as logistic regression.

My question now is: how do I parallelize gradient descent for logistic model estimation on a really large data set? Any thoughts would be greatly appreciated. Thanks in advance!

PS. See the R code below. If needed, I could rewrite it in Java or Python. But the question is how to decompose the following estimation method in a map/reduce fashion:

    sigmoid <- function(z) 1 / (1 + exp(-z))

    my.logistic <- function(par, X, y, alpha, plot = FALSE) {
      m <- nrow(X)
      X <- cbind(1, X)                      # add the intercept column
      # assumption: starting values (e.g. glm estimates) are passed in as par
      theta <- par
      ll <- rep(NA, m)                      # log-likelihood after each update
      theta_all <- matrix(theta, ncol = 1)  # history of theta, one column per step
      for (i in 1:m) {
        hx <- sigmoid(X %*% theta)          # matrix product
        # single-observation gradient step using row i
        theta <- theta + alpha * (y - hx)[i] * X[i, ]
        logl <- sum(y * log(hx) + (1 - y) * log(1 - hx))
        ll[i] <- logl
        theta_all <- cbind(theta_all, theta)
      }
      if (plot) {
        graphics::par(mfrow = c(4, 2))
        plot(na.omit(ll))
        lines(ll[1:i])
        for (j in 1:6) {                    # assumes at least 6 coefficients
          plot(theta_all[j, 1:i])
          lines(theta_all[j, 1:i])
        }
      }
      return(list(par = theta, loglik = logl))
    }
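One way to see why the question has a map/reduce answer at all: the log-likelihood in the R code is a sum over observations, so its gradient is too. The derivation below is a sketch in that notation. The chunk index sets C_k and the chunk count c are assumptions introduced here for illustration, and the batch update shown differs from the one-row-at-a-time update in the posted loop.

```latex
% h_\theta(x) = 1 / (1 + e^{-\theta^\top x}) is the sigmoid of the linear predictor
\ell(\theta) = \sum_{i=1}^{m} \Big[ y_i \log h_\theta(x_i) + (1 - y_i) \log\big(1 - h_\theta(x_i)\big) \Big]

% Gradient, split over c disjoint chunks C_1, \dots, C_c of the rows
\nabla \ell(\theta) = \sum_{i=1}^{m} \big(y_i - h_\theta(x_i)\big)\, x_i
                    = \sum_{k=1}^{c} \; \sum_{i \in C_k} \big(y_i - h_\theta(x_i)\big)\, x_i

% Batch update with step size \alpha
\theta \leftarrow \theta + \alpha \, \nabla \ell(\theta)
```

Each inner sum depends only on the rows of one chunk and the current theta, so it could be computed by a mapper, with the outer sum done by a reducer.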
d******e (posts: 7844)	2
A quick web search turns up plenty of material. R definitely can't handle this, Python is too slow, and I don't know how fast Java is for numerical computing. For this kind of problem the first choice is definitely C/C++ or Fortran.

[S******y wrote:]
: I have finally got Hadoop working on my Linux box. Next I would like to try
: to see if I could do parallel model estimation for some commonly used models
: such as logistic regression.
: My question now is - how to parallelize gradient descent for logistic model
: estimation for real large data set?
: Any thoughts would be greatly appreciated. Thanks in advance!
: PS. See R code below. If needed, I could rewrite the following code in Java
: or Python. But the question is how to decompose the following estimation
: method in a map/reduce fashion -
: my.logistic<-function(par, X,y, alpha, plot=FALSE)
S******y (posts: 1123)	3
Thanks for the reply! I found a good paper on this topic:

www.cs.toronto.edu/~amnih/cifar/talks/delalleau_talk.pdf
Parallel Stochastic Gradient Descent. Olivier Delalleau and Yoshua Bengio. University of Montreal. August 11th, 2007. CIAR Summer School - Toronto ...
S******y (posts: 1123)	4
Has anybody done that with Revo R on Hadoop?
o****o (posts: 8077)	5
I have no real experience with MapReduce, but my thinking is: can you do that for OLS? If so, then you can do that for the OLS part.

[S******y wrote:]
: I have finally got Hadoop working on my Linux box. Next I would like to try
: to see if I could do parallel model estimation for some commonly used models
: such as logistic regression.
: My question now is - how to parallelize gradient descent for logistic model
: estimation for real large data set?
: Any thoughts would be greatly appreciated. Thanks in advance!
: PS. See R code below. If needed, I could rewrite the following code in Java
: or Python. But the question is how to decompose the following estimation
: method in a map/reduce fashion -
: my.logistic<-function(par, X,y, alpha, plot=FALSE)
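The OLS question in post 5 can be made concrete. For OLS the normal equations need only X'X and X'y, and both are sums over rows, so each chunk can contribute a partial sum. The sketch below is a minimal pure-Python illustration of that idea, not anything posted in the thread; the names `map_chunk`, `reduce_stats`, `solve` and the toy data are all hypothetical, and a real Hadoop job would distribute the map step rather than run it in one process.

```python
from functools import reduce

def map_chunk(rows):
    """Mapper: return the partial sums (X'X, X'y) for one chunk of (features, y) rows."""
    p = len(rows[0][0]) + 1                  # +1 for the intercept
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0] + list(features)           # prepend the intercept term
        for a in range(p):
            xty[a] += x[a] * y
            for b in range(p):
                xtx[a][b] += x[a] * x[b]
    return xtx, xty

def reduce_stats(left, right):
    """Reducer: add two partial (X'X, X'y) pairs element-wise."""
    xtx = [[u + v for u, v in zip(lr, rr)] for lr, rr in zip(left[0], right[0])]
    xty = [u + v for u, v in zip(left[1], right[1])]
    return xtx, xty

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Toy data generated from y = 1 + 2*x, split into two chunks.
chunks = [
    [((0.0,), 1.0), ((1.0,), 3.0)],
    [((2.0,), 5.0), ((3.0,), 7.0)],
]
xtx, xty = reduce(reduce_stats, map(map_chunk, chunks))
beta = solve(xtx, xty)                       # [intercept, slope]
```

Only the small p-by-p matrix and p-vector travel between mapper and reducer, so the communication cost does not grow with the number of rows.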
S******y (posts: 1123)	6
Thanks, oloolo. The paper I found says:

"Split data into c chunks (each of the c CPUs sees one chunk of the data), and perform mini-batch stochastic gradient descent with parameters stored in shared memory"

It seems that the trick is always to split the data into chunks, just like Revo R's XDF file chunks.
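A minimal sketch of the chunking idea, under stated assumptions: this is a synchronous simplification in which every chunk computes its partial gradient at the same theta and the partial gradients are summed before one update. It is not the shared-memory mini-batch scheme the quoted talk describes. The function names, the toy data, and the values of `alpha` and `n_iter` are hypothetical, and the map step runs serially here where a real job would run it on separate workers.

```python
import math
from functools import reduce

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chunk_gradient(theta, chunk):
    """Mapper: gradient of the log-likelihood over one chunk of (x, y) rows."""
    grad = [0.0] * len(theta)
    for x, y in chunk:
        err = y - sigmoid(sum(t * v for t, v in zip(theta, x)))
        for j, v in enumerate(x):
            grad[j] += err * v
    return grad

def add_vectors(a, b):
    """Reducer: element-wise sum of two partial gradients."""
    return [u + v for u, v in zip(a, b)]

def log_likelihood(theta, rows):
    """Bernoulli log-likelihood of the logistic model over the given rows."""
    total = 0.0
    for x, y in rows:
        h = sigmoid(sum(t * v for t, v in zip(theta, x)))
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return total

def fit(chunks, n_params, alpha=0.1, n_iter=200):
    """Batch gradient ascent; each iteration is one map step and one reduce step."""
    theta = [0.0] * n_params
    for _ in range(n_iter):
        partials = [chunk_gradient(theta, c) for c in chunks]   # map
        grad = reduce(add_vectors, partials)                    # reduce
        theta = [t + alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy, non-separable data; each x starts with 1.0 for the intercept.
chunks = [
    [((1.0, -2.0), 0), ((1.0, -1.0), 0), ((1.0, -0.5), 1)],
    [((1.0, 0.5), 0), ((1.0, 1.0), 1), ((1.0, 2.0), 1)],
]
theta = fit(chunks, n_params=2)
```

The design trade-off is that every iteration needs a full pass over all chunks plus a synchronization point, which is why schemes that update shared parameters more often, like the one quoted above, are attractive on large data.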
s*********e (posts: 1051)	7
Agree with oloolo. Regression-type models are not good candidates for parallel processing.

[o****o wrote:]
: no real experience on MapReduce, but my thinking is whether or not you can
: do that on a OLS? If so, then you can do that for the OLS part.
t****a (posts: 1212)	8
Take a look at the doMC, doMPI, and foreach packages.
S******y (posts: 1123)	9
Thanks, everybody, for the replies! So what would be good candidates for parallel processing? Decision trees? KNN? Ensembles? Happy Holidays :-)
d******e (posts: 7844)	10
You're behind the times. The parallel algorithms we work on now can run regression on tens to hundreds of GB of data on a cluster.

[s*********e wrote:]
: agree with oloolo
: regression-type model is not a good candidate for parallel processing.
D******n (posts: 2836)	11
Benchmarking.

[S******y wrote:]
: Thank everybody for reply!
: So what would be good candidates for parallel processing?
: Decision trees? KNN? Ensemble?
: Happy Holiday :-)
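z******n (posts: 397)	12
What parallel algorithm? Is it published, or something you developed in-house?

[d******e wrote:]
: You're behind the times.
: The parallel algorithms we work on now can run regression on tens to hundreds of GB of data on a cluster.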
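d******e (posts: 7844)	13
The algorithm is an existing one, of course, with our own improvements. Solving a regression is just a small case. Plenty of people are working on large-scale parallel and distributed optimization these days; search around yourself and you will find a lot.

[z******n wrote:]
: What parallel algorithm? Is it published, or something you developed in-house?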
