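f***a posts: 329 | 1 As usual, let me ramble a bit first :-)
In R, you should avoid for loops whenever you can and try to handle everything in a vectorized way.
To operate on the rows or columns of a matrix/data.frame, use apply (btw, same for array);
to operate on the elements of a list, data.frame (essentially a list), or vector, use
lapply or sapply;
to operate separately for each id (group), use tapply.
Here are my questions.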
1)
# Way I:
for(i in 1:n){
res[i] <- myfunction(a[i], b[i], c[i])
}
# Way II:
res <- apply(cbind(a,b,c), 1, function(t)
myfunction(t[1], t[2], t[3])
)
Are these two approaches equivalent, or is Way II better?
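One thing worth checking for question 1: apply(cbind(a, b, c), 1, ...) first builds a matrix, so the three inputs are coerced to a common type, while mapply walks the vectors in parallel without that step. A minimal sketch, where myfunction and the input values are hypothetical stand-ins for the ones in the question:

```r
# Hypothetical stand-in for the poster's myfunction.
myfunction <- function(x, y, z) x + y * z

a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- c(7, 8, 9)   # a variable named c does not hide the function c()
n <- length(a)

# Way I: explicit loop over a preallocated result.
res_loop <- numeric(n)
for (i in seq_len(n)) {
  res_loop[i] <- myfunction(a[i], b[i], c[i])
}

# Way II: apply over the rows of a matrix built with cbind.
res_apply <- apply(cbind(a, b, c), 1, function(t) myfunction(t[1], t[2], t[3]))

# Alternative: mapply walks a, b, and c in parallel, with no matrix.
res_mapply <- mapply(myfunction, a, b, c)
```

All three give the same numbers here; the difference only matters when a, b, and c have different types, since cbind would then convert them.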
2)
# Way I:
for(i in 1:n){
input <- i
...... # some heavy calculation
res[i] <- output
}
# Way II:
res <- lapply(1:n, function(t){
input <- t
...... # some heavy calculation
output
}
)
Are these two approaches equivalent, or is Way II better?
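Strictly speaking, the two versions in question 2 do not return the same kind of object: the loop fills a vector, while lapply returns a list. A small sketch of the difference, with heavy() as a hypothetical placeholder for the elided calculation:

```r
n <- 5

# Hypothetical stand-in for the "heavy calculation" in the question.
heavy <- function(i) sum(sqrt(seq_len(i)))

# Way I: preallocate the result, then fill it in.
res_loop <- numeric(n)
for (i in seq_len(n)) {
  res_loop[i] <- heavy(i)
}

# Way II as posted returns a list, not a vector.
res_list <- lapply(seq_len(n), heavy)

# vapply returns a numeric vector and checks each result's type and length.
res_vec <- vapply(seq_len(n), heavy, numeric(1))
```

If a plain vector is wanted, vapply (or unlist on the lapply result) gets back to what the loop produced.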
3)
# Way I:
for(i in 2:n){   # res[1] must already hold the starting value
input <- res[i-1]
... # some calculation
res[i] <- output
}
Is there a way to do this without a for loop?
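For a recurrence like the one in question 3, one option is Reduce with accumulate = TRUE. Note that Reduce is itself written as an R-level loop, so this is about how the code reads, not about speed. A sketch with a hypothetical update rule next_value and an assumed starting value of 1:

```r
# Hypothetical recurrence: each value depends on the previous one.
next_value <- function(prev, i) 0.5 * prev + i

n <- 5

# Loop version: res_loop[1] holds the assumed starting value.
res_loop <- numeric(n)
res_loop[1] <- 1
for (i in 2:n) {
  res_loop[i] <- next_value(res_loop[i - 1], i)
}

# Reduce threads the previous result through each call;
# accumulate = TRUE keeps every intermediate value, starting with init.
res_reduce <- Reduce(next_value, 2:n, accumulate = TRUE, init = 1)
```

When the update rule has a closed form (for example a running sum), functions such as cumsum or cumprod avoid the loop entirely.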
4)
# Way I:
for(i in 1:n){
res[[i]] <- read.table( paste("file_",i,".txt", sep="") )
}
# Way II:
res <- lapply(1:n, function(t)
read.table( paste("file_",t,".txt", sep="") )
)
What if the task is not a math operation but something else (e.g., data I/O)? Is the effect the same?
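A self-contained way to try question 4 is to write a few small files to a temporary directory and read them back both ways. The file contents and the demo_dir name below are made up for illustration:

```r
# Write three small example files into a temporary directory
# (hypothetical stand-ins for file_1.txt, file_2.txt, ...).
demo_dir <- tempfile("apply_demo_")
dir.create(demo_dir)
n <- 3
paths <- file.path(demo_dir, paste0("file_", seq_len(n), ".txt"))
for (i in seq_len(n)) {
  write.table(data.frame(id = i, value = i * 10), paths[i], row.names = FALSE)
}

# Way I: preallocate a list and fill it in a loop.
res_loop <- vector("list", n)
for (i in seq_len(n)) {
  res_loop[[i]] <- read.table(paths[i], header = TRUE)
}

# Way II: lapply over the file paths.
res_lapply <- lapply(paths, read.table, header = TRUE)

# Clean up the temporary files.
unlink(demo_dir, recursive = TRUE)
```

Both versions produce the same list of data frames; for I/O the time is likely dominated by reading the files, not by the choice of loop construct.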
Please share your views, or any experience you have with apply vs for. |
w*****t posts: 49 | 2 apply is much faster than for. |
f***a posts: 329 | 3 In the general vectorized-computation case, no doubt apply is faster than
a regular for loop (as I stated in the beginning). But for the cases in my
questions, is the efficiency gain from apply still that significant? (BTW, what's
the internal procedure/algorithm that makes apply more efficient than a for
loop?)
And to extend the discussion, a "for loop" can be replaced by {foreach}
looping for parallel computing. In that case, how efficient is it
compared with the parallel-type "apply" functions in the {snow} and {multicore}
packages?
Hope someone can share experience...
[Quoting w*****t's post] : apply is much faster than for.
|
P****D posts: 11146 | |
M*P posts: 6456 | 5 You're already using R, and you still care about this?
[Quoting f***a's post] : In the general vectorized-computation case, no doubt apply is faster than : a regular for loop (as I stated in the beginning). But for the cases in my : questions, is the efficiency gain from apply still that significant? (BTW, what's : the internal procedure/algorithm that makes apply more efficient than a for : loop?) : And to extend the discussion, a "for loop" can be replaced by {foreach} : looping for parallel computing. In that case, how efficient is it : compared with the parallel-type "apply" functions in the {snow} and {multicore} : packages? : Hope someone can share experience...
|
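d******e posts: 7844 | 6 Heh, even when everyone is using R, speed can differ by a factor of 100. Do you believe that?
[Quoting M*P's post] : You're already using R, and you still care about this?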
|
f***a posts: 329 | 7 It seems the actual looping of "lapply" is done internally in C code, while
"apply" isn't really faster than writing a loop. Is the main advantage of "apply"
that it simplifies code writing?
colMeans()/rowSums() and vectorization of a function are faster than a loop,
though.
Anyway, I think that for algorithms involving heavy computation, C/C++ should
be employed to handle the computing part. I strongly recommend {Rcpp}, which
provides a much better API than R's original one.
(My previous questions remain unanswered.... T_T) |
M*V posts: 11 | 8 Sometimes while can be used for loops, which is faster than if. For
computationally intensive tasks, maybe it's good to link with C. Just my 2
cents. |
r*g posts: 3159 | 9 In R, apply is just a for loop. Saying apply is faster than for is superstition. Quoting an R author:
apply() is just a wrapper for a for loop. So it is not faster than at
least one implementation using a for loop: it may be neater and easier to
understand than an explicit for loop. |
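To make the "wrapper for a for loop" point concrete, here is a stripped-down sketch of a row-wise apply. The row_apply name is hypothetical, and the real base::apply also handles arrays, dimnames, and non-scalar results:

```r
# Hypothetical, simplified row-wise apply: just a for loop around FUN.
row_apply <- function(m, FUN, ...) {
  out <- vector("list", nrow(m))
  for (i in seq_len(nrow(m))) {
    out[[i]] <- FUN(m[i, ], ...)
  }
  unlist(out)
}

set.seed(1)
m <- matrix(rnorm(20), ncol = 4)

by_sketch <- row_apply(m, mean)   # the loop-based sketch
by_apply  <- apply(m, 1, mean)    # base::apply
by_native <- rowMeans(m)          # dedicated built-in for row means
```

All three return the same row means; only rowMeans skips the per-row call to an R function.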
P****D posts: 11146 | 10 !!!!!!!
[Quoting r*g's post] : In R, apply is just a for loop. Saying apply is faster than for is superstition. Quoting an R author: : apply() is just a wrapper for a for loop. So it is not faster than at : least one implementation using a for loop: it may be neater and easier to : understand than an explicit for loop.
|
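D******n posts: 2836 | 11 That's like saying it is superstition that any programming language is faster than machine code, because they are all just wrappers around machine code.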
[Quoting r*g's post] : In R, apply is just a for loop. Saying apply is faster than for is superstition. Quoting an R author: : apply() is just a wrapper for a for loop. So it is not faster than at : least one implementation using a for loop: it may be neater and easier to : understand than an explicit for loop.
|
g********r posts: 8017 | 12 I tried it, and it really is the case:
> a<-matrix(rnorm(10000000),ncol=100)
> dim(a)
[1] 100000 100
> r<-1:100000
> system.time(for(i in 1:100000) r[i]<-mean(a[i,]))
user system elapsed
2.239 0.053 2.276
> system.time(r2<-apply(a,1,mean))
user system elapsed
2.731 0.048 2.763
> system.time(r3<-rowMeans(a))
user system elapsed
0.033 0.001 0.034
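[Quoting D******n's post] : That's like saying it is superstition that any programming language is faster than machine code, because they are all just wrappers around machine code.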
|
P****D posts: 11146 | 13 That's not wrong, is it? You're not being sarcastic, are you...
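[Quoting D******n's post] : That's like saying it is superstition that any programming language is faster than machine code, because they are all just wrappers around machine code.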
|
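D******n posts: 2836 | 14 If the for loop he means is R's own for loop, then I am surprised and disappointed by R's apply family.
In that case, his saying that apply is just a for loop makes perfect sense.
What I had in mind was a for loop in general: if, say, apply were written in C, there would
inevitably be a for loop somewhere in that code.
[Quoting P****D's post] : That's not wrong, is it? You're not being sarcastic, are you...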
|
F****n posts: 3271 | 15 From what I've learned, in R apply <= loop in terms of performance. The
apply family is a bunch of convenient tools *BUILT ON loops*. You
don't need to be surprised and disappointed, because "loops" in R are actually not that
slow. Most perceived performance issues with R loops are not related to
the loops themselves but are more or less due to R data objects, which are
effectively immutable (modifying one can trigger a copy) and must be indexed.
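One common case where the data objects, not the loop itself, seem to cost the time is growing a result one element at a time. A sketch under that assumption, with hypothetical function names grow, prealloc, and vectorized:

```r
n <- 1e4

# Growing the result: each c() call builds a new, longer vector.
grow <- function(n) {
  res <- numeric(0)
  for (i in seq_len(n)) res <- c(res, i^2)
  res
}

# Preallocating: the vector is created once and then filled by index.
prealloc <- function(n) {
  res <- numeric(n)
  for (i in seq_len(n)) res[i] <- i^2
  res
}

# Fully vectorized: no R-level loop at all.
vectorized <- function(n) seq_len(n)^2

# Timings vary by machine and R version, so none are quoted here.
system.time(grow(n))
system.time(prealloc(n))
system.time(vectorized(n))
```

All three functions return the same values, so any timing difference comes from how the result object is built up.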
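[Quoting D******n's post] : If the for loop he means is R's own for loop, then I am surprised and disappointed by R's apply family. : In that case, his saying that apply is just a for loop makes perfect sense. : What I had in mind was a for loop in general: if, say, apply were written in C, there would : inevitably be a for loop somewhere in that code.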
|
F****n posts: 3271 | 16 Again, it is a common misunderstanding that in R apply is faster than
loops. They are the same.
Enhancing performance using vectorization means using built-in optimized
functions such as rowMeans, not using apply.
[Quoting f***a's post] : As usual, let me ramble a bit first :-) : In R, you should avoid for loops whenever you can and try to handle everything in a vectorized way. : To operate on the rows or columns of a matrix/data.frame, use apply (btw, same for array); : to operate on the elements of a list, data.frame (essentially a list), or vector, use : lapply or sapply; : to operate separately for each id (group), use tapply. : Here are my questions. : 1) : # Way I: : for(i in 1:n){
|
D******n posts: 2836 | 17 After a little research, for apply it is true, but not so for the entire
"Apply Family":
R loop -> apply
C code -> lapply --> sapply
            |
            +------> tapply
C code -> mapply
I haven't tested it yet, but I guess the other members of the apply
family do much better than a for loop.
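A rough way to check the diagram without reading the sources by hand is to search the deparsed function definitions for telltale text. The mentions helper below is hypothetical, and the exact source text could differ between R versions:

```r
# TRUE if the deparsed source of function f contains the given text.
mentions <- function(f, text) any(grepl(text, deparse(f), fixed = TRUE))

mentions(base::apply,  "for (")       # is there an R-level for loop in apply?
mentions(base::lapply, ".Internal(")  # does lapply hand off to internal code?
mentions(base::mapply, ".Internal(")  # does mapply?
mentions(base::sapply, "lapply(")     # is sapply built on lapply?
```

This only shows where the looping happens; it says nothing by itself about speed, since FUN is still an R function called once per element.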
[Quoting F****n's post] : Again, it is a common misunderstanding that in R apply is faster than : loops. They are the same. : Enhancing performance using vectorization means using built-in optimized : functions such as rowMeans, not using apply.
|
F****n posts: 3271 | 18 Yeah, you are right, but I think since
C code -> R loop too (R's for loop is itself implemented in C),
lapply == loop >= apply
In other words, lapply is faster than apply but not necessarily better than
a loop.
[Quoting D******n's post] : After a little research, for apply it is true, but not so for the entire : "Apply Family": : R loop -> apply : C code -> lapply --> sapply : | : +------> tapply : C code -> mapply : I haven't tested it yet, but I guess the other members of the apply : family do much better than a for loop.
|
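D******n posts: 2836 | 19 You are the one talking nonsense. Never mind that your understanding is wrong. We are discussing a technical question, so why say things like that?
This is about R; where does political correctness come into it?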
[Quoting r*g's post] : In R, apply is just a for loop. Saying apply is faster than for is superstition. Quoting an R author: : apply() is just a wrapper for a for loop. So it is not faster than at : least one implementation using a for loop: it may be neater and easier to : understand than an explicit for loop.
|
P****D posts: 11146 | 20 Oh, so what surprised and disappointed you is that the apply family in R isn't optimized.
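[Quoting D******n's post] : If the for loop he means is R's own for loop, then I am surprised and disappointed by R's apply family. : In that case, his saying that apply is just a for loop makes perfect sense. : What I had in mind was a for loop in general: if, say, apply were written in C, there would : inevitably be a for loop somewhere in that code.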
|
D******n posts: 2836 | 21 Yeah, because people keep saying the apply family is faster, I was
surprised to find out (see post 17) that apply alone is not based on C code (or other
non-R code), while the other family members are.
Just type apply in R, and you can see that it is entirely R code with a loop:
.....
else for (i in 1L:d2) {
tmp <- FUN(array(newX[, i], d.call, dn.call), ...)
if (!is.null(tmp))
ans[[i]] <- tmp
}
.....
For lapply it looks like this:
function (X, FUN, ...)
{
FUN <- match.fun(FUN)
if (!is.vector(X) || is.object(X))
X <- as.list(X)
.Internal(lapply(X, FUN))
}
[Quoting P****D's post] : Oh, so what surprised and disappointed you is that the apply family in R isn't optimized.
|