f***a posts: 329 | 1 As usual, let me start by rambling a little. :-)
In R, if you can avoid a for loop you should, and try to get everything done in a vectorized way.
For operations on the rows or columns of a matrix/data.frame, use apply (btw, same for an array).
For operations on the elements of a list, a data.frame (essentially it is a list), or a vector, use
lapply or sapply.
For operations grouped by an id, use tapply.
Now here are my questions.
1)
# Way I:
for (i in 1:n) {
  res[i] <- myfunction(a[i], b[i], c[i])
}
# Way II:
res <- apply(cbind(a, b, c), 1, function(t)
  myfunction(t[1], t[2], t[3])
)
Are these two ways equivalent, or is Way II somewhat better?
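(There is also a third way worth comparing; a minimal sketch, using the same myfunction, a, b, c as above:)
# Way III: mapply walks several vectors in parallel, avoiding both
# the explicit loop and the cbind copy
res <- mapply(myfunction, a, b, c)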
2)
# Way I:
for (i in 1:n) {
  input <- i
  ...... # some heavy calculation
  res[i] <- output
}
... |
|
D******n posts: 2836 | 2 After a little research: for apply it is true, but not so for the entire
"apply" family. Roughly:
R loop           -> apply
optimized C code -> lapply --> sapply
                           +-> tapply
optimized C code -> mapply
I haven't tested it yet, but I guess the other members of the apply
family do much better than a for loop.
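(A minimal way to test that guess; the vector size and the squaring are arbitrary choices for illustration, and timings are machine-dependent:)
x <- as.list(runif(1e6))
# explicit R-level loop with a preallocated result
r1 <- vector("list", length(x))
system.time(for (i in seq_along(x)) r1[[i]] <- x[[i]]^2)
# lapply, whose looping happens in C
system.time(r2 <- lapply(x, function(v) v^2))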
|
t****a posts: 1212 | 3 1. You can use the is.na() function to detect whether a value is NA.
2. Your code will be very slow when computing on a large data.frame,
because you are using apply or sapply. I would write it with vector
computing, like:
data$newcol <- ifelse(is.na(data[,2]) | is.na(data[,4]) | data[,2] != data[,4],
                      as.numeric(data[,5]) - 1, as.numeric(data[,5])) |
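(A toy frame to see the rule in action; the column layout is assumed from the snippet, i.e. columns 2 and 4 are compared and column 5 is adjusted:)
data <- data.frame(id = 1:4,
                   a  = c(1, 2, NA, 4),
                   z  = letters[1:4],
                   b  = c(1, 3, 3, 4),
                   v  = rep(10, 4))
data$newcol <- ifelse(is.na(data[,2]) | is.na(data[,4]) | data[,2] != data[,4],
                      as.numeric(data[,5]) - 1, as.numeric(data[,5]))
data$newcol  # 10 9 9 10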
|
t****a posts: 1212 | 4 I think you can use sapply(1:100, function(i) {...}) or for (i in 1:100) {...}
to implement your idea.
However, I don't understand the request to "then draw the 100 selected groups of data in one
figure". If you literally want to put everything in one figure, that is called "overplotting",
since the figure would be totally unreadable.
If you want to put them into different panels (for example, 10 rows x 10
cols), you probably want to try the "lattice" package, as sketched below. My
suggestion is: why not use some descriptive statistics (such as mea... |
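(A minimal lattice sketch of the 10 x 10 panel idea; the data frame and variable names are invented for illustration:)
library(lattice)
d <- data.frame(x     = rep(1:20, times = 100),
                y     = rnorm(2000),
                group = rep(1:100, each = 20))
# one line per group, laid out as 10 columns x 10 rows of panels
xyplot(y ~ x | factor(group), data = d, layout = c(10, 10), type = "l")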
|
c*****a posts: 808 | 5 Wrap it in a function, then use rep or sapply? |
|
c*****a posts: 808 | 6 I once took a stat computation class where the instructor kept touting R's
vectorization, how awesome vector operations in R are. He also told us to use for loops
less and things like sapply and lapply more.
With two nested for loops on even a moderately large sample size, you wait forever. |
|
J*****n posts: 4859 | 7 I have a data frame r of size 700000 x 13.
Then I ran the following code:
d <- sapply(r$Date, toString)
and it came back with:
Error: cannot allocate vector of size 7 Kb
How can I resolve this problem?
Thank you. |
|
a***d posts: 336 | 8 Use write.csv() inside sapply. |
|
s*****n posts: 2174 | 9 # B is assumed to be the second matrix from the question
# (cor(A[, n], A[, n]) would be identically 1)
sapply(1:ncol(A), function(n) cor(A[, n], B[, n])) |
|
a***d posts: 336 | 10 The data has duplicate product_id within each customer_id, and it seems the OP wants a
count of distinct product_id for each customer_id.
If we just want the final data frame, maybe:
aa <- split(t[, "product_id"], t$customer_id)
bb <- sapply(aa, function(x) length(unique(x)))
data.frame(cid = names(bb), npid = bb) |
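(Following the tapply-for-groups rule of thumb from the first post, the same table can be built in one step; a sketch with the same column names:)
bb <- tapply(t$product_id, t$customer_id, function(x) length(unique(x)))
data.frame(cid = names(bb), npid = as.vector(bb))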
|
k*******a posts: 772 | 11 index <- sapply(2:nrow(data), function(x) data$var2[x] %in% data[x-1, ])
subdata <- data[c(FALSE, index), ] |
|
a***d posts: 336 | 12 Do you have a lot of 'for' loops in the simulation? Replacing those with
'sapply' will speed things up greatly. |
|
k*******a posts: 772 | 13 Agree, this is correct.
We can easily verify it by simulation:
m=100 n=100: simulation: 63 prediction: 63
m=100 n=50 : simulation: 39 prediction: 39
m=100 n=25 : simulation: 22 prediction: 22
R code for the simulation:
unik <- function(m, n)
  mean(sapply(1:1000, function(x) length(unique(sample(1:m, n, replace=T)))))
unik(100, 100)
unik(100, 50)
unik(100, 25) |
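(For reference, the "prediction" column matches the closed form E[#unique] = m * (1 - (1 - 1/m)^n): each of the m values is missed by all n draws with probability (1 - 1/m)^n. A quick check:)
pred <- function(m, n) m * (1 - (1 - 1/m)^n)
pred(100, 100)  # 63.4
pred(100, 50)   # 39.5
pred(100, 25)   # 22.2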
|
k*******a posts: 772 | 14 Well, the code only handles a fairly simple case, but the general idea is about the same.
sp <- 15
nbet <- 10000
aa <- function(x) {
  win <- rbinom(nbet, 1, .495)
  win <- ifelse(win, 1, -1)
  winm <- 5 + cumsum(win) * .75
  i0 <- which(winm <= 0)[1]
  i1 <- which(winm >= sp)[1]
  i0 <- ifelse(is.na(i0), Inf, i0)
  i1 <- ifelse(is.na(i1), Inf, i1)
  # the last line is cut off in the original; presumably it reports which
  # barrier was hit first, e.g.:
  if (i0 == Inf & i1 == Inf) return(NA) else return(ifelse(i0 < i1, 0, 1))
}
b <- sapply(1:5000, aa)
mean(b) |
|
D******n posts: 2836 | 15 I built something and implemented it in both R and SAS.
The code is easier to write in R, but SAS beats R on speed by a wide margin.
Basically, the SAS side is:
data new;
set old;
by id;
%dosth;
run;
and the R side is:
new <- split(old, old$id)  # this step is excluded from the timing
g <- sapply(new, func_dosth)
Since dosth operates on matrix-structured data, it is natural to write in R and rather awkward in SAS.
But when I compared them, SAS was 10 to 20 times faster than R.
If it were 1 second versus 10 seconds that would be fine; the problem is the data is large,
so it is 1 day versus 20 days.
R can go wash up and call it a night. |
|
q**j posts: 10612 | 16 What operating system are you on? Windows seems to be much worse, while
Linux may be a different story. In my own experience R on Windows simply cannot handle
large data, but friends tell me there is no such problem on Linux. Is that true, and why?
Also, what good optimization packages are there for Python and for R? Thanks. |
|
o****o posts: 8077 | 17 Hijacking the thread to ask: how do you efficiently read a large CSV, or any
large TXT file, into R?
For example, reading a 700+ MB CSV into R is very slow, even after presetting each
column's class like this:
trainset <- read.csv("train_set.csv", nrows=1000)
colClasses <- sapply(trainset, class)
trainset <- read.csv("train_set.csv", sep=",", header=T, colClasses=colClasses)
It still takes a very long time, roughly 30x SAS: SAS takes a minute, R took a good 30. |
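(A commonly suggested alternative, assuming the data.table package is available: its fread function is written in C and usually reads large delimited files much faster than read.csv. A sketch:)
library(data.table)
trainset <- fread("train_set.csv")    # returns a data.table
trainset <- as.data.frame(trainset)   # if a plain data.frame is needed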
|
o****o posts: 8077 | 18 It looks like the ff package helps with the problem where the file is TOO
large to fit in memory, as the bigmemory package does, but it doesn't help
with efficiency here, since it maps the data onto disk.
Am I missing anything here?
>
> library(ff)
>
> system.time(
+ dsnff <- read.csv.ffdf(file="c:\\_data\\MNISTtrain.csv")
+ )
   user  system elapsed
  22.44    9.30   42.17
>
> system.time(
+ dsn1 <- read.csv(file="c:\\_data\\MNISTtrain.csv")
+ )
   user  system elapsed
  13.71    0.04   13.77
>
>
> t <- Sys.t... |
|
Y****a posts: 243 | 19 din <- c(1, 1, 2, 3, 4, 6, 7)
# running count of odd values up to each position
dout <- sapply(1:length(din), function(i) {
  sum(din[1:i] %% 2)
})
dout |
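(Worth noting that this particular computation vectorizes completely, in the spirit of the advice earlier in the thread:)
dout <- cumsum(din %% 2)  # same result, no sapply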
|
i**z posts: 194 | 20 The R Cookbook has quite a few tips on this.
Also, lapply may be a bit faster, while sapply and a plain loop are about the same.
I took a fair amount of punishment from R's slow computation recently, did a little
research, and picked up the basics; here they are to get the discussion going.
1. Vectorization
for (i in ...) {
  for (j in ...) { dframe <- func(dframe, i, j) }
}
A structure like this is a disaster for R. Consider vectorization instead,
e.g. instead of the explicit element-by-element loop
for (i in 1:N) { A[i] <- B[i] + C[i] }
invoke the implicit element-by-element operation: A <- B + C
2. Use apply instead of looping
This one seems controversial; some say apply does not speed R up. But at least apply
makes your code look cleaner.
3. Functional programming:
exp1: Filter(f, ... |
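(Item 3 is cut off above; for reference, the base-R functional helpers it presumably covers look like this:)
Filter(function(x) x %% 2 == 0, 1:10)  # keep elements where f is TRUE: 2 4 6 8 10
Map(function(x, y) x + y, 1:3, 4:6)    # element-wise over several sequences, returns a list: 5 7 9
Reduce(`+`, 1:5)                       # fold a binary function over a vector: 15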
|
I*****a posts: 5425 | 21 I don't think yours counts as one.
n = 1000      # training size
ntest = 1000  # test size; make this big only for illustration
id.train = 1:n
id.test = (n + 1):(n + ntest)
ratio = 0.99
n0 = round(n * ratio)
n1 = n - n0
nsimu = 100
res = NULL
for (i in 1:nsimu){
  p = c(runif(n0, 0, 0.5), runif(n1, 0.5, 1), runif(ntest, 0.6, 1))
  y = sapply(p, function(x){rbinom(n = 1, size = 1, prob = x)})
  x = log(p / (1 - p)) # beta is c(0, 1)
  dat = data.frame(x = x, y = y)
  f... |
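(Incidentally, rbinom recycles its prob argument, so the sapply line above can itself be vectorized:)
y = rbinom(n = length(p), size = 1, prob = p)  # one draw per element of p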
|
k*******a posts: 772 | 22 To read the second word:
title <- function(x) scan(textConnection(x), what=character(), n=2, quiet=T)[2]
sapply(name, title, USE.NAMES=F) |
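(An equivalent without the textConnection round-trip, assuming name is a character vector of whitespace-separated words:)
sapply(strsplit(name, " +"), `[`, 2)  # second token of each element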
|
G******n posts: 289 | 23 In R, use merge: merge the common part first, then handle what's left with an sapply... |
|
f***8 posts: 571 | 24 t(apply(mtcars, 2, summary))[, c(4,1,6)]  # if all columns are numeric
t(apply(mtcars[, sapply(mtcars, is.numeric)], 2, summary))[, c(4,1,6)]  # if not sure
Output:
         Mean   Min.    Max.
mpg   20.0900 10.400  33.900
cyl    6.1880  4.000   8.000
disp 230.7000 71.100 472.000
hp   146.7000 52.000 335.000
drat   3.5970  2.760   4.930
wt     3.2170  1.513   5.424
qsec  17.8500 14.500  22.900
vs     0.4375  0.000   1.000
am     0.4062  0.000   1.000
gear   3.6880  3.000   5.000
carb   2.8120  1... |
|
p****r posts: 46 | 25 # create a matrix from applist, then transpose it
# so the matrix is N rows * 10 columns
app <- t(data.frame(applist))
# same for scorelist
score <- t(data.frame(scorelist))
# generate the column sequence (1,11,2,12,...,10,20) to reorder the columns after cbind
cols <- rep(1:10, each=2) + rep(c(0,10), 10)
# or you can do: cols <- unlist(sapply(1:10, function(x) list(x, x+10)))
data <- cbind(app, score)
# reorder columns
data <- data[, cols]
# generate col_names: "applist1", "scorelist1", "applist2", "scorelist2"... |
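(Another compact way to build that interleaved index, reading a 2 x 10 matrix down its columns:)
cols <- as.vector(rbind(1:10, 11:20))  # 1 11 2 12 ... 10 20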
|
Y****a posts: 243 | 26 I meant the @data@data@item part of yours.
Besides lapply, sapply can be used as well. |
|