a********a (posts: 346) | 1
I have a data set like the following:
      id cluster beta
 [1,]  1       1  1.6
 [2,]  1       2  1.64
 [3,]  1       2  0.98
 [4,]  1       3  1.57
 [5,]  1       3  1.66
 [6,]  1       3  0.33
 [7,]  1       4  0.53
 [8,]  1       4 -0.61
 [9,]  1       4  0.90
[10,]  1       4  0.45
[11,]  1       5  1.16
[12,]  1       5  0.85
[13,]  1       5  0.68
[14,]  1       5  0.69
[15,]  1       5  1.24
Now, for each cluster, if beta is less than 1.0, delete the whole row and
decrease cluster by 1. If two rows are less than 0.5, delete those two rows
and decrease cluster by 2....(note: the # of row in

s*****n (posts: 2174) | 2
# For every row, subtract the number of rows in its cluster with beta < 1
# (assumes the rows are sorted by cluster so that rep() lines up with them).
data$cluster <- data$cluster -
  rep(tapply(data$beta < 1, data$cluster, sum),
      tapply(data$cluster, data$cluster, length))
# Then drop the rows with beta < 1.
data <- data[data$beta >= 1, ]
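
To check the lines above against the sample data, here is a minimal sketch. It assumes the data are stored in a data frame called `data`. The `[1,]` row labels in the question look like matrix output, and `$` only works on a data frame, so a matrix would need `as.data.frame(data)` first. The names `dropped` and `dropped_ave` are illustrative only.

```r
# Rebuild the sample data from the question as a data frame (an assumption
# about how the original data were stored).
data <- data.frame(
  id      = rep(1, 15),
  cluster = rep(1:5, times = 1:5),
  beta    = c(1.6, 1.64, 0.98, 1.57, 1.66, 0.33, 0.53, -0.61,
              0.90, 0.45, 1.16, 0.85, 0.68, 0.69, 1.24)
)

# Rows with beta < 1 per cluster, repeated once for every row of that cluster.
# This lines up with the rows only because they are sorted by cluster.
dropped <- rep(tapply(data$beta < 1, data$cluster, sum),
               tapply(data$cluster, data$cluster, length))

# An alternative that does not depend on row order: ave() returns the
# per-group result already aligned with the original rows.
dropped_ave <- ave(as.numeric(data$beta < 1), data$cluster, FUN = sum)

data$cluster <- data$cluster - as.vector(dropped)
data <- data[data$beta >= 1, ]
print(data)
```

On this sample, six rows remain (original rows 1, 2, 4, 5, 11 and 15), with cluster values 1, 1, 2, 2, 2, 2.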

a********a (posts: 346) | 3
Thanks songkun, it is really nice.

a********a (posts: 346) | 4
I do not understand the following tapply calls very well, and the R help
confused me. Can you explain them a little?
tapply(data$beta < 1, data$cluster, sum)
tapply(data$cluster, data$cluster, length)
Thanks

s*****n (posts: 2174) | 5
tapply (like the other "apply" functions, such as sapply, mapply, lapply, and apply)
is a relatively efficient way to do looping in R.
You have to use it for some time before you really understand it and become
comfortable with it. It is not easy to explain briefly.
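
For the two calls asked about, one way to read `tapply(X, INDEX, FUN)` is: split `X` into groups by `INDEX`, then apply `FUN` to each group. A small sketch on the sample data follows; the data frame is rebuilt here as an assumption about how the data are stored, and `n_below`, `n_rows` and `per_row` are illustrative names.

```r
# Sample data from the question, assumed to be a data frame.
data <- data.frame(
  id      = rep(1, 15),
  cluster = rep(1:5, times = 1:5),
  beta    = c(1.6, 1.64, 0.98, 1.57, 1.66, 0.33, 0.53, -0.61,
              0.90, 0.45, 1.16, 0.85, 0.68, 0.69, 1.24)
)

# data$beta < 1 is a logical vector; sum() counts TRUE as 1, so this gives
# the number of rows with beta < 1 in each cluster.
n_below <- tapply(data$beta < 1, data$cluster, sum)
print(n_below)
# cluster: 1 2 3 4 5
# count:   0 1 1 4 3

# length() of each group is the number of rows in that cluster.
n_rows <- tapply(data$cluster, data$cluster, length)
print(n_rows)
# cluster: 1 2 3 4 5
# rows:    1 2 3 4 5

# rep(x, times) repeats x[i] times[i] times, giving one count per data row.
per_row <- rep(n_below, n_rows)
print(as.vector(per_row))
# 0 1 1 1 1 1 4 4 4 4 3 3 3 3 3
```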
If you feel confused about how tapply works, you can write a loop. With an
explicit loop the programming logic is clearer, but the program is usually
less efficient.
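
As a rough sketch of what such an explicit loop could look like (not taken from the thread; `old_cluster`, `in_cluster` and `n_below` are illustrative names, and `data` is assumed to be a data frame holding the sample rows):

```r
# Sample data from the question, assumed to be a data frame.
data <- data.frame(
  id      = rep(1, 15),
  cluster = rep(1:5, times = 1:5),
  beta    = c(1.6, 1.64, 0.98, 1.57, 1.66, 0.33, 0.53, -0.61,
              0.90, 0.45, 1.16, 0.85, 0.68, 0.69, 1.24)
)

# Keep the original labels so that updated values are never mistaken
# for a cluster that has not been processed yet.
old_cluster <- data$cluster

for (k in unique(old_cluster)) {
  in_cluster <- old_cluster == k                # rows belonging to cluster k
  n_below <- sum(data$beta[in_cluster] < 1)     # rows in k that will be dropped
  data$cluster[in_cluster] <- old_cluster[in_cluster] - n_below
}

# Drop the rows with beta < 1.
data <- data[data$beta >= 1, ]
print(data)
```

Unlike the `rep()` version, this loop does not rely on the rows being sorted by cluster, at the cost of one pass over the data per cluster.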

a********a (posts: 346) | 6
I got it now. Awesome idea.