S******y Posts: 1123 | 1 I have sorted data in R. I am trying to cut it into tiles/deciles based on
cumulative weights.
My code is below -- but how can I get rid of this loop (which might not be
good coding practice here)?
Thanks.
#------------------------------------------
#dat - R data frame (already sorted by variable of interest)
#n - number of obs
#tile - a vector of length n showing which tile the obs belongs to
#n_tile - number of tiles (i.e. deciles if it is 10)
#--------------------------------------
#R code -- |
o****o Posts: 8077 | 2 Use upper and lower triangular matrices and sum over the rows to obtain the
cumulative weights, from the beginning or from the tail,
then use the quantcut function.
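A minimal sketch of the triangular-matrix idea, assuming a small hypothetical weight vector `w` (not the original poster's data). Multiplying a lower-triangular matrix of ones by the weights gives the cumulative weights from the beginning, and its transpose gives them from the tail. Both match what `cumsum` returns, which avoids building an n-by-n matrix. The `quantcut` step is not shown here.

```r
# Hypothetical weights, already in sorted order (illustration only).
w <- c(1, 1, 1, 3, 3, 9)
n <- length(w)

# Lower-triangular matrix of ones (diagonal included).
lower <- matrix(0, n, n)
lower[lower.tri(lower, diag = TRUE)] <- 1

# Row i of `lower` selects w[1..i], so the product is the running total from the head.
cum_head <- as.vector(lower %*% w)

# The transpose is upper-triangular: row i selects w[i..n], the running total from the tail.
cum_tail <- as.vector(t(lower) %*% w)

# The same results without the O(n^2) matrix.
stopifnot(all(cum_head == cumsum(w)))
stopifnot(all(cum_tail == rev(cumsum(rev(w)))))
```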
[In reply to S******y's post]
|
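s*****n Posts: 2174 | 3 Can you give a simple data frame and the target data frame you want?
Even a few rows would do; otherwise the code is too abstract to read and it is unclear what exactly you want to do.
In your current code, at least the inner loop can be replaced by integer division to find j. No loop is needed for that.
I would guess the outer loop is best kept. If you insist on removing it, the program may become hard to understand.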
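A hedged sketch of the division idea. The original loop is not shown in the thread, so `find_tile_loop` below is a hypothetical stand-in for an inner loop that searches for the tile index `j`, and the data are made up. It uses `ceiling` of a division rather than `%/%` so that a cumulative weight sitting exactly on a tile boundary stays in the lower tile.

```r
# Hypothetical cumulative weights and tile count (illustration only).
cum_w  <- c(1, 2, 3, 6, 9, 18)
n_tile <- 3
width  <- cum_w[length(cum_w)] / n_tile   # total weight per tile

# Loop version: step j up until the tile's upper edge covers the cumulative weight.
find_tile_loop <- function(cw) {
  j <- 1
  while (cw > j * width) j <- j + 1
  j
}
tile_loop <- vapply(cum_w, find_tile_loop, numeric(1))

# Division version: one vectorized expression, no search.
tile_div <- ceiling(cum_w / width)

stopifnot(all(tile_loop == tile_div))
```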
[In reply to S******y's post]
|
S******y Posts: 1123 | 4 Thanks, oloolo and songkun!
Here is the input data 'dat' sorted by score (if displayed in csv format) --
score,weight |
s*****n Posts: 2174 | 5 OK, I see.
dat$cumsum <- cumsum(dat$weight)  # running total of weights, in sorted order
tile <- ceiling((dat$cumsum / dat$cumsum[n]) * n_tile)  # n = nrow(dat); scale to (0, n_tile] and round up
[In reply to S******y's post]
|
D******n Posts: 2836 | 6 What exactly are you trying to do? It looks like you are making a histogram of the cumulative weights, but you are not
counting, you are overwriting...
[In reply to S******y's post]
|
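s*****n Posts: 2174 | 7 What he wants to do seems to be like splitting Chinese chess pieces into groups for battle: a soldier has strength 1, horses and cannons have strength 3, a chariot has strength
9, and so on. The pieces are lined up by strength and then divided into several parts so that every part has the same total strength.
In the end 9 soldiers form one group with tile=1, then three horses form a group with tile=2, ..., a chariot forms a group by itself with tile=k,
and so on.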
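The chess analogy can be worked through with the cumulative-weight formula from post 5. This is a sketch on made-up data: the `pieces` data frame and its column names are assumptions for illustration. It multiplies by `n_tile` before dividing by the total, a small reordering of that formula, so that exact tile boundaries are not disturbed by floating-point rounding.

```r
# Hypothetical pieces: 9 soldiers (strength 1), 3 horses (3), 1 chariot (9),
# already sorted by strength.
pieces <- data.frame(
  piece  = c(rep("soldier", 9), rep("horse", 3), "chariot"),
  weight = c(rep(1, 9), rep(3, 3), 9)
)
n_tile <- 3

cum_w <- cumsum(pieces$weight)     # running total of strength
total <- cum_w[length(cum_w)]      # overall strength

# Tile index: position of the running total on a (0, n_tile] scale, rounded up.
pieces$tile <- ceiling(cum_w * n_tile / total)

# Total strength per tile.
strength_by_tile <- tapply(pieces$weight, pieces$tile, sum)
print(strength_by_tile)
```

With these numbers the nine soldiers land in tile 1, the three horses in tile 2, and the chariot alone in tile 3, each tile carrying a total strength of 9.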
[In reply to D******n's post]
|
S******y Posts: 1123 | 8 Thanks for your elegant R code, Songkun!
To DaShagen:
What I am doing is cutting the data into deciles based on a sorted variable of
interest (say, score). Once I have the deciles, I can compute the mean of the
dependent variable within each decile to see the lift. |
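A small sketch of that last step, on made-up data. The column `y` stands in for the dependent variable, and three tiles are used instead of ten to keep the example short. Both names and numbers are assumptions. Whether the per-tile mean should be weighted is not stated in the thread, so both versions are shown.

```r
# Hypothetical scored data (illustration only); y is the dependent variable.
dat <- data.frame(
  score  = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  weight = c(2, 2, 1, 3, 2, 2),
  y      = c(0, 0, 1, 0, 1, 1)
)
dat <- dat[order(dat$score), ]   # make sure rows are sorted by score
n_tile <- 3

# Assign tiles from cumulative weights, as in post 5.
cum_w <- cumsum(dat$weight)
dat$tile <- ceiling(cum_w * n_tile / sum(dat$weight))

# Plain mean of y within each tile.
mean_y <- tapply(dat$y, dat$tile, mean)

# Weighted mean of y within each tile (an assumption about what is wanted).
wmean_y <- tapply(dat$y * dat$weight, dat$tile, sum) /
  tapply(dat$weight, dat$tile, sum)

print(mean_y)
print(wmean_y)
```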