c***z (posts: 6348) | 1
I spent some time generating this kind of chart from raw data. There may be
better ways of doing it, but I will post my method to get the discussion started.
The raw table has three columns: clinic | age | count. Each row records how
many patients of a given age a clinic has.
The target table has three columns: clinic | age_percentile | count_percentage.
It records the share of patients in each age category, with the categories
expressed as percentiles (e.g. if there are only two age categories, the
percentiles are 50 and 100).
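As a quick check of the percentile convention just described, here is a minimal sketch in base R. The helper name `age_percentile` is made up for illustration; it only wraps the rank-based formula that the code in this post uses.

```r
# Hypothetical helper: rank-based percentile of each age category within one clinic
age_percentile <- function(age) trunc(rank(age) / length(age) * 100)

age_percentile(c(12, 18))         # two categories  -> 50 100
age_percentile(c(5, 30, 41, 67))  # four categories -> 25 50 75 100
```

With more than 100 age categories in a clinic, several ages would fall into the same percentile, so their shares would have to be summed afterwards.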
Here is the R code (I am sure Scala code would be simpler, but my company does
not use it):
library(reshape2)  # assumed source of melt(); the original post does not load it

# order by clinic and age
visits <- visits[with(visits, order(clinic, age)), ]
# percentiles of age within each clinic
percentiles <- by(visits$age,
                  list(visits$clinic),
                  function(x) trunc(rank(x) / length(x) * 100),
                  simplify = TRUE)
# percentages of count within each clinic
percentages <- by(visits$count,
                  list(visits$clinic),
                  function(x) x / sum(x),
                  simplify = TRUE)
# put them together
patient_percentiles <- cbind(row.names(percentiles),
                             percentiles,
                             percentages)
patient_percentiles <- data.frame(patient_percentiles)
# unpack list elements
patient_percentiles <- with(patient_percentiles,
                            cbind(melt(percentiles),
                                  melt(percentages)))
# clean up: reorder the columns to match the names assigned next
patient_percentiles <- patient_percentiles[, c(2, 1, 3)]
colnames(patient_percentiles) <- c("clinic", "age_percentiles", "count_percentages")

f***8 (posts: 571) | 2
Could you post some data? I am not quite sure what this part means:
"Raw table has three columns: clinic | age | count, which records the age of
patients, rather, how many of each age category."
I have a feeling dplyr might make this more concise?

c***z (posts: 6348) | 3
Sorry, here is an example:
clinic | age | count
A | 12 | 3
A | 18 | 2
B | 22 | 4
B | 40 | 2
That is, clinic A has 3 patients aged 12 and 2 patients aged 18; clinic B has 4 patients aged 22 and 2 patients aged 40.
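To make the mapping from raw table to target table concrete, here is a small base R sketch applied to these four rows. It is not the code from post 1; it swaps in `ave()` as a shorter way to compute per-clinic values, and the name `target` is made up for illustration.

```r
# The four example rows from this post
visits <- data.frame(
  clinic = c("A", "A", "B", "B"),
  age    = c(12, 18, 22, 40),
  count  = c(3, 2, 4, 2)
)

# ave() applies FUN within each clinic and returns a vector aligned with the rows
target <- data.frame(
  clinic           = visits$clinic,
  age_percentile   = ave(visits$age, visits$clinic,
                         FUN = function(x) trunc(rank(x) / length(x) * 100)),
  count_percentage = ave(visits$count, visits$clinic,
                         FUN = function(x) x / sum(x))
)
print(target)
```

Each clinic has two age categories, so both get percentiles 50 and 100. The shares are 3/5 and 2/5 for clinic A, and 4/6 and 2/6 for clinic B.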
Thanks for the reply, I will take a look at dplyr.

c***z (posts: 6348) | 4
Sorry, I forgot one step:
# add up the count percentages within each percentile
patient_percentiles_fin <- aggregate(count_percentages ~ clinic + age_percentiles,
                                     FUN = sum,
                                     data = patient_percentiles)

c***z (posts: 6348) | 5
The boss has come up with a new request: this time it is cumulative percentages.
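# NOTE: the column indices below only make sense if patient_percentiles_fin is
# in wide form (column 1 = clinic, columns 2..102 = percentiles 0..100);
# the reshaping step is not shown in this post.
patient_percentiles_cum <- patient_percentiles_fin[, c(1, 102)]
colnames(patient_percentiles_cum)[2] <- "top.0"
for (k in 1:100) {
  # sum the top k + 1 percentile columns (102 down to 102 - k) for each row
  temp <- patient_percentiles_fin[, 102:(102 - k)]
  top <- data.frame(top = apply(temp, 1, FUN = sum))
  patient_percentiles_cum <- cbind(patient_percentiles_cum, top)
  colnames(patient_percentiles_cum)[2 + k] <- paste("top", k, sep = ".")
}

H****E (posts: 254) |

c***z (posts: 6348) |

f***8 (posts: 571) | 8
Synthetic data: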
library(dplyr) # version: ≥0.3
set.seed(123)
visits <- data_frame(clinic=sample(LETTERS[1:5], 20, replace=TRUE)) %>%
  group_by(clinic) %>%
  mutate(age=sample(1:50, length(clinic), replace=FALSE),
         count=sample(1:100, length(clinic), replace=TRUE)) %>%
  arrange(clinic, age)
My approach:
patient_percentiles2 <- visits %>%
  group_by(clinic) %>%
  mutate(age.percentile=as.integer(min_rank(age)/length(age)*100),
         count.percentage=count/sum(count)) %>%
  select(clinic, age.percentile, count.percentage)
This is just a starting point; suggestions are welcome!
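The cumulative percentages requested in post 5 could be sketched in the same dplyr style. This is only an illustration under stated assumptions: the small `visits` table, the result name `patient_percentiles_cum2`, and the column `cum.percentage` are made up here, and the direction of accumulation (from the top percentile downwards) is inferred from the `top.k` column names in post 5. It needs a dplyr version that supports `.groups` in `summarise()` and `.by_group` in `arrange()`.

```r
library(dplyr)

# Small stand-in for the `visits` table (same columns as in the thread)
visits <- data.frame(
  clinic = c("A", "A", "A", "B", "B"),
  age    = c(12, 18, 30, 22, 40),
  count  = c(3, 2, 5, 4, 2)
)

patient_percentiles_cum2 <- visits %>%
  group_by(clinic) %>%
  mutate(age.percentile   = as.integer(min_rank(age) / n() * 100),
         count.percentage = count / sum(count)) %>%
  # add up rows that land in the same percentile (the step added in post 4)
  group_by(clinic, age.percentile) %>%
  summarise(count.percentage = sum(count.percentage), .groups = "drop") %>%
  # running total from the highest percentile downwards, per clinic
  group_by(clinic) %>%
  arrange(desc(age.percentile), .by_group = TRUE) %>%
  mutate(cum.percentage = cumsum(count.percentage)) %>%
  ungroup()

print(patient_percentiles_cum2)
```

Unlike the wide `top.0` ... `top.100` layout in post 5, this keeps the result in long form, one row per clinic and percentile, and the last row of each clinic should reach a cumulative share of 1.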

c***z (posts: 6348) | 9
Thanks a lot! I will definitely try it out.

c***z (posts: 6348) | 10
Yes, it works like a charm! Thanks a lot!

c***z (posts: 6348) | 11
And it is beautiful in style; I can feel the flow. :)

f***8 (posts: 571) | 12
The credit goes to Hadley Wickham.