c***z  Posts: 6348  [Reposted from the DataSciences board]
From: chaoz (晨钟暮鼓), Board: DataSciences
Subject: generating percentile-percentage charts
Posted at: BBS 未名空间站 (Mon Nov 24 20:11:11 2014, US Eastern)
I spent some time generating this kind of chart from raw data. There may be
better ways to do it, but I'll post my method anyway in the hope that it draws out better ones.
The raw table has three columns, clinic | age | count, recording how many
patients each clinic has in each age category.
The target table has three columns, clinic | age_percentile | count_percentage,
recording for each clinic the percentage of patients in each age category, with
the categories expressed as percentiles within that clinic (e.g., if a clinic has
only two age categories, their percentiles are 50 and 100).
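For a single clinic, each derived column is one line of arithmetic. The sketch below wraps the two expressions used in the R code, `trunc(rank(x)/length(x) * 100)` and `x / sum(x)`, in hypothetical helpers `age_percentile()` and `count_percentage()`. It assumes the clinic's age categories are distinct, since `rank()` averages tied values.

```r
# Hypothetical helpers operating on ONE clinic's rows.

# Percentile of each age category: its rank among the clinic's ages,
# scaled to 0-100 and truncated toward zero (distinct ages assumed).
age_percentile <- function(age) trunc(rank(age) / length(age) * 100)

# Fraction of the clinic's patients falling in each age category.
count_percentage <- function(count) count / sum(count)

age_percentile(c(10, 20))      # two categories: 50 100
age_percentile(c(10, 20, 30))  # three categories: 33 66 100
count_percentage(c(1, 3))      # 0.25 0.75
```

Because `trunc()` rounds down, the lowest of three categories lands at 33 rather than 33.3.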
Here is the R code (I know this would be simpler in Scala, but my company
doesn't use it):
library(reshape2)  # provides melt()
# order by clinic and age
visits <- visits[with(visits, order(clinic, age)), ]
# percentiles of age within each clinic
percentiles <- by(visits$age,
                  list(visits$clinic),
                  function(x) trunc(rank(x) / length(x) * 100),
                  simplify = TRUE)
# percentages of count within each clinic
percentages <- by(visits$count,
                  list(visits$clinic),
                  function(x) x / sum(x),
                  simplify = TRUE)
# put them together
patient_percentiles <- cbind(row.names(percentiles),
                             percentiles,
                             percentages)
patient_percentiles <- data.frame(patient_percentiles)
# unpack list elements into long format with melt()
patient_percentiles <- with(patient_percentiles,
                            cbind(melt(percentiles),
                                  melt(percentages)))
# clean up: reorder to clinic, age percentile, count percentage
patient_percentiles <- patient_percentiles[, c(2, 1, 3)]
colnames(patient_percentiles) <- c("clinic", "age_percentiles",
                                   "count_percentages")
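If reshape2 is not available, a similar target table can be built in base R with `ave()`, which applies a function within each group and returns a vector aligned with the original rows, so no unpacking step is needed. This is an alternative sketch, not the method above; the toy `visits` data are hypothetical, ages are assumed distinct within each clinic, and the column names follow the target table description.

```r
# Hypothetical raw table: clinic | age | count
visits <- data.frame(
  clinic = c("A", "A", "B", "B", "B"),
  age    = c(20, 10, 30, 50, 40),
  count  = c(3, 1, 2, 2, 4),
  stringsAsFactors = FALSE
)

# Order by clinic and age
visits <- visits[order(visits$clinic, visits$age), ]

# Within each clinic: percentile of each age category
visits$age_percentile <- ave(visits$age, visits$clinic,
                             FUN = function(x) trunc(rank(x) / length(x) * 100))

# Within each clinic: fraction of patients in each age category
visits$count_percentage <- ave(visits$count, visits$clinic,
                               FUN = function(x) x / sum(x))

# Keep only the target columns and reset row names
patient_percentiles <- visits[, c("clinic", "age_percentile", "count_percentage")]
rownames(patient_percentiles) <- NULL
print(patient_percentiles)
```

Since `ave()` keeps each result aligned with the input rows, the derived columns attach directly to `visits`; the trade-off is that the grouping has to be repeated once per derived column.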