D*********2 发帖数: 535 | 1 数据很简单,就三列,但数据量比较大,100 million rows,所以如果有SAS solution
最后,R跑loop实在慢.
ID Start_Date End_Date
1 01/06/2014 01/12/2014
1 01/10/2014 01/12/2014
1 01/03/2014 01/11/2014
2 01/01/2014 01/02/2014
2 05/01/2014 05/22/2014
要算number of unique coverage dates,比如ID 1就是从01/03/2014到01/12/2014一
共10天;类似的ID 2输出结果是24.
先谢谢了,有效解发包子!
happy Jul 4th day :) |
l****u 发帖数: 529 | 2 proc sql;
select distinct id, max(end_date)-min(start_date)+1 as coverdays
from your data
group by id;
quit; |
s*y 发帖数: 18644 | 3 你这个不对啊,没把gap扣掉
【在 l****u 的大作中提到】![](/moin_static193/solenoid/img/up.png) : proc sql; : select distinct id, max(end_date)-min(start_date)+1 as coverdays : from your data : group by id; : quit;
|
s*y 发帖数: 18644 | 4 说个思路,sort一下,做个loop,把有overlap的claims合并成一个,然后就简单了 |
l****u 发帖数: 529 | 5 Oups, 抱歉,没仔细看data.
proc sort data=your data; by id start_date;run;
data your data;
set your data;
retain d1 d2 d3;
by id;
if first.id then do;
d1=start_date; d2=end_date;d3=0;
end;
else do;
if start_date>d2 then d3+start_date-d2-1;
if end_date>d2 then d2=end_date;
end;
if last.id then coverdays=d2-d1+1-d3;
run; |