Question about filling in missing values (SAS, R, any software is fine) - Statistics board

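This page is an excerpt and archive of the corresponding MITBBS (未名空间) thread. Posts less than a week old are shown up to 50 characters; posts older than a week are shown up to 500 characters.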


y*****c
Posts: 414

Original data:
ID DRUG_DOSAGE
1 0.4
1 NA
1 NA
1 0
1 NA
2 0.6
2 NA
2 0
3 NA
3 0.2
3 NA
3 NA
After processing:
ID DRUG_DOSAGE
1 0.4
1 0.4
1 0.4

A*******s
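Posts: 3942

If your data is already sorted by id, this will do it in SAS:
data test1(rename=(new_dosage=drug_dosage));
    retain new_dosage;  /* keep the last filled value across rows */
    set test;
    by id;
    if first.id=1 then do;
        /* first row of an id: NA becomes 0 */
        if drug_dosage="NA" then new_dosage=0;
        /* 8. reads the value as written; 3.1 would scale an integer such as 1 to 0.1 */
        else new_dosage=input(drug_dosage, 8.);
    end;
    else do;
        /* later rows: update only when a real value is present */
        if drug_dosage ne "NA" then new_dosage=input(drug_dosage, 8.);
    end;
    drop drug_dosage;
run;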

[Quoting y*****c:]

: Original data:
: ID DRUG_DOSAGE
: 1 0.4
: 1 NA
: 1 NA
: 1 0
: 1 NA
: 2 0.6
: 2 NA
: 2 0

y*****c
Posts: 414

Thanks for such a quick reply!! I'm embarrassed to say I didn't know about the retain statement before. Thank you! BTW, your avatar is so cute!

l*********s
Posts: 5409

data fill(rename=(new_dose=drug_dosage));
retain new_dose;
set test;
if drug_dosage ne . then new_dose=drug_dosage;
run;

l***a
Posts: 12410

Missing a "drop" statement?
BTW, the OP didn't state clearly what to do if the first obs of a group is NA.
From his example, it should be either set to 0 or set to the same value as the previous obs.
Anyway, in either case, it's hard to believe that just 810k obs of data would take 4
days to process.

[Quoting l*********s:]

: data fill(rename=(new_dose=drug_dosage));
: retain new_dose;
: set test;
: if drug_dosage ne . then new_dose=drug_dosage;
: run;

a*z
Posts: 294

and it is easy to do in Excel also.

D******n
Posts: 2836

awk '{if ($2=="NA") {$2=i};print;i=$2}' oldfile.dat>newfile.dat
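
To illustrate the point raised earlier about group boundaries: the one-liner above carries the last value forward without looking at the ID column, so an NA at the start of a new ID inherits the previous ID's value. Below is a small Python sketch of that behavior. The function name naive_fill and the demo rows are made up for illustration and are not from the thread.

```python
def naive_fill(rows):
    """Carry the last seen dosage forward, ignoring the ID column."""
    last = ""  # like an unset awk variable, this starts out empty
    filled = []
    for row_id, dosage in rows:
        if dosage == "NA":
            dosage = last
        filled.append((row_id, dosage))
        last = dosage
    return filled


# Hypothetical rows: ID 2 starts with NA right after ID 1 ended on 0.4.
demo = [("1", "0.4"), ("1", "NA"), ("2", "NA"), ("2", "0.6")]

for row_id, dosage in naive_fill(demo):
    print(row_id, dosage)
```

In this sketch the first row of ID 2 comes out as 0.4, a value that belongs to ID 1. Whether that is acceptable depends on the rule the OP wants for a leading NA.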

[Quoting y*****c:]

: Original data:
: ID DRUG_DOSAGE
: 1 0.4
: 1 NA
: 1 NA
: 1 0
: 1 NA
: 2 0.6
: 2 NA
: 2 0

f*****a
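Posts: 496

This is very easy to do in SAS; no matter how large the data is, it computes accurately and quickly.
Method: last nonmissing value carried forward
/* Read the data; process the raw data */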
data a;
input ID DRUG_DOSAGE $ @@;
cards;
1 0.4
1 NA
1 NA
1 0
1 NA
2 0.6
2 NA
2 0
3 NA
3 0.2
3 NA
3 NA

[Quoting y*****c:]

: Original data:
: ID DRUG_DOSAGE
: 1 0.4
: 1 NA
: 1 NA
: 1 0
: 1 NA
: 2 0.6
: 2 NA
: 2 0

D*******a
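Posts: 207

Use Python; it is fast and easy to understand.
Save the following program as "abc.py", then run it as:
python abc.py < oldfilename > outputfilename
import sys
# Python 3 version of the posted script, with the lost indentation restored.
header = sys.stdin.readline()
print(header, end="")
pre_id = ""
pre_value = ""
for line in sys.stdin:
    lineSplit = line.split()  # split() already discards surrounding whitespace
    if lineSplit[1] == "NA":
        if lineSplit[0] == pre_id:
            lineSplit[1] = pre_value  # same ID: reuse the previous value
        else:
            lineSplit[1] = "0"  # first row of an ID is NA: set it to 0
    pre_id = lineSplit[0]
    pre_value = lineSplit[1]
    # The archived post is cut off here; printing the row is an assumed final step.
    print(" ".join(lineSplit))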

y*****c
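Posts: 414

If the first value is NA, set it to 0.
I've already posted the R code; it takes at least ten-plus hours. I start the run every day when I leave work and check the results the next morning, on a computer with only 1 GB of RAM. I also tried it on an iMac; it was a little faster, but nothing like SAS, which I tried yesterday and which finished in just 11 seconds.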
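
A minimal Python sketch of the rule as stated here: carry the last value forward within each ID, and set a leading NA to 0. The function name fill_dosage and the choice to keep values as strings are assumptions for illustration, not code from the thread. The rows are the sample data from the first post.

```python
def fill_dosage(rows):
    """Forward-fill "NA" dosages within each ID; a leading "NA" becomes "0"."""
    filled = []
    prev_id = None
    prev_value = "0"
    for row_id, dosage in rows:
        if row_id != prev_id:
            # New ID: nothing to carry over yet, so fall back to "0".
            prev_value = "0"
        if dosage == "NA":
            dosage = prev_value
        filled.append((row_id, dosage))
        prev_id, prev_value = row_id, dosage
    return filled


# Sample rows from the original post, kept as strings.
rows = [
    ("1", "0.4"), ("1", "NA"), ("1", "NA"), ("1", "0"), ("1", "NA"),
    ("2", "0.6"), ("2", "NA"), ("2", "0"),
    ("3", "NA"), ("3", "0.2"), ("3", "NA"), ("3", "NA"),
]

for row_id, dosage in fill_dosage(rows):
    print(row_id, dosage)
```

The sketch makes a single pass and only remembers the previous row, so it assumes, as the SAS answer does, that rows with the same ID are adjacent.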

[Quoting l***a:]

: missed a "drop" statement?
: btw, op didn't state clearly what to do if the first obs of a group is NA.
: from his example, it should be either set to 0 or same to the previous obs..
: . anyways, either case, it's hard to believe just 810k obs data will cost 4
: days to process


b********y
Posts: 63

You did not use R properly for this case.
Use the "scan" function.

l*********s
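Posts: 5409

How can R be this inefficient? Unbelievable.

[Quoting y*****c:]

: If the first value is NA, set it to 0.
: I've already posted the R code; it takes at least ten-plus hours. I start the run every day when I leave work and check the results the next morning,
: on a computer with only 1 GB of RAM. I also tried it on an iMac; it was a little faster, but nothing like
: SAS, which I tried yesterday and which finished in just 11 seconds.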

D******n
Posts: 2836

I tested it; even with your code in R, the estimated time is just 11 hours...
lol

[Quoting y*****c:]

: Original data:
: ID DRUG_DOSAGE
: 1 0.4
: 1 NA
: 1 NA
: 1 0
: 1 NA
: 2 0.6
: 2 NA
: 2 0

D******n
Posts: 2836

I tested this shell script too:
awk '{if ($2=="NA") { if ($1==lid) {$2=ld} else {$2=0};};lid=$1;ld=$2;print}' missing.tab >new.tab
It takes only 1.256 seconds.

[Quoting D******n:]

: i tested it, even with your code in R the estimated time is just 11 hours...
: lol

y*****c
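Posts: 414

Thanks for the detailed explanation!!

[Quoting f*****a:]

: This is very easy to do in SAS; no matter how large the data is, it computes accurately and quickly.
: Method: last nonmissing value carried forward
: /* Read the data; process the raw data */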
: data a;
: input ID DRUG_DOSAGE $ @@;
: cards;
: 1 0.4
: 1 NA
: 1 NA
: 1 0

y*****c
Posts: 414

There are 5 drugs that need this kind of processing; is it still 11 hours?

[Quoting D******n:]

: i tested it, even with your code in R the estimated time is just 11 hours...
: lol

b********y
Posts: 63

The following R code should be much faster. I am also curious to know
how long it takes to run through your data?
file.in = file("data_in.txt");
file.out = file("data_out.txt")
open(file.in, open = "r")
open(file.out, open = "wt")
# title
xtitle = scan(file.in, what = list(s1 = "", s2 = ""), nline = 1, quiet = T)
cat(file = file.out, c(xtitle$s1, xtitle$s2), sep = ", ", append = TRUE);
cat(file = file.out, "\n")
# first obs
xfmt = list(ID = 0, DD = 0) # readin format
x0 = scan(file.in, what = xf

S******y
Posts: 1123

#another version of Python, just for fun-

y*****c
Posts: 414

Thanks!!

[Quoting S******y:]

: #another version of Python, just for fun-
