All topics - Topic: nodupkey
y******0
Posts: 401
1
From thread: Statistics board - How do I view the observations deleted by nodupkey?
proc sort data=test nodupkey dupout=test2; by XXXX; run;
The data set test2 contains the deleted observations.
T*******I
Posts: 5138
2
From thread: Statistics board - How to Macro it in SAS?
I have a job to do. I can do it the regular way, but I want to know how to use
a SAS macro to achieve it. It should be simple. Can someone here help me?
Thanks so much! Baozi offered!
proc sort data=Origianl nodupkey
out=Origianl_nodup
dupout=Dup1;
by ID;
RUN;
proc sort data=Dup1 nodupkey out=Dup1 dupout=Dup2; by ID; RUN;
proc sort data=Dup2 nodupkey out=Dup2 dupout=Dup3; by ID; RUN;
proc sort data=Dup3 nodupkey out=Dup3 dupout=Dup4; by ID; RUN;
proc sort data=Dup4 nodupk...
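The repeated PROC SORT calls above follow one pattern, so a macro loop is a natural fit. What follows is only a sketch under assumptions: the macro name peel_dups and its parameters in=, key=, and maxiter= are hypothetical, and it assumes the goal is to keep peeling duplicates into Dup1, Dup2, ... until the latest DUPOUT= data set is empty.

```sas
/* Hypothetical macro: peel_dups, in=, key=, maxiter= are illustrative names */
%macro peel_dups(in=, key=, maxiter=20);
    %local i next dsid nobs rc;

    /* First pass: unique keys go to &in._nodup, removed rows go to Dup1 */
    proc sort data=&in nodupkey out=&in._nodup dupout=Dup1;
        by &key;
    run;

    /* Count the rows that were removed in the first pass */
    %let i    = 1;
    %let dsid = %sysfunc(open(Dup1));
    %let nobs = %sysfunc(attrn(&dsid, nlobs));
    %let rc   = %sysfunc(close(&dsid));

    /* Keep peeling while the latest DupN still has rows (maxiter is a safety cap) */
    %do %while (&nobs > 0 and &i < &maxiter);
        %let next = %eval(&i + 1);
        proc sort data=Dup&i nodupkey out=Dup&i dupout=Dup&next;
            by &key;
        run;
        %let dsid = %sysfunc(open(Dup&next));
        %let nobs = %sysfunc(attrn(&dsid, nlobs));
        %let rc   = %sysfunc(close(&dsid));
        %let i    = &next;
    %end;
%mend peel_dups;

%peel_dups(in=Origianl, key=ID);
```

The loop stops by itself once a pass removes nothing, so the number of DupN data sets does not have to be known in advance.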
h********o
Posts: 103
3
From thread: Statistics board - Help with SAS code
Try this:
=================================
DATA TEST;
INPUT CHILD $ REPRE $ @@;
CARDS;
A AAA A CCC B BBB
B CCC B DDD C AAA
C DDD C EEE C FFF
;
PROC SORT DATA = TEST OUT = CHILD (KEEP = CHILD) NODUPKEY;
BY CHILD;
RUN;
PROC SORT DATA = TEST OUT = REPRE (KEEP = REPRE) NODUPKEY;
BY REPRE;
RUN;
DATA CHILD (DROP = INDEX);
SET CHILD;
INDEX + 1;
CHILD_INDEX = PUT(INDEX, Z3.);
RUN;
DATA REPRE (DROP = INDEX);
SET REPRE;
INDEX + 1;
REPRE_INDEX = PUT(INDEX, Z5.);
RUN;
PROC S...
j*****g
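Posts: 222
4
I graduated with a statistics master's from a second-tier school and have held four jobs
in 3 years: 50+ phone interviews, 10 onsites, and an onsite success count of about 7.5
(the 0.5 is a company where the boss liked me a lot but worried I would leave soon; after
the interview they even called to ask me to commit, but by then I already had an offer I
liked better, so nothing came of it). Most were marketing research positions, because that
is what interests me.
Here is a summary of the technical questions that come up often. For marketing research
roles these are basically always asked. They are all quite simple, but since I thought of
them I will list them anyway; the list may be a bit disorganized.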
SAS
(1) Proc transpose
(2) Merge data的时候要注意什么问题?
a. Have to sort both tables before merging
b. Check what’s the type of merge (one to many, one to one, or many to
many?) --- check duplicates in each table (many less experienced candidates usually
miss this and only think of sorting)
c. ...
i***m
Posts: 148
5
Great experience. Let me add some of my own for everyone to discuss.

SAS
(1) Proc transpose
(2) Merge data的时候要注意什么问题?
a. Have to sort both tables before merging
b. Check what’s the type of merge (one to many, one to one, or many to
many?) --- check duplicates in each table (many less experienced candidates usually
miss this and only think of sorting)
c. What if you only want to keep the IDs in table a?
-- many-to-many merge: how the DATA step differs from SQL
-- how missing data is handled in a merge, especially when the primary key has missing values
(3) Array
If you have a data set a with 1000 columns, you want to change all the
mi...
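The many-to-many point in item (2) can be illustrated with a small sketch. The data sets a and b and the variables x and y are made up for illustration. Within a BY group the DATA step MERGE pairs rows positionally, while a PROC SQL join returns every combination.

```sas
/* Hypothetical data: ID 1 appears twice in each table */
data a;
    input id x $;
    datalines;
1 a1
1 a2
;
run;

data b;
    input id y $;
    datalines;
1 b1
1 b2
;
run;

/* DATA step merge: rows are paired in order within the BY group (2 rows),
   and the log notes that more than one data set has repeats of BY values */
data ds_merge;
    merge a b;
    by id;
run;

/* SQL join: Cartesian product within the key (2 x 2 = 4 rows) */
proc sql;
    create table sql_join as
    select a.id, a.x, b.y
    from a inner join b
    on a.id = b.id;
quit;
```

Checking each table for duplicate keys before merging, as item b suggests, is what tells you which of these two behaviors you are about to get.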
b*********n
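Posts: 2284
6
From thread: Statistics board - A question about SAS proc sort
With the nodupkey option, how do I delete the earlier duplicates instead?
For example, ID=001 has three records and nodupkey deletes the last two. What if I want
to delete the first two? Thanks.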
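One common way to keep the last record per ID rather than the first is a BY-group DATA step with LAST.id. This is a minimal sketch; the data set names have, sorted, and want and the variable visit are hypothetical.

```sas
/* Hypothetical data: three records for ID 001, in input order */
data have;
    input id $ visit;
    datalines;
001 1
001 2
001 3
002 1
;
run;

/* By default PROC SORT keeps the original relative order within each ID (EQUALS) */
proc sort data=have out=sorted;
    by id;
run;

/* Keep only the last record of each ID instead of the first */
data want;
    set sorted;
    by id;
    if last.id;
run;
```

If "last" should mean something other than input order, add the ordering variable to the BY statement of the sort and keep `by id;` in the DATA step.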
d********h
Posts: 2048
7
From thread: Statistics board - A SAS program question
The easiest way:
proc sort nodupkey;by id sum;run;
A*********u
Posts: 8976
8
From thread: Statistics board - SAS question
proc sort data=indata out=uniqueid(keep=id) nodupkey;
by id;
run;
data shell;
set uniqueid;
do x=2002 to 2008;
output;
end;
run;
** sort indata and shell by id x;
data temp;
merge indata shell;
by id x;
retain IND_;
if first.id then do;
ind_=0;
end;
if y=0 then do;
IND=1;
ind_=1;
end;
if y=. then do;
if ind_=0 then ind=0;
if ind_=1 then ind=1;
end;
if y=1 then do;
ind=1;
ind_=0
A*********u
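Posts: 8976
9
From thread: Statistics board - SAS question
You need to preprocess the data first:
proc sort data=XXX out=XXX2 NODUPKEY;
by ID AA;
run;
This keeps the first record.
If you have further requirements about which one to keep (for example, AA also has a severity),
sort first,
then in the next data step use
if first.AA;
or
if last.AA;

With proc freq, how can I make sure an ID with more than one AA=3 is not counted repeatedly?
Thanks!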
A*********u
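Posts: 8976
10
From thread: Statistics board - SAS question
Oh right, I just realized you only want AA=3, so you don't necessarily need proc freq.
proc freq makes it easy to tabulate every value of AA that occurs.
If you only want 3,
you can do this (use an array if there are many trt values):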
proc sort data=xxx out=xxx2 nodupkey;
by id aa;
run;
data xxx3;
set xxx2 end=eof;
by id aa;
if aa=3 then do;
if trt=1 then n1+1;
if trt=2 then n2+1;
.
.
nt+1;
end;
if first.id then n+1;
if eof;
pct1=n1/n*100;
pct2=n2/n*100;
.
.
pctt=nt/n*100;
col1=put(n1,3.)||"("||put(pct1,3.)||"%)";
..
..
run;

With proc freq, how can I make sure an ID with more than one AA=3 is not counted repeatedly?
Thanks!
s*********h
Posts: 16
11
From thread: Statistics board - Please help with 3 SAS questions.
3. The following SAS program is submitted:
proc sort data=class out=class1 nodupkey;
by name course;
run;
Which SQL procedure program produces the same results?
(A) proc sql;
create table class1 as
select distinct name, course
from class;
quit;
(B) proc sql;
create table class1 as
select nodup name, course
from class;
quit;
(C) proc sql;
create table class1 as
select exclusive name, course
from class;
quit;
(D) proc sql;
create table class1 as
select name, course
from class
order by distinct name
g*******y
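Posts: 380
12
From thread: Statistics board - Please help with 3 SAS questions.
I never quite understood what the tutorial says about whether proc sql sorts.
But among these options, don't all of them except A have syntax errors?
I think the key to this question is the nodupkey in the problem.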
q**j
Posts: 10612
13
From thread: Statistics board - A SAS question
you are greedy, but greed is good :)
1. get unique group numbers into a table.
proc sort data = yourdata out=group(keep=group) nodupkey;
by group;
run;
2 do a lot of preparation.
data _NULL_;
set group end=lastobs;
name = cats("group",group);
call symput(cats("group",_n_),name);
if _n_ = 1 then
do;
longname= name;
longgroupvalue= '"'||cats(group)||'"';
end;
else
do;
longname=cats(longname)|| " "||cats(name);
longgroupvalue = cats(longgroupvalue)||", "||'"'||cats(group)||'"';
end;
if lastobs then
r********e
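Posts: 1686
14
From thread: Statistics board - How do I view the observations deleted by nodupkey?
As titled. I was asked this: if a sort used this option and deleted some observations, but
the manager wants to know which ones were deleted, how do you do it?
Also, is there any way to compare two data sets other than using compare?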
c*******o
Posts: 8869
15
From thread: Statistics board - How do I view the observations deleted by nodupkey?
proc sort data=test; by XXXX; run;
data check;
set test;
by XXXX;
if first.xxxx = 0;
run;
S******y
Posts: 1123
16
From thread: Statistics board - How do I view the observations deleted by nodupkey?
%macro cmpar_d2(lib1=,mem1=,lib2=,mem2=);
/* Get the obs in lib1.mem1 but not in lib2.mem2 */
/* and store OBS and VARS in ONEXTWO Dataset */
proc sql;
create table onextwo as
select *
from &lib1..&mem1
except corr all
select *
from &lib2..&mem2
;
quit;
/* Get the obs in lib2.mem2 but not in lib1.mem1 */
/* and store OBS and VARS in TWOXONE Dataset */
proc sql;
create table twoxone as
select *
from &lib2..&mem2
except corr all
g*****d
Posts: 526
17
From thread: Statistics board - How do I view the observations deleted by nodupkey?
Wow. Isn't it just dupout?
Why is everyone making it so complicated?
d*******1
Posts: 854
18
From thread: Statistics board - How do I view the observations deleted by nodupkey?
Man, SAS really has a lot of clever tricks.
r********e
Posts: 1686
19
From thread: Statistics board - How do I view the observations deleted by nodupkey?
Wow, so there is such a statement. Sigh, I have seen too little...
n***p
Posts: 508
20
From thread: Statistics board - [Question] How do I sort this dataset?
I am not good at programming. I have a clumsy way; not sure if this is
what you need.
data Test;
input input $ outcome $ @@;
datalines;
A 0 A 0 A 0
A 1 A 1 A 1
A 2 A 2 A 2
B 0 B 0 B 0
B 1 B 1 B 1
B 2 B 2 B 2
;
run;
proc sort data = test out= test_sorted nodupkeys;
by input outcome;
run;
data a b;
set test_sorted;
if input = 'A' then output a;
else output b;
run;
data aaa;
set a a a;
run;
data bbb;
set b b b;
run;
data combine;
j******n
Posts: 7
21
From thread: Statistics board - A question from the real SAS Advanced exam
To create a list of unique Customer_Id values from the customer data set,
which of the following techniques can be used?
technique 1: proc SORT with NODUPKEY and OUT=
technique 2: data step with IF FIRST.Customer_Id=1
technique 3: proc SQL with the SELECT DISTINCT statement
A.
only technique 1
B.
techniques 1 and 2
C.
techniques 1 and 3
D.
techniques 1, 2, or 3
A previous poster's answer was D, though he was not sure himself. I would choose C. Please advise. Thanks.
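The disagreement between C and D turns on whether technique 2 counts when the input is not already sorted. A minimal sketch of technique 2 follows, assuming a small made-up customer data set; FIRST.Customer_Id only exists in a BY-group DATA step, and the BY statement expects the data to be in Customer_Id order.

```sas
/* Hypothetical, deliberately unsorted customer data */
data customer;
    input Customer_Id order_amt;
    datalines;
103 20
101 15
103 40
102 10
101 30
;
run;

/* Without this sort, the BY statement in the next step stops with
   "ERROR: BY variables are not properly sorted" */
proc sort data=customer out=customer_sorted;
    by Customer_Id;
run;

/* Technique 2: keep the first row of each Customer_Id */
data unique_ids;
    set customer_sorted(keep=Customer_Id);
    by Customer_Id;
    if first.Customer_Id = 1;
run;
```

Techniques 1 and 3 do their own ordering, which is why they work on unsorted input without a separate step.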
d********h
Posts: 2048
22
From thread: Statistics board - A fairly specific algorithm question
data tmp;
set group1;
x1=min(id1,id);
x2=max(id1,id);
proc sort nodupkey;by x1 x2;
proc print;var x1 x2;run;
d********h
Posts: 2048
23
From thread: Statistics board - A fairly specific algorithm question
data tmp;
set group1;
id=id;
output;
id=id1;
output;
proc sort nodupkey;by id;
d********h
Posts: 2048
24
From thread: Statistics board - A fairly specific algorithm question
I forgot that min cannot be used on character strings:
data tmp;
set group1;
if id1>id then do;
x1=id1;x2=id;end;
else do;
x1=id;x2=id1;
end;
proc sort nodupkey;by x1 x2;
proc print;var x1 x2;run;
b******e
Posts: 539
25
From thread: Statistics board - A transpose question, waiting online
I modified the program from the post above slightly:
proc sort data=aaa out=bbb nodupkey; by A C B D; run;
proc transpose data=bbb out=fin prefix=D;
by A C;
ID B;
var D;
run;
R*********i
Posts: 7643
26
来自主题: Statistics版 - sas question
Do you only want the firm names? It can be done in a few steps. I'm not a
good programmer to have all done in one step. :-)
proc sort data=test out=test1;
by firm year;
run;
*-- Method 1---*;
data test2;
set test1;
by firm year;
retain lastyr;
if first.firm then lastyr=year;
else do;
if lastyr+1=year then flag=1;
lastyr=year;
end;
run;
proc sort data=test2 out=tokeep (keep=firm) nodupkey;
by firm;
where flag;
run;
*-- Method 2---*;
proc transpose data = test1 out = ttes
l*****e
Posts: 12
27
From thread: Statistics board - How to find the numbers that have no duplicates [done]
proc sort data=XXX out=nodup dupout=AAA nodupkey;
by var;
run;
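If the goal is specifically the values that occur exactly once, one alternative is a BY-group DATA step that keeps rows where FIRST.var and LAST.var are both true. This is only a sketch under that assumption; the data set names have, sorted, and singles are hypothetical.

```sas
/* Hypothetical data: 1 and 4 occur once, 2 and 3 are repeated */
data have;
    input var @@;
    datalines;
1 2 2 3 3 3 4
;
run;

proc sort data=have out=sorted;
    by var;
run;

/* A value is unique when its row is both the first and the last of its BY group */
data singles;
    set sorted;
    by var;
    if first.var and last.var;
run;
```

With the NODUPKEY and DUPOUT= approach above, the same set can be reached by removing from the OUT= data set every value that also appears in the DUPOUT= data set.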
j******o
Posts: 127
28
Is this approach especially clumsy? Everyone is welcome to test it. Assume the length n of the given string is known in advance.
%let n=20;
data have;
input str : $&n..;
datalines;
abcsdabcedbcsdaedfjs
;
run;
data have1;
set have;
do i=1 to &n;
do j=1 to &n-i;
sub=substr(str, i, j);
len=lengthn(sub);
output;
end;
output;
end;
run;
proc sort data=have1 nodupkey;
by i sub;
run;
proc sort data=have1;
by len sub;
run;
data have1;
set have1;
sub1=lag(sub);
if sub not eq sub1 then delete;
run;
data have1 (keep=str len);
set have1 end=las...
o**********a
Posts: 330
29
From thread: Statistics board - Question 23 of the ADV SAS 63 questions
Item 23 of 63
Given the SAS data set SASUSER.HIGHWAY:
Steering  Seatbelt  Speed  Status   Count
--------  --------  -----  -------  -----
absent    No        0-29   serious     31
absent    No        0-29   not       1419
absent    No        30-49  serious    191
absent    no        30-49  not       2004
absent    no        50+    serious    216
The following SAS program is submitted:
%macro SPLIT;
proc sort
data=SASU...
s*********e
Posts: 944
30
From thread: Statistics board - Question 23 of the Adv 63 questions
options nodate nocenter formdlim='*';
data highway;
input Steering $ Seatbelt $ Speed $ Status $ Count;
cards;
absent No 0-29 serious 31
absent No 0-29 not 1419
absent No 30-49 serious 191
absent no 30-49 not 2004
absent no 50+ serious 216
run;
%macro split;
proc sort data=highway out=work.uniques(keep=status) nodupkey;
by status;
run;
data _null_;
set uniques end=lastobs...
p***r
Posts: 920
31
From thread: Statistics board - How do I write this small SAS program?
Try my version, without transpose:
DATA ONE;
INPUT VAR $ @@;
CARDS;
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14
;
data two;
set one;
id= int((_n_-1)/5);
length v1-v5 $10.;
if mod(_n_,5)=1 then do;
v1='';
v2='';
v3='';
v4='';
v5='';
end;
if mod(_n_,5)=1 then v1=var;
if mod(_n_,5)=2 then v2=var;
if mod(_n_,5)=3 then v3=var;
if mod(_n_,5)=4 then v4=var;
if mod(_n_,5)=0 then v5=var;
if mod(_n_,5) ne 0 then
do;
retain v1 v2 v3 v4 v5;
end;
drop var;
run;
proc sort data=two ;
by id descending v...
D******n
Posts: 2836
32
From thread: Statistics board - A beginner's question (reposted)
create a .vim directory under your home directory (there is a dot before
vim)
and then create a syntax directory under it
and then create a sas.vim file under the syntax directory
==============sas.vim======================
if version < 600
syntax clear
elseif exists("b:current_syntax")
finish
endif
syn case ignore
syn region sasString start=+"+ skip=+\\|\"+ end=+"+
syn region sasString start=+'+ skip=+\\|\"+ end=+'+
" Want region from 'cards;' to ';' to be captured (Bob Heckel)
sy...
h********o
Posts: 103
33
From thread: Statistics board - A SAS question (baozi offered)
Try this:
====================================
PROC SQL;
CREATE TABLE FINAL AS
SELECT A.ID,
A.EVENT_DATE,
B.ADMISSION_DATE
FROM FILE1 AS A LEFT JOIN FILE2 AS B
ON A.ID = B.ID AND A.EVENT_DATE >= B.ADMISSION_DATE
ORDER BY ID, EVENT_DATE, ADMISSION_DATE DESC;
QUIT;
PROC SORT DATA = FINAL NODUPKEY;
BY ID EVENT_DATE;
RUN;
n*****n
Posts: 3123
34
From thread: Statistics board - Baozi offered for help with SAS code
proc sort data=original out=date_index nodupkey;
by date;
run;
data date_index;
set date_index;
index = _N_;
run;
proc sort data=original;
by date;
run;
data final;
merge original date_index;
by date;
drop date;
run;
j******o
Posts: 127
35
From thread: Statistics board - SAS code help
Please try this and see whether the three groups of id are what you want.
proc sort data=have out=have1 nodupkey; by id typle; run;
data _equ _sup _both;
set have1;
by id typle;
if first.id=last.id and upcase(trim(typle))='EQUIPMENT' then output
_equ(keep=id);
else if first.id=last.id and upcase(trim(typle))='SUPPLY' then output
_sup(keep=id);
else output _both(keep=id);
run;
p******d
Posts: 1120
36
From thread: Statistics board - Question 45 of the ADV 63 questions
Item 45 of 63
To create a list of unique Customer_Id
values from the customer data set, which
of the following techniques can be used?
technique 1: proc SORT with NODUPKEY and OUT=
technique 2: data step with IF FIRST.Customer_Id=1
technique 3: proc SQL with the SELECT DISTINCT statement
A.
only technique 1
B.
techniques 1 and 2
C.
techniques 1 and 3
D.
techniques 1, 2, or 3
The answer is D, but I think it should be C. The question does not say the data is sorted, in which case ...
k*******a
Posts: 772
37
From thread: Statistics board - SAS question
You can first run proc sort
by A B C
with nodupkey to remove the duplicate C values,
and then run proc freq.
x***x
Posts: 3401
38
From thread: Statistics board - sas question (baozi offered)
proc sort data=a nodupkey;
by acct_id;
This will eliminate any duplicated value of acct_id.
p********r
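Posts: 1465
39
From thread: Statistics board - sas question (baozi offered)
proc sort... nodupkey out= dupout=;
by accountid;
run;
out= receives the de-duplicated records; dupout= receives the duplicate records that were removed.
Use compress; an earlier poster already mentioned it.
Even data at the million-row level will not take long; the result comes back quickly.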
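A tiny sketch of what ends up in each output data set may help here. The data set names a, a_unique, and a_removed are hypothetical. Note that DUPOUT= holds only the observations that NODUPKEY dropped, not the first occurrence of each repeated key.

```sas
/* Hypothetical data: accountid 1 appears three times, accountid 2 once */
data a;
    input accountid @@;
    datalines;
1 1 1 2
;
run;

proc sort data=a nodupkey out=a_unique dupout=a_removed;
    by accountid;
run;

/* a_unique  : one row per key      -> accountid 1, 2          */
/* a_removed : the rows dropped     -> accountid 1, 1          */
/* The first accountid=1 row stays in a_unique, not a_removed  */
```

To list every row belonging to a repeated key, including the first one, merge a_removed back onto the original data by accountid.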
g****8
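Posts: 2828
40
From thread: Statistics board - A question about proc sql left join
I am not sure whether the following works; I have not tested it. Also, if b has many
variables and you have to list a lot of them in keep, this will not be as efficient as the sql approach.
If a has few variables, switching to drop also works.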
proc sort data=a; by id provider;run;
proc sort data=b; by id provider;run;
DATA test;
Merge a (in=t1) b(in=t2);
by id provider;
if (t1=1 and t2=0 ) then delete;
if( t1=1 and t2=1 and begdate<=admsn_dt and enddate>=dschrgdt ) then
delete;
keep ;
run;
proc sort data=test nodupkey;by ***; run;
s********1
Posts: 54
41
From thread: Statistics board - One question about %put
In the following code, I want to use "%put &status1" to show &status1=not.
But I did not see it in SAS log. Who can tell me why?
#######################################
data HIGHWAY;
input Steering $ Seatbelt $ Speed $ Status $ Count;
cards;
absent No 0-29 serious 31
absent No 0-29 not 1419
absent No 30-49 serious 191
absent no 30-49 not 2004
absent no 50+ serious 216
;
run;
%macro SPLIT;
proc sort data=HIGHWAY out=WORK.UNIQUES(keep=Status) nodupkey;
by Status;
run;
data fer;
set uniques end=Las...
w*******n
Posts: 469
42
From thread: Statistics board - How to set a flag in SAS
proc sort nodupkey dupout=tmp;by v1; run;
data test;
merge test tmp(keep=v1 in=tmp); by v1;
if tmp then flag=0; else flag=1;
run;
j******o
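Posts: 127
43
From thread: Statistics board - Looking for SAS code, baozi offered
First build a dummy data set containing every week for every ID, then merge by ID WEEK;
the weeks with missing weight are the times you want.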
proc sort data=patient out=two(keep=id) nodupkey; by id; run;
data _tem;
set two;
do week=1 to 10;
output;
end;
run;
s******8
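Posts: 102
44
From thread: Statistics board - A SAS programming question about trading volume
Let me try too:
Your problem is that the data is too large and yet must be sorted, so work on the sorting
method. If you know the date span, first split the data by day, then sort and check each
day, and finally combine the results.
Suppose the earliest date is macro variable day1 and the last date is macro variable day2;
%let date1=%sysfunc(mdy(1,1,1990));
%let date2=%sysfunc(mdy(12,31,2012));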
%macro trybest(day1=&date1,day2=&date2);
data %do i=&day1 %to &day2;dt_&i %end;;
set yourdate;
select(date);
%do i=&day1 %to &day2;
when(&i) output dt_&i;
%end;
otherwise put "ERROR: other date found " date;
end;
drop date;
run;
%do i=&day1 %to &day2;
%let dsid=%sysfunc(open(dt_&I,i));
%let nobs=%sysf...
D*********Y
Posts: 3382
45
From thread: Statistics board - About SAS interviews
Original question:
Using SAS, combine Dataset A with Dataset B. Keep only those records that
are contained in both A & B.
comment: merge or sql is not impressive at all.
The format approach definitely shows a candidate that’s been around the
block a few times.
/* Proc Format */
data b; set b;
start = ordernumber;
label = '*';
fmtname = '$key';
run;
proc sort data=b nodupkey; by start;
run;
proc format cntlin=b; run;
data all; set a;
if put(ordernumber,$key.) = '*';
run;
S*******1
Posts: 251
46
From thread: Statistics board - help!! help!! SAS help!! Urgent!!
Then you need to sort the data with nodupkey to remove the duplicated Var1
and var2, and then use the same idea to count the levels of var3 within each group
of var1 and var2.
y**i
Posts: 1050
47
From thread: Statistics board - help!! help!! SAS help!! Urgent!!
proc sort data nodupkey;
by var1 var2;
run;
proc sql NUMBER;
select VAR1,VAR2 , COUNT (DISTINCT VAR3) AS LEVEL
FROM ONE
GROUP BY VAR1 VAR2
order by VAR1, VAR2, VAR3;
QUIT;
is this ok?
Can I do this in data step , not in proc sql?
发帖数: 2
48
来自主题: Statistics版 - SAS ADV 63题中第45题
To create a list of unique Customer_Id
values from the customer data set, which
of the following techniques can be used?
technique 1: proc SORT with NODUPKEY and OUT=
technique 2: data step with IF FIRST.Customer_Id=1
technique 3: proc SQL with the SELECT DISTINCT statement
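A. only technique 1
B. techniques 1 and 2
C. techniques 1 and 3
D. techniques 1, 2, or 3
In "Learn SAS with Crackman" the answer is C, on the grounds that the second method requires
the data to be sorted beforehand for the FIRST. variable, but the downloaded answer key says D.
I also think it is D, because the question asks which methods can be used, and whether you
sort is an implementation detail. Could the experts give an answer?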
k*******a
Posts: 772
49
From thread: Statistics board - How to do a data step within group in SAS
This is simpler to do with sql.
With a data step you can:
proc sort data=a(where = (DISP ne "PRIMARY")) out=b(keep=ID) nodupkey;
by ID;
run;
data c;
merge a(in=in1) b(in=in2);
by ID;
if not in2;
run;