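t*******8 Posts: 170 | 1 I hear there are a lot of experts here, so I'm trying my luck.
I've run into a hard problem.
I didn't explain it clearly last time (my writing is poor), so here it is again:
There are actually 120 sequences, each with 100 values, and each value is one of 20 possible categories (e.g. A, B, ..., O). The question: how do I determine the relationship between position 3 and position 6 of the sequences? I thought about using covariance, but that treats A, B, ... as quantities, when they are merely distinct categories with no numeric relationship, so the result is wrong. Is there a method that can determine the relationship between the two columns of data at positions 3 and 6?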
position 1 2 3 4 5 6 7 8
obs
1 A D G A E F H A
2 F C G N L N H O
3 D D I J K F M A
.
.
.
120
Each of these values is one of 20 possibilities; I haven't yet worked out how this relates to their distribution.
Thanks in advance! |
|
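For purely categorical columns, one option is an association measure built from the contingency table of the two positions, such as mutual information (a chi-square test on the same table is another). The Python sketch below is only an illustration and not something proposed in the thread: `mutual_information`, `pos3` and `pos6` are hypothetical names. With 120 sequences and up to 20 x 20 cells the raw estimate is biased upward, so it would normally be compared against values obtained after shuffling one of the columns.

```python
from collections import Counter
from math import log2

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length categorical columns."""
    n = len(col_a)
    count_a = Counter(col_a)              # marginal counts, position 3
    count_b = Counter(col_b)              # marginal counts, position 6
    count_ab = Counter(zip(col_a, col_b)) # joint counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi

# The three rows shown in the post (positions 3 and 6 only).
pos3 = ["G", "G", "I"]
pos6 = ["F", "N", "F"]
print(mutual_information(pos3, pos6))
```

A value of 0 means the observed joint frequencies are exactly what independence would give; larger values mean that knowing position 3 tells you more about position 6.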
A********a Posts: 133 | 2 U need to reduce the dimension of the problem, first because stock returns are
likely driven by a few factors, second because there are missing obs for
such data sets. For the first, u need a factor model (Barra, Northfield, APT,
etc.), for the latter, u need some robust methods to accommodate missing
observations, e.g., the EM algorithm, Stambaugh's method and others.
To make the covariance matrix spd is easy, do the eigen-decomp, and ... |
|
A********a Posts: 133 | 3 sample covariance is too noisy.
Among 1000 stocks, u may have some with less than 1000 obs, better use
factor based method, check out BARRA document, if u want fancy, go RMT. |
|
J**Y Posts: 34 | 4 Suppose you want to draw from a multivariate normal distribution with mean
vector U (kx1) and variance-covariance matrix V (kxk). First, draw k sequences
independently from the standard normal distribution; second, let T be a square
root of V such that T'T=V; third, stack the k draws together to get a
matrix Z (nxk), where n is the number of obs in one sequence. Finally,
the desired draw is X=1U'+Z*T (nxk, one draw per row, where 1 is an nx1 vector of ones). |
|
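The recipe above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names are made up, the square root is taken to be the Cholesky factor L with L L' = V (so T = L' in the notation of the post), and V is assumed to be symmetric positive definite.

```python
import random
from math import sqrt

def cholesky(v):
    """Lower-triangular L with L L' = v, for a symmetric positive definite v."""
    k = len(v)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = sqrt(v[i][i] - s)
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low

def draw_mvn(mean, v, n, rng):
    """Return n draws (one per row) from N(mean, v), each computed as mean + L z."""
    low = cholesky(v)
    k = len(mean)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # k independent N(0, 1) values
        draws.append([mean[i] + sum(low[i][m] * z[m] for m in range(i + 1))
                      for i in range(k)])
    return draws

# Hypothetical 2x2 example.
v = [[4.0, 2.0], [2.0, 3.0]]
for row in draw_mvn([1.0, -1.0], v, 3, random.Random(0)):
    print(row)
```

Writing each draw as a row, `mean + L z` is the same thing as the row form U' + z'T, which is why the draws come out with covariance T'T = V.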
p********a Posts: 5352 | 5 ☆─────────────────────────────────────☆
Bighappy (快乐大大大) wrote on (Fri Apr 6 14:47:46 2007):
Could the experts take a look at the program below:
data dataA;
input var1 var2;
datalines;
1.2 2.0
4.2 3.2
3.8 1.8
6.0 9.3
7.5 5.4
8.6 7.2
;;;;
data dataB;
input var3 var4;
datalines;
2.1 2.4
2.3 7.2
3.4 2.8
9.4 5.5
5.5 5.9
7.8 7.2
1.1 1.4
4.5 6.5
;;;;
data new;
merge dataA dataB;
run;
proc print; run;
The output is as follows:
Obs var1 var2 var3 var4
1 1.2 2.0 2.1 |
|
p********a Posts: 5352 | 6 ☆─────────────────────────────────────☆
yalier (丫梨儿) wrote on (Sat Dec 1 12:30:37 2007):
Hello everyone, I'm working on a PROJECT with a DATASET that contains a VARIABLE called RESULT,
which records the MEASURE OF EFFICACY:
36
39
40
38
32
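.
.
Now I want to do an LOCF analysis on this variable, with one OBSERVATION per PATIENT; the DATASET also has a
PATIENT ID variable. How can I do this? I think MERGE and RETAIN are needed, but I don't know exactly how.
This is my first time doing this kind of analysis and I'm a SAS beginner. Any advice would be much appreciated, thanks!!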
☆─────────────────────────────────────☆
tosi (我的名字叫/tu'zi:/) wrote on (Sat Dec 1 15:15:16 2007):
What is LOCF? I mean, do you want to sum up within each patient or what? |
|
V******n Posts: 881 | 7 How to do extreme quantile estimation based on historical data? No need to
make assumptions on the data generation etc. What about high quantiles given one
or many replications? Say we have 100 obs in every sample, can we give the
99th percentile of the parent distribution? What estimator should be used
and what about the error distribution of the estimate? Thank you! |
|
z*****w Posts: 118 | 8 Hi, friends,
I have a dataset like this,
1 0
2 0
3 0
4 0
5 150
6 150
7 150
8 0
9 0
10 0
11 0
12 0
13 0
14 150
15 150
16 0
I want to pick every first 150 obs. in each of the 150s groups, such as No.
5 and 14. How can I do that? Thank you very very much . |
|
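The selection rule here is "keep a row when its value is 150 and the previous row's value was not". A Python sketch of that rule on the posted data is shown below; the function name is hypothetical. In a SAS data step the same comparison could be made against a retained or lagged copy of the second column.

```python
def first_of_each_run(rows, target=150):
    """Return the obs numbers that start a run of consecutive `target` values."""
    picked = []
    previous = None
    for obs, value in rows:
        if value == target and previous != target:
            picked.append(obs)
        previous = value
    return picked

values = [0, 0, 0, 0, 150, 150, 150, 0, 0, 0, 0, 0, 0, 150, 150, 0]
rows = list(enumerate(values, start=1))  # (obs number, value) pairs as in the post
print(first_of_each_run(rows))  # [5, 14]
```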
z*********o Posts: 541 | 9 But when I ran it in SAS, the result was
Obs name rate year captial
1 firstcap 0.0718 4 15000
2 directba 0.0721 4 30000
3 virtuald 0.0728 4 45000
I can't tell where it went wrong.
there |
|
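S***e Posts: 108 | 10 Even when people and money are exactly identical datasets, running this statement still gives 0 obs. It may be because of
set, but I don't fully understand why. Could some expert explain? |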
|
A*********u Posts: 8976 | 11 dataset A
a
1
2
3
dataset B
a
4
5
6
code:
data AB;
set A(in=in1) B(in=in2);
run;
you will get:
dataset AB
a in1 in2
1 1 0
2 1 0
3 1 0
4 0 1
5 0 1
6 0 1
so there is no "in1 and in2".
BTW in1 and in2 are internal variables stored in PDV,
but won't be written to dataset AB.
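Even when people and money are exactly identical datasets, running this statement still gives 0 obs. It may be because of
set, but I don't fully understand why. Could some expert explain? |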
|
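The point of the reply, that a SET statement with two datasets stacks them so every row comes from exactly one source, can be imitated in a few lines of Python. This is only a sketch of the behaviour described above; the function name is hypothetical and the flags are kept in the output purely for display (SAS would drop them).

```python
def concatenate_with_flags(a_rows, b_rows):
    """Stack two datasets the way `set A(in=in1) B(in=in2);` reads them."""
    out = []
    for value in a_rows:
        out.append({"a": value, "in1": 1, "in2": 0})  # row contributed by A
    for value in b_rows:
        out.append({"a": value, "in1": 0, "in2": 1})  # row contributed by B
    return out

ab = concatenate_with_flags([1, 2, 3], [4, 5, 6])
both = [row for row in ab if row["in1"] and row["in2"]]
print(len(ab), len(both))  # 6 0
```

A subsetting condition such as `if in1 and in2` therefore keeps nothing, however similar the two datasets are; both flags can be 1 on the same row only when the datasets are combined side by side with MERGE and a BY statement.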
s*r Posts: 2757 | 12 probably longitudinal obs of blood sugar
random |
|
x******m Posts: 736 | 13 I don't think it is longitudinal. A longitudinal study is a correlational
research study that involves repeated observations of the same items over
long periods of time. Here there are no repeated obs. of the same items over a long
time. |
|
l****g Posts: 304 | 14 Why is the answer B and not C? Thanks!
111. A SAS PRINT procedure output of the WORK.LEVELS data set is listed
below:
Obs name level
1 Frank 1
2 Joan 2
3 Sui 2
4 Jose 3
5 Burt 4
6 Kelly .
7 Juan 1
The following SAS program is submitted:
data work.expertise;
set work.levels;
if level = . then
expertise = 'Unknown';
else if level = 1 then
expertise = 'Low';
else if level = 2 or 3 then
expertise = 'Medium';
else
expertise = 'High';
run;
Which of the following values does the variable EXPERTISE contain?
A. |
|
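The catch in this question is the condition `level = 2 or 3`: SAS evaluates it as `(level = 2) or 3`, and a nonzero, nonmissing constant counts as true, so the final `else` branch is never reached. Python happens to parse the analogous expression the same way, which allows a small sketch (the function name is hypothetical, and `None` stands in for the SAS missing value):

```python
def expertise(level):
    if level is None:
        return "Unknown"
    elif level == 1:
        return "Low"
    elif level == 2 or 3:  # parsed as (level == 2) or 3, and 3 is always truthy
        return "Medium"
    else:
        return "High"      # never reached

levels = [1, 2, 2, 3, 4, None, 1]  # the LEVEL column from the listing
print(sorted(set(expertise(v) for v in levels)))  # ['Low', 'Medium', 'Unknown']
```

Under this reading Burt (level 4) ends up as Medium rather than High; the intended test would be written `level = 2 or level = 3`, or `level in (2, 3)`.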
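d*****w Posts: 684 | 15 While studying PROC SQL just now, I found that the distinct keyword keeps only the non-duplicated
values of a variable. If I don't use proc sql and just use ordinary data processing, how should I write it? Can it
be done in a data step?
Thanks! |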
|
w********5 Posts: 72 | 16 proc sort data=a out=new nodup;
by _all_; /* PROC SORT requires a BY statement; _all_ compares whole records */
run;
data new has no duplicate data (it is one of my interview questions) |
|
d*****w Posts: 684 | 17 Thanks so much. I clearly haven't used it enough; now that you mention it, I recall this also seems to be a sas base
exam question. |
|
c*******o Posts: 3829 | 18 select distinct a.*
from a |
|
c**********e Posts: 2007 | 19 Do not know. SAS Learning Edition costs $200. Maximum number
of obs = 1500. |
|
m******n Posts: 462 | 20 It works pretty well. Good to know the coalesce function. Thanks.
Still wonder why sas only overwrites the first obs of each stratum when two
datasets are merged by stratum and have other variables in common besides the
BY variable. |
|
s**c Posts: 1247 | 21 create 1000 uniform random samples, one for each obs
sort by the random sample, select the first 200 |
|
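The two steps in this reply (attach a uniform random number to every obs, sort by it, keep the first 200) give a simple random sample without replacement. A Python sketch of the same idea, with a hypothetical function name and a stand-in dataset of 1000 rows:

```python
import random

def sample_by_random_key(rows, k, rng):
    """Tag each row with a uniform random key, sort by the key, keep the first k."""
    keyed = [(rng.random(), index) for index in range(len(rows))]
    keyed.sort()
    return [rows[index] for _, index in keyed[:k]]

rows = list(range(1, 1001))  # stand-in for the 1000 obs
sample = sample_by_random_key(rows, 200, random.Random(2024))
print(len(sample), len(set(sample)))  # 200 200
```

Fixing the seed makes the sample reproducible, which is usually worth doing for this kind of selection.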
l****a Posts: 336 | 22 The data are saved in the file 111.txt
ASL32,SMITH AND SONS,1998
LAS484,MAJOR UNIVERSITY PRESS,1989
IOD859,SMITH AND SONS,1988
REU701,TOWN PRESS,1995
WRE142,LITTLE FEET,1990
The code I used:
data publish;
infile '111.txt' dlm=',';
input BookID $ Publisher & $ Year;
run;
proc print data=publish;
run;
The output is completely wrong, which is frustrating:
Obs BookID Publisher Year
1 ASL32 SMITH AN .
2 IOD859 SMITH AN . |
|
z****h Posts: 203 | 23 What I actually mean is: how do I use SAS to SUM over the ID (OBS) column, or to get rid of the ID (show no ID)? |
|
z****h Posts: 203 | 24 So that the final result shows only this:
70000 PARTS
65000 SALES
45000 SALES
45000 SALES
35000 SALES
31000 PARTS
30000 PARTS
25000 SALES
21000 PARTS
20000 PARTS
==========
387000 |
|
w***z Posts: 28 | 25 noobs?
(proc print data=a noobs;....) |
|
v**s Posts: 44 | 26 For example, there are some obs
aab123
aab567
aac456
aab876
aab456
aac890
Now I only want to keep the ones that start with aab. How should I write my code?
Many thanks!!!!! |
|
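The filter being asked for is a prefix test on the first three characters. A Python sketch on the posted values (in a SAS data step the analogous test would compare the leading characters of the variable against 'aab'):

```python
obs = ["aab123", "aab567", "aac456", "aab876", "aab456", "aac890"]

# Keep only the values whose first three characters are "aab".
kept = [value for value in obs if value.startswith("aab")]
print(kept)  # ['aab123', 'aab567', 'aab876', 'aab456']
```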
x**********n Posts: 13 | 27 thanks, what if the data is a bit more complicated
data test;
input id date nnb sumnb;
datalines;
1 10 2 15
1 11 1 15
1 12 1 20
1 13 1 10
2 7 1 150
2 9 1 200
2 10 2 150
2 17 1 300
;
run;
This is what I wrote, but the output isn't quite right. Why is obs 2 also . ?
data new;
set test;
by id;
if first.id then diff=.;
else diff=sumnb-lag(sumnb);
run;
Obs id date nnb sumnb diff
1 1 10 2 15 |
|
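A likely explanation for the missing value in obs 2: LAG does not look at the previous observation; it returns the value stored the last time LAG itself was executed, and in the code above it is skipped whenever `first.id` is true. The Python sketch below imitates that queue to show the effect (the class and function names are hypothetical, and `None` plays the role of `.`). Calling LAG on every row and only then branching gives the intended differences.

```python
class Lag:
    """Imitates SAS LAG: returns the value passed on the previous call."""
    def __init__(self):
        self.stored = None

    def __call__(self, value):
        previous, self.stored = self.stored, value
        return previous

def minus(a, b):
    return None if b is None else a - b

def diffs(rows, conditional):
    lag, out, last_id = Lag(), [], None
    for id_, sumnb in rows:
        first = id_ != last_id
        last_id = id_
        if conditional:
            # as posted: LAG only runs in the else branch
            diff = None if first else minus(sumnb, lag(sumnb))
        else:
            # LAG runs on every row, then the result is used or discarded
            previous = lag(sumnb)
            diff = None if first else minus(sumnb, previous)
        out.append(diff)
    return out

rows = [(1, 15), (1, 15), (1, 20), (1, 10), (2, 150), (2, 200), (2, 150), (2, 300)]
print(diffs(rows, conditional=True))   # [None, None, 5, -10, None, 190, -50, 150]
print(diffs(rows, conditional=False))  # [None, 0, 5, -10, None, 50, -50, 150]
```

Note the second symptom of the conditional version: obs 6 is compared with the last value of the previous id (200 - 10 = 190) instead of with obs 5.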
C*********y Posts: 1424 | 29 The raw data are
5 90 80 70 77 88 23
2 100 99 25
3 87 85 88 35
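The first number is the number of exams, the others are the exam scores, and the last number is the age.
The desired result looks like this: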
Obs SCORE1 SCORE2 SCORE3 SCORE4 SCORE5 NUMBER SUBJ AGE
1 90 80 70 77 88 5 1 23
2 100 99 . . . 2 2 25
3 87 85 88 . . 3 3 35
It has to be done with an array.
How should I write this code?
What I wrote:
Data scoreproblem;
array array_score[5] score1-score5;
if array_score[i]<60 then array_score[i]=" "@;
input N score1 score2 score3 score4 score5 age;
datalines;
5 |
|
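The record layout (a count, then that many scores, then the age) can be sketched in Python to make the intended logic concrete; a SAS solution would read the count first and then loop over the array up to that count in the same way. `parse_line` is a hypothetical helper and `None` stands for a missing score.

```python
def parse_line(line, subj, max_scores=5):
    """Split one raw record into scores (padded to max_scores), count, subject and age."""
    fields = [int(x) for x in line.split()]
    number = fields[0]                        # how many scores follow
    scores = fields[1:1 + number]
    scores += [None] * (max_scores - number)  # pad the unused score slots
    age = fields[1 + number]                  # the value right after the scores
    return {"scores": scores, "number": number, "subj": subj, "age": age}

raw = ["5 90 80 70 77 88 23", "2 100 99 25", "3 87 85 88 35"]
table = [parse_line(line, subj) for subj, line in enumerate(raw, start=1)]
for row in table:
    print(row)
```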
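A*********u Posts: 8976 | 30 That won't work.
First, lag returns the value var_1 had the last time you called lag,
which is not necessarily the value of var_1 in the previous obs.
Second, this code can't handle several consecutive missing values.
Do it like this: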
data locf;
set olddata;
retain lastnmis; ** means last no-missing value;
if var_1>.z then lastnmis=var_1;
else var_1=lastnmis;
run;
try the lag function:
data mydata;
set mydata;
if var_1 = . then var_1 = lag1( var_1 );
run; |
|
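The RETAIN approach above keeps the last non-missing value in a variable that survives from one observation to the next. A Python sketch of the same carry-forward is given below. Resetting at each new patient is an addition made here for the per-patient setting of the original LOCF question; the posted SAS step carries forward across the whole dataset. The function name is hypothetical and `None` marks a missing value.

```python
def locf(rows):
    """rows: (patient_id, value) pairs sorted by patient; None marks a missing value."""
    filled, last, last_id = [], None, None
    for patient_id, value in rows:
        if patient_id != last_id:  # new patient: nothing to carry over yet
            last, last_id = None, patient_id
        if value is not None:
            last = value           # remember the latest non-missing value
        filled.append((patient_id, last))
    return filled

rows = [(1, 36), (1, None), (1, None), (1, 40), (2, None), (2, 32), (2, None)]
print(locf(rows))
```

Unlike the LAG attempt, this handles any number of consecutive missing values, because the remembered value is only replaced when a non-missing one arrives.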
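d*****n Posts: 65 | 31 sql is the least efficient.
If a data step can do it, I think you should definitely not choose sql.
Also, sql sometimes seems to drop some obs, which is strange. |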
|
c******j Posts: 270 | 32 dataset mydata contains 3 observations.
If I run this code:
data new;
do i=1 to 3;
set mydata;
end;
run;
why there is only 1 obs in the data new? |
|
c******j Posts: 270 | 33 It's a SAS base practice question........
It's not that I want to do anything....
I want to know why the "set mydata" statement run three times but only 1 obs
is output. |
|
A*********u Posts: 8976 | 34 the set statement reads only one observation into the pdv at a time
it won't write the observation to the new dataset unless sas
meets an output statement, or reaches the end of the data step
(an explicit output statement replaces the implicit output at the end)
It's a SAS base practice question........
It's not that I want to do anything....
I want to know why the "set mydata" statement run three times but only 1 obs
is output. |
|
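z*********o Posts: 541 | 35
The earlier part works; I posted all of it because the later part uses the data from the earlier part. The main issue is the later part that computes the
probability. Can you see the result? The result should look like this, but I can't get it. There is always an error.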
Choice of Chocolate Candies
Obs Dark Soft Nuts p
1 Dark Chewy Nuts 0.50400
2 Dark Chewy No Nuts 0.21600
3 Milk Chewy Nuts 0.12600
4 Dark Soft Nuts 0.05600
5 Milk Chewy No Nuts 0.05400
6 Dark Soft No Nuts 0.02400
7 Milk Soft Nuts 0.01400
8 Milk Soft No Nuts 0.00600 |
|
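p********a Posts: 5352 | 36 I usually check two things
1. That the 500-plus variable names all match
First copy and paste the VARIABLE NAMES from the PDF file and read them into DATASET a, then run proc contents
data=urdata out=b;run; and compare whether a and b match. Of course, if a more detailed comparison is needed, you can
also read in FORMAT, LENGTH and so on and compare them one by one. I don't think that is usually necessary.
2. That the opening section of the two PROC CONTENTS outputs matches (number of variables, number of OBS, INDEX, etc.) |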
|
s*******2 Posts: 791 | 37 1. b. catx('-',x,y,z)
7. Obs x y z
1 1 a 2
2 1 b 4
3 2 a 3
4 2 b 5
5 3 a 4
6 4 a 6
7 4 b 7
|
|
l*******l Posts: 204 | 38 However I would not think all obs in the data are independent. I am not sure
if there is a nonparametric method for correlated data. I would like to
know as well |
|
n****c Posts: 78 | 39 The scores for the left and right knee are correlated within the same
subject, so a two-sample t-test is not appropriate. But my sample is so small: 8
patients, 32 obs. Is a parametric method ok? Thanks |
|
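z********n Posts: 710 | 40 I'd like to ask: in proc surveyreg or surveylogistic, once cluster has been used to account for the correlation
among obs within the same cluster, is multilevel analysis still needed? For example, obs from the same company or the same
school,
where the school or company is the variable that cluster requires.
Many thanks~ |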
|
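w**********n Posts: 5 | 41 I'm running a multiple regression with 4.50 independent variables in total and 500-plus obs, but the independent variables are strongly collinear. I used the
elastic net method for model selection, and on a first try the model ended up with a lot of variables. How can I screen
these variables? How is the BIC value computed? What are the relevant R commands? |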
|
D******n Posts: 2836 | 42 moving average?
but ur data is already moving average, maybe just plot the obs. |
|
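f*********8 Posts: 165 | 43 I have two datasets:
data1 has two variables: location, Pvalue (900 obs)
data2 has two variables: start, end (10 observations)
I want to check whether each location falls between start and end; if it does, the data are merged together. If several
locations fall in the same start-end interval, the Pvalues of those locations are used to compute the Mean Pvalue. Any other
location that falls in no start-end interval is deleted.
How should I code this? Many thanks. |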
|
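The matching rule described here is a range join followed by a group mean. A small Python sketch of that logic follows; the function name and the sample numbers are hypothetical, and the interval endpoints are assumed to be inclusive. In SAS the same join is commonly expressed in PROC SQL with a `between` condition and a `group by` on start and end.

```python
def mean_pvalue_by_interval(locations, intervals):
    """locations: (location, pvalue) pairs; intervals: (start, end) pairs."""
    result = {}
    for start, end in intervals:
        # Pvalues of every location that falls inside this interval
        inside = [p for loc, p in locations if start <= loc <= end]
        if inside:
            result[(start, end)] = sum(inside) / len(inside)
    return result

locations = [(5, 0.10), (7, 0.30), (50, 0.20)]  # made-up data1 rows
intervals = [(1, 10), (20, 30)]                 # made-up data2 rows
print(mean_pvalue_by_interval(locations, intervals))
```

Locations outside every interval (50 in this made-up data) simply never contribute, which matches the "delete" requirement; intervals with no location are left out of the result.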
o******6 Posts: 538 | 44 ☆─────────────────────────────────────☆
toobz (bzbutlaz) wrote on (Tue Feb 12 14:14:48 2008):
Hi all, I wrote a simple program today:
Data date;
input date MMDDYY11.;
cards;
10=9=2005
6\02\23
6/2/23
6.2.23
6 02/23
6\02/23
11/25/2/8
6/2/23
11.25.2.215
6/2/23/A
6/2/23
6/2/23/11
;
run;
proc print;
run;
This is the output I got. I don't understand what the negative number means.
Obs date
1 16718
2 -13362
|
|
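On the negative number: SAS stores a date as the number of days since 1 January 1960, so a negative value is simply a date before 1960, and without a date format PROC PRINT shows the raw count. The Python sketch below converts the two printed values back to calendar dates (the function name is hypothetical). The second result suggests that the two-digit year 23 was read as 1923, which depends on the YEARCUTOFF option in effect.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day 0 of the SAS date scale

def sas_date_to_date(n):
    """Convert a SAS date value (days since 1960-01-01) to a calendar date."""
    return SAS_EPOCH + timedelta(days=n)

print(sas_date_to_date(16718))   # 2005-10-09
print(sas_date_to_date(-13362))  # 1923-06-02
```

Attaching a date format to the variable would make PROC PRINT display these as dates instead of day counts.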
|
s*******2 Posts: 791 | 46 Output it to the output window / assign it to a variable, then use proc print to send it to the output window |
|
P****D Posts: 11146 | 47 Not sure that I follow you... But try ODS TRACE your PROC CONTENTS and
figure out the outputable dataset you need. |
|
s*********e Posts: 1051 | 48 ods output position = output;
proc contents data = yourdata varnum;
run; |
|
s******d Posts: 2730 | 49 I know matplot works. But if you have different numbers of obs from each
state and different x levels for each state, that could be a little tricky.
Still doable.
For your question. See the following example.
temp<-sample(c(rep("NY",3), rep("NC", 2), rep("AL", 4)))
temp
temp<-as.factor(temp)
temp
temp<-as.numeric(temp)
temp
character |
|
t****t Posts: 106 | 50 I think that for each observation, every pass through do year=... executes an output, so each observation produces
five obs, 25 in total. You can run it and see. |
|