Observing life's many facets through MITBBS

topics

All topics - Topic: sampsize
f*****k
Posts: 110
1
The SAS Advanced Prep Guide has an example, "Creating a Random Sample
without Replacement," on p. 459. Can anyone help me understand the logic
behind the method, or suggest a better one? Thanks a lot.
The example and code are posted below:
Example
You can use a DO WHILE loop to avoid replacement as you create your random
sample. In the following example
# Sasuser.Revenue is the original data set.
# sampsize is the number of observations to read into the sample.
# Work.Rsubset is the data set t...
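A minimal Python sketch of the logic behind that DO WHILE loop, assuming the loop's test is the usual selection-sampling rule: keep the current observation with probability (observations still needed) / (observations not yet examined). Python is used only for illustration, and `selection_sample`, `needed`, `remaining`, and `position` are hypothetical names standing in for the SAS variables `sampsize`, `obsleft`, and `pickit`. Because the acceptance probability reaches 1 once the number still needed equals the number left, the loop always ends with exactly the requested sample size, and it never revisits an observation, which is why there is no replacement.

```python
import random


def selection_sample(n_records, sample_size, rng=None):
    """Return sorted 0-based positions of a random sample without replacement.

    Sequential selection sampling: scan the records once and keep record k
    with probability needed / remaining.
    """
    if not 0 <= sample_size <= n_records:
        raise ValueError("sample_size must be between 0 and n_records")
    rng = rng or random.Random()
    needed = sample_size      # like sampsize: observations still to be chosen
    remaining = n_records     # like obsleft: observations not yet examined
    position = 0              # like pickit, but 0-based
    chosen = []
    while needed > 0:
        # needed / remaining reaches 1.0 when every remaining record is required,
        # so the loop can never run past the last record.
        if rng.random() < needed / remaining:
            chosen.append(position)   # in SAS: SET ... POINT=pickit; OUTPUT;
            needed -= 1
        remaining -= 1
        position += 1
    return chosen
```

For example, `selection_sample(1000, 10, random.Random(1))` returns 10 distinct, increasing record positions. The method needs a single pass over the data and no sorting, which is the usual argument for it on large data sets.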
D*D
Posts: 236
2
From thread: Statistics board - A question about SAS sampling
For a large data set and an exact sample size, the following example from
the SAS Advanced certification guide is the fastest algorithm for that
purpose, much faster than PROC SURVEYSELECT:
data work.rsubset(drop=obsleft sampsize);
   sampsize=100;
   obsleft=totobs;
   do while(sampsize>0);
      pickit+1;
      if ranuni(0)<sampsize/obsleft then do;
         set sasuser.revenue point=pickit
             nobs=totobs;
         output;
         sampsize=sampsize-1;
      end;
      obsleft=obsleft...
a*z
Posts: 294
3
From thread: Statistics board - A question about SAS random samples
I read this in the Adv Prep guide:
data work.rsubset(drop=obsleft sampsize);
   sampsize=10;
   obsleft=totobs;
   do while(sampsize>0);
      pickit+1;
      if ranuni(0)<sampsize/obsleft then do;
         set sasuser.revenue point=pickit
             nobs=totobs;
         output;
         sampsize=sampsize-1;
      end;
      obsleft=obsleft-1;
   end;
   stop;
run;
The textbook says this is a random sample without replacement. Why does it
look to me like it just reads the first 10 observations in the data set?
Because picki...
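On the worry about the first 10 observations: `pickit` does step through the data set in order, but an observation is only output when the random test passes, so later observations get their turn. A small Python sketch can check this exactly, assuming the same rule (accept with probability needed/remaining). It tracks the distribution of how many observations are still needed as the records are scanned, using exact fractions; `inclusion_probabilities` is a hypothetical helper. Unlike the SAS loop, it keeps scanning after the sample is full, which does not change which records get chosen.

```python
from fractions import Fraction


def inclusion_probabilities(n_records, sample_size):
    """Exact P(record k is selected) for every k under selection sampling."""
    # dist maps "observations still needed" -> probability, before record k
    dist = {sample_size: Fraction(1)}
    probs = []
    for k in range(n_records):
        remaining = n_records - k
        p_selected = Fraction(0)
        new_dist = {}
        for needed, p in dist.items():
            if p == 0:
                continue
            if needed == 0:
                # Sample already full: record k cannot be chosen.
                new_dist[0] = new_dist.get(0, Fraction(0)) + p
                continue
            accept = Fraction(needed, remaining)
            p_selected += p * accept
            new_dist[needed - 1] = new_dist.get(needed - 1, Fraction(0)) + p * accept
            new_dist[needed] = new_dist.get(needed, Fraction(0)) + p * (1 - accept)
        probs.append(p_selected)
        dist = new_dist
    return probs


probs = inclusion_probabilities(20, 5)
```

For 20 records and a sample of 5, every record comes out with probability exactly 1/4, that is sampsize/totobs, so the early records are not favored.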
C*********u
Posts: 811
j*****e
Posts: 182
5
Suppose your data has 1000 observations and you want to draw a sample of
100. Use the following SAS code:
proc surveyselect data=dataname method=urs SAMPSIZE=100 rep=1 out=sample
seed=1594 outhits;
run;
There are other methods to randomly select observations (with or without
replacement, stratified sampling, cluster sampling, PPS sampling, etc.).
See the SAS help for more details.
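A rough Python analogue of what `method=urs` does, for readers checking the logic: unrestricted random sampling draws units with replacement, so one unit can be hit more than once. It is assumed here that SURVEYSELECT records repeat selections in a `NumberHits` variable and that the `outhits` option writes one output row per hit instead; the sketch returns both views. `urs_sample` is a hypothetical name, and Python's generator will not reproduce the sample that `seed=1594` gives in SAS.

```python
import random
from collections import Counter


def urs_sample(units, sample_size, rng=None):
    """Unrestricted random sampling: sample_size independent draws with replacement."""
    rng = rng or random.Random()
    draws = rng.choices(units, k=sample_size)   # each draw is uniform over all units
    hits = Counter(draws)                       # unit -> number of times selected
    one_row_per_unit = sorted(hits.items())     # like one row per unit with a hit count
    one_row_per_hit = sorted(draws)             # like the OUTHITS option
    return one_row_per_unit, one_row_per_hit


# Hypothetical population of 1000 unit IDs, sample of 100 as in the post
per_unit, per_hit = urs_sample(list(range(1, 1001)), 100, random.Random(1594))
```

With replacement, `per_hit` always has 100 rows, while `per_unit` can have fewer whenever a unit is drawn twice.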
o****o
Posts: 8077
6
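What you want isn't hard in SAS with a BY statement either, and running separate regressions for different conditions is easy too. For example: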
data original;
array _x{*} var1-var10;
do i=1 to 10000;
do _j=1 to dim(_x); _x[_j]=rannor(7655)+sin(i + _j); end;
output;
drop i _j;
end;
run;
ods select none;
proc surveyselect data=original out=samp rep=10
sampsize=1000 method=srs;
run;
proc means data=samp;
by replicate;
var var1;
output out=_mean mean(var1)=mean1;
run;
data samp
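The same replicate-then-summarize pattern can be sketched in Python, assuming `var1` is generated like the first column of the SAS array (standard normal noise plus sin(i + 1)). Each replicate is a simple random sample without replacement of 1000 of the 10000 rows, mirroring `method=srs rep=10 sampsize=1000`, and the per-replicate mean plays the role of `proc means ... by replicate`. `replicate_means` is a hypothetical helper, and the seeds will not reproduce the SAS draws.

```python
import math
import random
import statistics


def replicate_means(data, sample_size, reps, seed=None):
    """Mean of `reps` independent simple random samples (no replacement within a replicate)."""
    rng = random.Random(seed)
    return {rep: statistics.fmean(rng.sample(data, sample_size))
            for rep in range(1, reps + 1)}


# Hypothetical stand-in for var1: N(0, 1) noise plus sin(i + 1), i = 1..10000
gen = random.Random(7655)
var1 = [gen.gauss(0, 1) + math.sin(i + 1) for i in range(1, 10001)]
means = replicate_means(var1, 1000, 10, seed=1)
```

Keying the result by replicate number keeps the same shape as a BY-group output data set, so a per-replicate regression or any other statistic can be swapped in for the mean.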
j*****e
Posts: 182
7
This will do.
proc surveyselect data=set
method=srs sampsize=2
rep=5000
seed=40070 out=SampleRep;
run;
You can also use Proc multtest.
l****u
Posts: 199
8
PROC SURVEYSELECT DATA=pop OUT=sample METHOD=SRS
SAMPSIZE=100 SEED=1234567;
RUN;
It is a very basic procedure; it is worth spending a little time learning
to use it correctly.
o****o
Posts: 8077
9
From thread: Statistics board - How should loop statements be used in R?
For this particular question, SAS is actually pretty handy:
*******************************;
data test;
do id=1 to 20000;
x=rannor(0);
output;
end;
run;
ods select none;
proc surveyselect data=test out=samp method=srs
seed=93759437 sampsize=1000 rep=100 ;
run;
ods select all;
proc univariate data=samp noprint;
by replicate;
histogram x/outhistogram=hist outkernel=kernel noplot;
run;
proc sgplot data=hist;
series x=_MIDPT_ y=_OBSPCT_ /group=replica...
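For comparison with the R-loop question, a Python sketch of the same idea: draw 100 replicate samples and compute a percentage histogram for each, which is roughly what the `outhistogram=` data set (`_MIDPT_`, `_OBSPCT_`) holds per `replicate`. The fixed bin width of 0.5 is an assumption (PROC UNIVARIATE chooses its own midpoints), and `replicate_histograms` is a hypothetical name.

```python
import math
import random
from collections import Counter


def replicate_histograms(data, sample_size, reps, bin_width, seed=None):
    """Percentage histogram for each replicate simple random sample.

    Returns {replicate: {bin_midpoint: percent_of_observations}}, where the bin
    with midpoint m covers [m - bin_width/2, m + bin_width/2).
    """
    rng = random.Random(seed)
    result = {}
    for rep in range(1, reps + 1):
        sample = rng.sample(data, sample_size)
        counts = Counter(math.floor(x / bin_width + 0.5) * bin_width for x in sample)
        result[rep] = {mid: 100 * n / sample_size for mid, n in sorted(counts.items())}
    return result


# Hypothetical data: 20000 standard normal values, as in the SAS data step
gen = random.Random(0)
x = [gen.gauss(0, 1) for _ in range(20000)]
hist = replicate_histograms(x, 1000, 100, 0.5, seed=93759437)
```

Each inner dictionary can then be plotted as one series per replicate, which is what the `group=` option does in the SGPLOT step.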
c********g
Posts: 193
10
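From thread: Statistics board - How do I generate a combination variable in SAS?
Thanks.
But I still haven't quite figured it out. Could you explain a bit more?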
DATA TEMP1;
input x $1;
cards;
A
.
.
Z
;
run;
proc surveyselect data=temp1
method=urs sampsize=26 rep=8
out=temp2;
run;
c***z
Posts: 6348
11
From thread: Statistics board - Random forests on imbalanced data (reposted)
[The following text is reposted from the DataSciences board]
From: chaoz (Facing the sea, eating a bowl of liangpi), Board: DataSciences
Subject: Random forests on imbalanced data
Site: BBS 未名空间站 (MITBBS) (Fri Jun 20 12:54:36 2014, US Eastern)
Recently I used RF on imbalanced data (10% positive, 90% negative) and
tried several tricks. Below is a comparison of the results. We are most
concerned about false negatives.
Any comments and suggestions are extremely welcome!
1. vanilla version:
> randomForest(Relevant ~ ., data = train, ntree = 1000)
# prediction_1a FALSE TRUE
# a...
c***z
Posts: 6348
12
From thread: DataSciences board - Random forests on imbalanced data
Recently I used RF on imbalanced data (10% positive, 90% negative) and
tried several tricks. Below is a comparison of the results. We are most
concerned about false negatives.
Any comments and suggestions are extremely welcome!
1. vanilla version:
> randomForest(Relevant ~ ., data = train, ntree = 1000)
#        prediction_1a
# actual  FALSE  TRUE
#   FALSE 22667    83
#   TRUE    523  1723
acc = 0.9757561
2. lower threshold (predict TRUE if pro...
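The accuracy figure can be checked from the confusion matrix, reading rows as actual and columns as predicted: 22667 true negatives, 83 false positives, 523 false negatives, and 1723 true positives. A small Python helper (hypothetical name `confusion_metrics`) shows why accuracy alone is misleading here when false negatives are the main concern.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Rates from a 2x2 confusion matrix with rows = actual and columns = predicted."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall": tp / (tp + fn),               # share of actual TRUE cases caught
        "false_negative_rate": fn / (tp + fn),  # share of actual TRUE cases missed
        "precision": tp / (tp + fp),            # share of predicted TRUE that are TRUE
    }


# Counts from the vanilla random forest confusion matrix above
m = confusion_metrics(tn=22667, fp=83, fn=523, tp=1723)
```

Accuracy comes out at about 0.9758, matching the reported value, yet the false negative rate is 523/2246, about 23% of the actual positives.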