Discussion roundup on sampsize - 话题女王


f*****k
Posts: 110

From thread: Statistics board - Help on understanding how to Creating a Random Sample without Replacement

The SAS adv prep book has an example on creating a random sample without
replacement on p. 459. Can anyone help me understand the logic behind
the method? Or are there any better methods? Thanks a lot.
The example and code are posted below:
Example
You can use a DO WHILE loop to avoid replacement as you create your random
sample. In the following example
# Sasuser.Revenue is the original data set.
# sampsize is the number of observations to read into the sample.
# Work.Rsubset is the data set t...

D*D
Posts: 236

From thread: Statistics board - A question about SAS sampling

For a large dataset and an exact number of samples, the following example
from the SAS advanced certification guide is the fastest algorithm for
that purpose. It is much faster than PROC SURVEYSELECT.
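data work.rsubset(drop=obsleft sampsize);
   sampsize=100;                     /* observations still needed */
   obsleft=totobs;                   /* observations not yet examined */
   do while(sampsize>0);
      pickit+1;                      /* advance to the next observation */
      if ranuni(0)<sampsize/obsleft then do;
         set sasuser.revenue point=pickit
             nobs=totobs;
         output;
         sampsize=sampsize-1;
      end;
      obsleft=obsleft...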

a*z
Posts: 294

From thread: Statistics board - A question about SAS random sampling

I read this in the adv prep guide:
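data work.rsubset(drop=obsleft sampsize);
   sampsize=10;                      /* observations still needed */
   obsleft=totobs;                   /* observations not yet examined */
   do while(sampsize>0);
      pickit+1;                      /* advance to the next observation */
      if ranuni(0)<sampsize/obsleft then do;
         set sasuser.revenue point=pickit
             nobs=totobs;
         output;
         sampsize=sampsize-1;
      end;
      obsleft=obsleft-1;
   end;
   stop;
run;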
The textbook says this is a random sample without replacement. Why does it seem
to me that it just reads the first 10 observations in the dataset? Because picki...
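The logic asked about here can be sketched outside SAS. Below is a minimal Python analogue of the DO WHILE loop, assuming the selection test is `ranuni(0) < sampsize/obsleft`; the function name `selection_sample` and the list `records` are hypothetical stand-ins for the DATA step and Sasuser.Revenue, not anything from the book.

```python
import random


def selection_sample(records, sampsize, rng=random):
    """Pick exactly `sampsize` records without replacement in one sequential pass."""
    if sampsize > len(records):
        raise ValueError("sampsize cannot exceed the number of records")
    obsleft = len(records)  # records not yet examined (totobs at the start)
    pickit = 0              # 1-based position of the record being examined
    picked = []
    while sampsize > 0:
        pickit += 1  # advances on every pass, whether or not a record is taken
        # Take the current record with probability (still needed) / (still unexamined).
        if rng.random() < sampsize / obsleft:
            picked.append(records[pickit - 1])
            sampsize -= 1
        obsleft -= 1
    return picked


sample = selection_sample(list(range(1, 1001)), 10)
print(sample)
```

Because `pickit` moves forward on every iteration while a record is kept only when the random draw succeeds, the picks are spread across the whole data set rather than being the first 10 rows. No position is visited twice, so no record can repeat, and the ratio reaches 1 when the records left equal the records still needed, which guarantees an exact sample size.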

C*********u
Posts: 811

From thread: Biology board - How to calculate the sample size for a test

http://www.epibiostat.ucsf.edu/biostat/sampsize.html#ttest

j*****e
Posts: 182

From thread: Statistics board - [Collection] how to randomly draw 10% sample from a data set?

Suppose your data has 1000 observations and you want to draw a sample of 100.
Use the following SAS code:
proc surveyselect data=dataname method=urs SAMPSIZE=100 rep=1 out=sample
seed=1594 outhits;
run;
There are other methods to randomly select observations (with or without
replacement, stratified sampling, clustered sampling, PPS sampling, etc.).
Read the SAS help for more detail.
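Since METHOD=URS samples with replacement and METHOD=SRS samples without, the difference is worth seeing side by side. This is a rough Python analogue rather than SAS: the list `population` is a hypothetical stand-in for the 1000-observation data set, and the seed value is only borrowed from the post (Python's random stream will not match SAS's).

```python
import random

population = list(range(1, 1001))  # hypothetical stand-in for a 1000-observation data set
rng = random.Random(1594)          # fixed seed so the draw is repeatable

# Analogue of METHOD=URS: each draw is independent, so a unit can be hit more than once.
with_replacement = rng.choices(population, k=100)

# Analogue of METHOD=SRS: every unit appears at most once in the sample.
without_replacement = rng.sample(population, k=100)

print("distinct units, with replacement:   ", len(set(with_replacement)))
print("distinct units, without replacement:", len(set(without_replacement)))
```

If the goal is 100 distinct observations out of 1000, the without-replacement draw is the one that guarantees it.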

o****o
Posts: 8077

From thread: Statistics board - How to randomly pick 2 numbers out of 1, 2, 3, 4, 5?

This is not hard in SAS with a BY statement either, and running different regressions by condition is also easy. For example:
data original;
array _x{*} var1-var10;
do i=1 to 10000;
do _j=1 to dim(_x); _x[_j]=rannor(7655)+sin(i + _j); end;
output;
drop i _j;
end;
run;
ods select none;
proc surveyselect data=original out=samp rep=10
sampsize=1000 method=srs;
run;
proc means data=samp;
by replicate;
var var1;
output out=_mean mean(var1)=mean1;
run;
data samp

j*****e
Posts: 182

From thread: Statistics board - How to randomly pick 2 numbers out of 1, 2, 3, 4, 5?

This will do.
proc surveyselect data=set
method=srs sampsize=2
rep=5000
seed=40070 out=SampleRep;
run;
You can also use Proc multtest.

l****u
Posts: 199

From thread: Statistics board - In sas, how do you randomly pick 10 numbers out of 29?

PROC SURVEYSELECT DATA=pop OUT=sample METHOD=SRS
SAMPSIZE=10 SEED=1234567;
RUN;
This is a very basic procedure, and it is worth spending a little time
learning to use it properly.

o****o
Posts: 8077

From thread: Statistics board - How should loops in R be used?

for this particular question, SAS actually is pretty handy:
*******************************;
data test;
do id=1 to 20000;
x=rannor(0);
output;
end;
run;
ods select none;
proc surveyselect data=test out=samp method=srs
seed=93759437 sampsize=1000 rep=100 ;
run;
ods select all;
proc univariate data=samp noprint;
by replicate;
histogram x/outhistogram=hist outkernel=kernel noplot;
run;
proc sgplot data=hist;
series x=_MIDPT_ y=_OBSPCT_ /group=replica...

c********g
Posts: 193

From thread: Statistics board - How to generate a combination variable in SAS?

Thanks.
But I still don't quite understand it. Could you explain a bit more?
DATA TEMP1;
input x $1;
cards;
A
.
.
Z
;
run;
proc surveyselect data=temp1
method=urs sampsize=26 rep=8
out=temp2;
run;

c***z
Posts: 6348

From thread: Statistics board - Random forests on imbalanced data (repost)

[The following text is reposted from the DataSciences board]
From: chaoz (面朝大海，吃碗凉皮), Board: DataSciences
Subject: Random forests on imbalanced data
Posted at: BBS 未名空间站 (Fri Jun 20 12:54:36 2014, US Eastern)
Recently I used RF for imbalanced data (10% positive, 90% negative) and I
played with several tricks. Below is a comparison of the results. We are most
concerned about false negatives.
Any comments and suggestions are extremely welcome!
1. vanilla version:
> randomForest(Relevant ~ ., data = train, ntree = 1000)
# prediction_1a FALSE TRUE
# a...

c***z
Posts: 6348

From thread: DataSciences board - Random forests on imbalanced data

Recently I used RF for imbalanced data (10% positive, 90% negative) and I
played with several tricks. Below is a comparison of the results. We are most
concerned about false negatives.
Any comments and suggestions are extremely welcome!
1. vanilla version:
> randomForest(Relevant ~ ., data = train, ntree = 1000)
#        prediction_1a
# actual  FALSE  TRUE
#   FALSE 22667    83
#   TRUE    523  1723
acc = 0.9757561
2. lower threshold (predict TRUE if pro...
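Since false negatives are the main concern, it may help to read the vanilla confusion matrix from item 1 in those terms. The short Python sketch below only redoes the arithmetic on the four posted counts; it assumes rows are actual classes and columns are predictions, and the variable names are illustrative.

```python
# Counts copied from the vanilla confusion matrix in item 1.
# Assumption: rows are actual classes, columns are predicted classes.
tn, fp = 22667, 83    # actual FALSE: predicted FALSE, predicted TRUE
fn, tp = 523, 1723    # actual TRUE:  predicted FALSE, predicted TRUE

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
false_negative_rate = fn / (fn + tp)  # share of actual positives that were missed
recall = tp / (fn + tp)               # share of actual positives that were caught

print(f"accuracy            = {accuracy:.7f}")
print(f"false negative rate = {false_negative_rate:.4f}")
print(f"recall              = {recall:.4f}")
```

The accuracy reproduces the posted 0.9757561, yet roughly 23% of the actual positives are missed, which is why accuracy alone says little on data this imbalanced.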
