Reading one big file in a single pass and writing many small files - Statistics board - MITBBS archive

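s******d posts: 303	1 The file I need to infile is a very large CSV with 2000 samples. Each sample was examined 500 times, and each exam has 3 outcomes. The current layout is one row per sample per exam:
sample1, exam1, outcome1, outcome2, outcome3
sample1, exam2, outcome1, outcome2, outcome3
....
....
sample1, exam500, outcome1, outcome2, outcome3
I need to read this big CSV and write one file per sample, each containing the 500 exams and their 3 outcomes. I don't need every sample, only 495 selected ones, each written to its own file. The problems are: 1) I don't want to repeat the operation by hand 495 times; 2) the CSV is very large, so rereading it once per sample would take a long time. Is there a way to read the CSV once and automatically write the files for all 495 samples?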
l*****k posts: 587	2 [In reply to s******d, post 1] Use Perl to read your sample IDs into an array, then loop through them and call system(grep sample infile > sample.csv) for each one. I think a combination of the uniq and awk commands could also do this in the shell.
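p********a posts: 5352	3 This is not a software problem, it is a logic problem. Read the file once with SAS INFILE and write to the 495 files as you go. The single trailing @ in SAS exists precisely to hold the current record so you can check a value and decide whether to keep reading. Sigh, there's no point explaining any further.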
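The read-once idea in this post can be sketched as follows. This is written in Python rather than SAS, so it is not the poster's code, only an illustration of the same logic. It assumes the sample ID is the first comma-separated field; the function name `split_by_sample` and the `<sample>.csv` output naming are hypothetical. Only one output file is kept open at a time, so rows for the same sample that are not contiguous are appended to the file created earlier.

```python
import csv
from pathlib import Path


def split_by_sample(big_csv, wanted, out_dir):
    """Read big_csv once and write each wanted sample's rows to out_dir/<sample>.csv.

    Assumption: the first field of every row is the sample ID.
    Returns a dict mapping sample ID to the number of rows written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    wanted = set(wanted)
    started = set()   # samples whose output file was already created in this run
    counts = {}
    current = None    # sample ID of the output file that is currently open
    handle = None
    writer = None
    try:
        with open(big_csv, newline="") as src:
            for row in csv.reader(src, skipinitialspace=True):
                if not row or row[0].strip() not in wanted:
                    continue  # blank line or a sample we were not asked for
                sample = row[0].strip()
                if sample != current:
                    # Switch output files only when the sample ID changes.
                    if handle is not None:
                        handle.close()
                    mode = "a" if sample in started else "w"
                    handle = open(out_dir / f"{sample}.csv", mode, newline="")
                    writer = csv.writer(handle, lineterminator="\n")
                    started.add(sample)
                    current = sample
                # Drop the sample column; keep the exam and its outcomes.
                writer.writerow([field.strip() for field in row[1:]])
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        if handle is not None:
            handle.close()
    return counts
```

Because membership is tested against a set of exact IDs, the input is scanned once regardless of how many samples are selected, and `sample1` is never confused with `sample10`.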
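s******d posts: 303	4 I tried grep, and I estimate it would take half an hour to an hour. I'm not very familiar with bash scripting, so I don't particularly prefer writing the script with unix commands. I also tried SAS; it has been more than three hours and it still hasn't produced a single file.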
l*****k posts: 587	5 [In reply to s******d, post 4] Those "ancient" unix commands are amazingly efficient. Think of how little computing power and memory they had to work with in the old days.
R******d posts: 1436	6 step=500; for ((i=1; i<990000; i+=step)); do sed -n "${i},$((i + step - 1))p" file > "file$i"; done
s*r posts: 2757	7 [In reply to s******d, post 1] PayPal me 49.5 dollars and I will write you a Perl script that does the job under unix.
D******n posts: 2836	8 [In reply to s******d, post 1] The first 495 samples, or the ones you specify? If it is only the first 495 samples, it is easy: head -n 247500 yourfile.csv | split -l 500 -d -a 3 - smallfile
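For readers without head and split, the same fixed-size chunking can be sketched in Python. This assumes, as the post does, that the wanted samples are the first 495 in the file and that each one occupies exactly 500 consecutive lines (495 × 500 = 247500). The function name `split_fixed_chunks` and the zero-padded `smallfile000`-style names are illustrative choices, not part of the original command.

```python
from itertools import islice
from pathlib import Path


def split_fixed_chunks(big_file, out_dir, lines_per_file=500, max_files=495,
                       prefix="smallfile"):
    """Write the first max_files blocks of lines_per_file lines to separate files.

    Assumption: every sample occupies exactly lines_per_file consecutive lines.
    Returns the list of paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Binary mode copies the lines byte for byte, whatever the line endings are.
    with open(big_file, "rb") as src:
        for index in range(max_files):
            chunk = list(islice(src, lines_per_file))
            if not chunk:
                break  # input ended before max_files blocks were read
            target = out_dir / f"{prefix}{index:03d}"
            target.write_bytes(b"".join(chunk))
            written.append(target)
    return written
```

The file is read once and never past the last requested block, which is the same saving that piping head into split gives.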
l*****k posts: 587	9 [In reply to s*r, post 7] Kao, businessman:
    # The sample-list filename was lost in the archive; samples.txt is a placeholder.
    open(IFILE, "<", "samples.txt") or die $!;
    while (<IFILE>) {
        chomp;
        print "extracting $_ right now\n";
        $ofilename = $_ . ".txt";
        system("grep $_ your150Gfile > $ofilename");
    }
    close(IFILE);
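One thing to watch with a plain grep per sample ID: grep matches substrings, so if the real IDs share prefixes the way `sample1` and `sample10` do (an assumption; the thread does not show the real IDs), one sample's file would also pick up the other's rows. A minimal Python sketch of the difference, with hypothetical helper names:

```python
import re


def substring_match(sample_id, line):
    """What an unanchored grep does: match the ID anywhere in the line."""
    return sample_id in line


def first_field_match(sample_id, line):
    """Match only when the ID is the entire first comma-separated field."""
    pattern = r"^\s*" + re.escape(sample_id) + r"\s*,"
    return re.search(pattern, line) is not None


row = "sample10, exam1, outcome1, outcome2, outcome3"
print(substring_match("sample1", row))     # True: a false positive
print(first_field_match("sample1", row))   # False
print(first_field_match("sample10", row))  # True
```

With grep itself, anchoring the pattern to the start of the line and including the trailing comma, for example `grep "^sample1," file`, should have the same effect when the comma directly follows the ID.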
s******d posts: 303	10
s******d posts: 303	11 Thanks, leohawk. The program works.
