p*****o posts: 543 | 1 In my txt file the delimiter is a comma, and each field is enclosed in double quotes,
e.g.:
"Jack, Liu", "Lucy, Li", ...
When importing into SAS, I know how to use delimiter = ',', but how can I
handle the commas inside the double quotes? What option should I use for that?
Thanks a lot |
|
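As an illustration of the quoting rule asked about above, here is a minimal Python sketch (not from the thread). The sample line is taken from the post; `skipinitialspace=True` is an assumption made because the sample has a blank after each comma.

```python
import csv
import io

# Sample line from the post: commas inside the quotes are data, not delimiters.
raw = '"Jack, Liu", "Lucy, Li"\n'

reader = csv.reader(
    io.StringIO(raw),
    delimiter=",",
    quotechar='"',
    skipinitialspace=True,  # ignore the blank that follows each delimiter
)
names = next(reader)
print(names)
```

In SAS itself, the DSD option on the INFILE statement plays the corresponding role: it treats a delimiter inside a quoted value as part of the value and strips the quotes.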
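y****d posts: 432 | 2 ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
Note up front:
If you need these, please look them up on the blog in my signature. Thanks! Sending them by e-mail is too tiring!
If you find this valuable, please bump the thread so more people can see it. Thanks!
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
SAS Global Forum 2010 papers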
1-10
Getting Connected to Your Data with SAS/CONNECT®
A Robust and Flexible Approach to Automating SAS® Jobs Under UNIX
Using SAS® Output Delivery System (ODS) Markup to Generate Custom
PivotTable and PivotChart Reports
Creating Easily Reusable and Extensible Processes: Code That Thinks for
Itself
ODS HTML Evolution, HTML that scrolls, panels, ... |
|
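b*****e posts: 223 | 3 The data has 500 columns; most are numeric and a few are char. I only need about ten of
them. I tried to read every column as char, but that does not work. Is there a
good way to do this? I do not want to hunt down the char columns one by one and list them separately, since most of
them are not needed anyway.
Alternatively, is there a way to read in only the columns I need? I have never read data with this many columns
before, so I have no experience.
data ALLPAGE;
infile "\....\My Documents\MYDATA.txt" delimiter='09'x
firstobs=4 obs=1410 dsd lrecl=10000 missover;
input COL1-COL500 ; /* this way the char columns all come out blank */
input COL1-COL500 $ ; /* this does not work? */
input COL1-COL222 COL223 $ ..... COLxxx-COL500; /* too tedious */
run; |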
|
A*******s posts: 3942 | 4 Use delimiters and use %scan to separate them in your macros, for example
-sysparm parm1|parm2|parm3
system |
|
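o*****a posts: 229 | 5 A question for everyone: I have a comma-delimited text file in which some records span multiple rows.
When reading it into a SAS data set, how can I get each of them read as a single record?
Example: id 102 takes two lines, but they belong to the same record. How can I read it as one
record?
Thanks to all the experts for your help!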
id name r1 r2 r3 r4 r5
100, Grace, 3,1,5,2,6
101, Martin, 1,2,4,2,3
102, Scott, 9,10,4,
5, 6
103, Bob,2 ,1, 2, 2,4 |
|
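o*****a posts: 229 | 6 A question for everyone: I have a comma-delimited text file in which some records span multiple rows,
and some missing values are not marked off by commas.
When reading it into a SAS data set, how can I get each of them read as a single record?
Example: id 102 takes two lines, but they belong to the same record, and 101 has missing values that are not
delimited. How can I get the data I want read correctly? When I asked last time, someone suggested @@, but that
pulls 102 into the same record as 101.
Thanks to all the experts for your help!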
id name r1 r2 r3 r4 r5
100, Grace, 3,1,5,2,6
101, Martin, 1,2,4
102, Scott, 9,10,4,
5, 6
103, Bob,2 ,1, 2, 2,4 |
|
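One way to think about the multi-row records above, sketched in Python rather than SAS. This is only an illustration under a stated assumption that is not in the thread: a physical line that ends with a comma continues on the next line, and any other short line is padded with missing values. The function name `read_records` is hypothetical.

```python
import io

# Sample rows copied from the post (header line omitted).
RAW = """\
100, Grace, 3,1,5,2,6
101, Martin, 1,2,4
102, Scott, 9,10,4,
5, 6
103, Bob,2 ,1, 2, 2,4
"""

N_FIELDS = 7  # id, name, r1..r5


def read_records(lines, n_fields=N_FIELDS):
    """Yield one list of fields per logical record.

    Assumption: a trailing comma means the record continues on the next
    physical line; other short lines are padded with None (missing).
    """
    pending = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pending += line
        if pending.endswith(","):
            continue  # record is not finished yet
        fields = [f.strip() for f in pending.split(",")]
        fields += [None] * (n_fields - len(fields))
        yield fields
        pending = ""


records = list(read_records(io.StringIO(RAW)))
for rec in records:
    print(rec)
```

With this rule, 101 keeps its own record with two missing values at the end, and the two physical lines of 102 are merged. If the real file does not mark continuation with a trailing comma, the rule would have to change.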
d*******o posts: 493 | 7 Converting the Excel file to CSV, tab-delimited text, or fixed-width text may be a
good option. You will have more control that way. |
|
d******9 posts: 404 | 8 The SCAN function should work; however, we need to define the delimiter as ",".
Try this:
length F_name L_name $20;
F_name=scan(Name,1,",");
L_name=scan(Name,2,","); |
|
|
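u*********r posts: 1181 | 10 I am running a program that reads data in from a csv file,
but I found that it reads only every other line: each variable should have 24 values, yet only 12 end up being read.
I do not know what is going wrong; I am a SAS beginner.
Could the experts please advise?
Here is the beginning of the code: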
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'k:\Kaiwang\run20\run20.CSV' delimiter = ',' firstobs=2 dsd ;
informat LVCYP1A1 best32. ; |
|
D******n posts: 2836 | 11 Delimited? Fixed length? Or a complicated layout (where what to read next depends
on the current field)? |
|
M*P posts: 6456 | 12 I was reading a tab-delimited file using either read.table() or read.csv().
read.table() cannot read the whole file because it complains that some rows have fewer
elements than others, but read.csv() can read the whole table without error.
What is the difference between read.table() and read.csv()?
|
|
h********o posts: 103 | 13 This is list input, whose default delimiter is a blank. The data for the AGE
variable are invalid because you define the variable as numeric but feed it
character values. So all three AGE values are set to missing and are of
course less than or equal to 10. |
|
S******3 posts: 66 | 14 The key point being tested is: SAS goes to a new line when the INPUT statement reaches
past the end of a line without finding the specified delimiter. Here it
reads row 1 and row 2 as one record, so 3 is right. |
|
l*******0 posts: 12 | 15 If you specify the proper delimiters when you open the file with Excel, the file
will be clean and neat. |
|
c*********r posts: 1802 | 16 Still grinding through SAS... I hope the experts can help take a look!
SAS guide pg. 370:
MDY(5,10,20)=May 10,1920 ------ is this a misprint?
SAS guide pg. 394:
contact="ADMIN. ASST.";
propcase(contact);
Why is the result "Admin. Asst."?
If the period is a default delimiter for propcase(),
shouldn't it be "Admin Asst"? |
|
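x***x posts: 3401 | 17 For the first one you need to give some context. MDY(5,10,20) corresponding to May 10,1920 is correct in itself.
I think it should be Admin. Asst. PROPCASE() only changes case; it does not delete characters.
"Delimiter" here means that PROPCASE always treats . as the end of a sentence |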
|
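c*********r posts: 1802 | 18 On the first question: with yearcutoff=default it should be 1920, but I think it is 1940.
On the second, my question is why the dot here is not recognized as a delimiter.
Thanks, experts!
Baozi (forum reward points) are no problem. |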
|
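x*******u posts: 500 | 19 Judging from the data, the first variable has length 10, but reading it with your code gives this:
char1 char2 char3
1 0 / 3 0 0 0 9 .1 4 1
There are still spaces in between.
After I read the data in with proc import, the log showed this:
data WORK.READASC ;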
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'myfile.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=7 ;
informat VAR1 $21. ;
informat VAR2 $23. ;
informat VAR3 $9. ;
format VAR1 $21. ;
format VAR2 $23. ;
format VAR3 $9. ... |
|
s*****9 posts: 285 | 20 Load it in with proc import using xls;
then you do not need to worry about details like delimiter and missover |
|
c****r posts: 576 | 21 Try textscan(fid,'%s','delimiter','\n')? That is, open the file first, then read it with textscan. |
|
d******9 posts: 404 | 22 SAS treats both 1 and : as delimiters, so they are skipped.
In your case, you should use:
Y= SCAN(X,2,':');
Then you will get
Y=10
20
160
80 |
|
|
G***G posts: 16778 | 24 Which statistics program can load a 14G tab-delimited text file?
R?
MATLAB? |
|
s*********y posts: 34 | 26 Raw Data:
"MARKETING_ID","CUSTOMER_NAME","Mail_ADDRESS","Mail_CITY","Mail_STATE","Mail_POSTAL_CODE"
"EA00000","JAMES J IANAZONE","1238 LAKE FRONT BLVD","NORTH LIMA","OH","44452"
"EA00001","MARK A PETERSEN","6715 ERIE AV NW","CANAL FULTON","OH","44614"
"EA00002","BRANDON S BURGER","1694 TANGLEWOOD DR","AKRON","OH","44313"
data edsion;
infile " C:\Documents and Settings\Toledo_Edison_MailingList_03_09_2012 7MWH
.txt" delimiter=",";
input MARKETING_ID $ 2-8 +3 CUSTOMER_NAME $ 11. Mail_ADDRESS $ ... |
|
P****D posts: 11146 | 27 The txt you mentioned: a comma-delimited file.
SAS: the file extension is sas7bdat. |
|
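s********r posts: 297 | 28 Column1 Column2: (#.of.A) / (total.#)
A, B, C 1 / 3
A, B, A, D 2 / 4
A, B, A, D, A 3 / 5
.....
Given a CSV file in which column 1 holds unordered, possibly repeated person codes (represented by letters such as A, B, ...),
with "," as the delimiter,
how can I add a new column (column2) to the csv file that computes, for each row,
the frequency of person A divided by the total number of people in that row? |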
|
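The per-row ratio asked about above can be sketched as follows. This is a minimal Python illustration, not something proposed in the thread; the helper name `count_target` and the in-memory output are assumptions, and the three sample rows are the ones shown in the post.

```python
import csv
import io


def count_target(cell, target, sep=","):
    """Return (occurrences of target, total names) for one Column1 cell."""
    names = [name.strip() for name in cell.split(sep) if name.strip()]
    return names.count(target), len(names)


column1 = ["A, B, C", "A, B, A, D", "A, B, A, D, A"]

results = [count_target(cell, "A") for cell in column1]

# Write the original column plus the new "count / total" column as CSV.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Column1", "Column2"])
for cell, (count, total) in zip(column1, results):
    writer.writerow([cell, f"{count} / {total}"])
print(out.getvalue())
```

Because Column1 itself contains commas, `csv.writer` quotes it on output so the new column stays unambiguous when the file is read back.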
k******u posts: 250 | 29 The data is a csv file,
as follows:
123,"Harold Wilson",Acct,01/15/1989,$78,123.
128,"Julia Child",Food,08/29/1988,$89,123
007,"James bond",Security,02/01/2000,$82,1000
828,"Roger Doger",Acct,08/15/1999,$39,100
900,"Earl Davenport",Food,09/09/1989,$45,399
906,"James Swindler",Acct,12/21/1978,$78,200
The comma is the delimiter.
I wrote the following program:
data Employ;
infile 'C:employee.txt' dsd;
input ID : $3.
Name : $20.
Depart : $8.
DateHire : mmddyy10.
salary : dollar8.
;
ru... |
|
D******n posts: 2836 | 30 Output it as tab-delimited and then read it in |
|
w*********n posts: 30 | 31 If it's a real csv file, use Excel's "Save As" to save it as a tab-delimited txt
file, then
x<-read.table("yourfile.txt",header=F,sep="\t")
Also, don't confuse "sep" and "quote" |
|
r*******i posts: 534 | 32 DSD changes the default delimiter from a blank to a comma |
|
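w*****5 posts: 515 | 33 1) Large files usually live on a UNIX server, so run: head -10 your_file>temp. Then download
this small 10-line file to your PC and look at it. Or just run head -10 your_file and look at it in the UNIX terminal.
2) Look at your file format. If there is no delimiter, each variable should sit in a fixed range of columns, for
example jobtile in columns 15 to 30; in that case use @15 jobtile.. This information is usually given
in advance. |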
|
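a*****4 posts: 986 | 34 Experts, help...
I need to export a SAS dataset to a pipe-delimited file with the following
requirements:
The filler column can have a maximum length of only 250. The variables in the SAS
dataset include both char and numeric ones with different lengths. These can only be
put into one big filler column in a fixed-length format.
There are more than 90 variables in total, while the final output table has only about 10 pipe-separated columns. So apart from a few basic
variables, all the others must be packed into a single column, and that column must be fixed-length internally.
How do I get these different variables into one column in a fixed-length format? |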
|
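The layout described above (a few pipe-delimited columns plus one fixed-width filler column) can be sketched like this. It is a rough Python illustration, not SAS and not from the thread; the function name `pack_fixed`, the sample values, and the widths are all hypothetical. Only the 250-character limit comes from the post.

```python
def pack_fixed(values, widths, max_len=250):
    """Concatenate values into one fixed-width string.

    Each value is converted to text, truncated to its width, and
    right-padded with blanks; None is written as all blanks.
    """
    parts = []
    for value, width in zip(values, widths):
        text = "" if value is None else str(value)
        parts.append(text[:width].ljust(width))
    filler = "".join(parts)
    if len(filler) > max_len:
        raise ValueError(f"filler is {len(filler)} chars, limit is {max_len}")
    return filler


# Hypothetical record: two basic columns, three variables packed into the filler.
basic = ["1001", "Smith"]
filler = pack_fixed(["AB", 12, None], widths=[4, 3, 2])
line = "|".join(basic + [filler])
print(repr(line))
```

Fixed widths mean the reader of the file can slice the filler by position, so the widths have to be agreed on in advance and must sum to at most 250.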
t*****w posts: 254 | 35 What does pipe delimited mean?
SAS
be |
|
K***a posts: 72 | 36 Please help. I'm using sample code from SAS here and want to get a different
result.
data _null_;
ExpressionID = prxparse('/(?:s|,?)([crb]at) ?(?:,)?/');
text = 'The woods have a bat, cat and a rat';
start = 1;
stop = length(text);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. ... |
|
B******y posts: 9065 | 37 The raw data is as follows
ID Index
A 11
B 1 & 8
C 2, 3, 10
D 5 7
E 7 and 8
I would like the new data to be
ID Index
A 11
B 1
B 8
C 2
C 3
C 10
D 5
D 7
E 7
E 8
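In other words, Index in the raw data holds multiple entries, and the delimiters include &, commas, spaces, and even words (
I have not found a fifth case yet, but I expect there may be ones like : or ;). So each single row must be split into multiple rows,
each with exactly one Index number (ignoring for now the possibility of duplicate Index values per ID). I am hoping for the most concise
and fastest method. Many thanks! |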
|
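Since the delimiters above are unpredictable but the values are always integers, one option is to extract the digit runs instead of splitting on delimiters. A minimal Python sketch of that idea (an illustration only, not from the thread; `explode_index` is a hypothetical name, and it assumes Index never contains decimals or negative numbers):

```python
import re

# Rows copied from the post: (ID, raw Index text).
raw_rows = [
    ("A", "11"),
    ("B", "1 & 8"),
    ("C", "2, 3, 10"),
    ("D", "5 7"),
    ("E", "7 and 8"),
]


def explode_index(rows):
    """Return one (ID, Index) pair per integer found in the Index text."""
    out = []
    for row_id, index_text in rows:
        # \d+ matches each run of digits, whatever separates them.
        for number in re.findall(r"\d+", index_text):
            out.append((row_id, int(number)))
    return out


long_rows = explode_index(raw_rows)
for row_id, index in long_rows:
    print(row_id, index)
```

This also covers separators such as : or ; that have not shown up yet, because anything that is not a digit is treated as a separator.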
d*******y posts: 349 | 38 Here is the test I ran.
test1
1|2|3
2|3|4
3|2|5
test2
1|a|f
2|b|g
join test1 and test2 using regular left outer join.
join -t"|" -a1 test1 test2 > test3
now, process test3
awk -F"|" 'BEGIN{for(i=1;i<=2;i++) testvar=testvar"|0"}NF<5 {print $0testvar;
next}{print $0}' test3 > test4
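Notes:
1. My example uses the pipe as the delimiter, out of habit.
2. How many iterations the for loop needs depends on how many columns the two files differ by.
Give it a try. With tens of thousands of columns, though, I do not know whether your awk version is up to it. If it is not, try gawk. |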
|
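For comparison, the join-then-pad steps above can be sketched in Python. This is an illustration under assumptions not stated in the thread: keys in the second file are unique, the key is the first field, and unmatched rows are padded with "0" as in the awk command. The function name `left_join_padded` is hypothetical.

```python
def left_join_padded(left_lines, right_lines, sep="|", fill="0"):
    """Left outer join on the first field; pad unmatched rows with fill."""
    right = {}
    for line in right_lines:
        key, *rest = line.split(sep)
        right[key] = rest  # assumes each key appears once in the right file

    # Number of columns the right file contributes to a matched row.
    width = max((len(rest) for rest in right.values()), default=0)

    joined = []
    for line in left_lines:
        fields = line.split(sep)
        extra = right.get(fields[0], [fill] * width)
        joined.append(sep.join(fields + extra))
    return joined


test1 = ["1|2|3", "2|3|4", "3|2|5"]
test2 = ["1|a|f", "2|b|g"]
test4 = left_join_padded(test1, test2)
print("\n".join(test4))
```

Unlike `join`, the dictionary lookup does not need the inputs to be sorted on the key, but it does hold the whole second file in memory.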
w*****1 posts: 473 | 39 If I stick with the space delimiter, do I need to change it to:
join -t" " -a1 test1 test2 > test3
awk -F" " 'BEGIN{for(i=1;i<=2;i++) testvar=testvar"|0"}NF<5 {print $0testvar;
next}{print $0}' test3 > test4
Is that right? Thanks! |
|
w*****1 posts: 473 | 40 testvar=testvar"|0"}
If I use the space delimiter, should the | in front of the 0 be removed? |
|
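B*A posts: 83 | 41 It still depends on who is asking. If it is an office clerk asking, the answer can be:
save the Access table as delimited text, upload it to Hadoop, done.
The downside is that it has no technical content at all and shows no skill. Whoever proposes this will be labeled a brainless auntie :)
The upside is that the clerk gets a bold new line on her resume: sharing and converting between traditional databases and big data
★ Sent from iPhone App: ChineseWeb 8.1 |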
|
B*****g posts: 34098 | 42 -- Hive queries for Word Count
drop table if exists doc;
-- 1) create table to load whole file
create table doc(
text string
) row format delimited fields terminated by '\n' stored as textfile;
--2) loads plain text file
--if the file is .csv, replace '\n' with ',' in step 1 (creation of the doc
table)
load data local inpath '/home/trendwise/Documents/sentiment/doc_data/
wikipedia' overwrite into table doc;
-- Trick-1
-- 3) wordCount in single line
SELECT word, COUNT(*) FROM doc LATERAL VIEW explo... |
|
T*****u posts: 7103 | 43 If you need to do this often, just use SQL.
step 1.
export PGPASSWORD=yourpgpassword
psql -h yourhost -p 5432 -U yourusername -d yourdb -c "drop table if exists
xxx; create table xxx (xxx); copy xxx from './yourcsvfile.csv' with delimiter
','"
step 2. A simple select statement |
|
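p**z posts: 65 | 44 CSV files (or other similar text data files, such as tab-delimited):
For a simple file that is all numeric data, numpy.loadtxt() is enough. It is not very flexible, though: any text, missing
values, and similar cases will raise an error.
numpy.genfromtxt() is more flexible; its parameters can be adjusted to fit all kinds of cases.
Excel files: the commonly used package is xlrd. Here is a minimal example:
import xlrd
fn = r'c:\temp\test.xls'
wb = xlrd.open_workbook(fn)
sh = wb.sheet_by_index(0)
coldata = sh.col_slice(0, 4, 10)
firstdata = coldata[0].value
The xlrd version I use under Python 2.7 does not yet support .xlsx files, so an .xlsx file must first be saved
as an Excel 97-2003 .xls file. |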
|
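The difference between the two NumPy readers mentioned above can be seen on a tiny input with one missing field. This is a small sketch, not from the post; the sample text is made up, and it assumes NumPy is installed.

```python
import io

import numpy as np

# Made-up sample: the second row has an empty middle field.
text = "1,2,3\n4,,6\n"

# loadtxt is strict: the empty field cannot be converted to a float.
try:
    np.loadtxt(io.StringIO(text), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# genfromtxt tolerates the gap and fills it (NaN by default for float data).
data = np.genfromtxt(io.StringIO(text), delimiter=",")
print(loadtxt_failed)
print(data)
```

`genfromtxt` also accepts options such as `filling_values` and `dtype` when NaN is not the desired placeholder.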
l******9 posts: 579 | 45 [ The following text is reposted from the JobHunting board ]
From: light009 (light009), Board: JobHunting
Subject: error opening a file located on a remote server from python
Posted at: BBS 未名空间站 (Sun Jul 27 19:03:52 2014, US Eastern)
I need to read a csv file located on a server from python 3.2 on win7.
The file name is
csv_file =
file_loc = '\serverName.myCompanyName.com\mypath\Files\myfile.csv'
with open(file_loc , 'r') as csv_file # error !!!
csv_reader = csv.reader(csv_file, delimiter=',')
error:
IOError: [Err... |
|
G******f posts: 16223 | 46 It is probably not an installation problem. Are you going data --> from text --> fixed width or delimited (either
one) --> under column data format pick the second line, text format -> finish --> okay? |
|