Statistics board - Data Cleaning Problem (SAS or Excel)
p***r
Posts: 920
1
I have about 3000 names in a column and want to clean them up, correcting the
names that are supposed to be the same, such as:
John Cash
John cash
Johnney Cash
John Cash #1
Capital One
C1
Capitalone
Is there a recommended way or an existing technique to group all of these together
and clean them as far as possible before I check them manually?
h*********1
Posts: 102
2
Since there are only 3000 names, you can combine Excel's SUBSTITUTE and LOWER/
UPPER functions to remove all spaces and standardize the case,
then sort the column and check the names manually.
For example: =UPPER(SUBSTITUTE(A1," ",""))
Hope this helps.
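For anyone working outside Excel, the same idea can be sketched in Python (a swapped-in language, not what the poster used). The helper names `normalize` and `group_by_key` are made up for illustration; the sample names are the ones from the original post. It only catches variants that differ by spaces or case, so the remaining groups still need a manual check.

```python
from collections import defaultdict

def normalize(name: str) -> str:
    """Mirror =UPPER(SUBSTITUTE(A1," ","")): drop spaces, then uppercase."""
    return name.replace(" ", "").upper()

def group_by_key(names):
    """Map each normalized key to the original spellings that share it, sorted by key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return dict(sorted(groups.items()))

# Sample names taken from the original post.
names = ["John Cash", "John cash", "Johnney Cash", "John Cash #1",
         "Capital One", "C1", "Capitalone"]

groups = group_by_key(names)
for key, originals in groups.items():
    # Keys with more than one spelling are variants that collapsed together.
    print(key, originals)
```

"John Cash" and "John cash" end up under one key, as do "Capital One" and "Capitalone"; "Johnney Cash", "John Cash #1" and "C1" stay separate and are left for the manual pass.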
l***a
Posts: 12410
3
For this you have to build your own dictionary.
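A minimal sketch of what such a dictionary could look like, written in Python as an illustration only. The `ALIASES` table and the `canonical` helper are hypothetical; the entries are filled in by hand from the names in the original post, and a real table would have to be built the same way from the actual data.

```python
# Hypothetical hand-built dictionary: normalized variant -> canonical name.
ALIASES = {
    "johncash": "John Cash",
    "johnneycash": "John Cash",
    "johncash#1": "John Cash",
    "capitalone": "Capital One",
    "c1": "Capital One",
}

def canonical(name: str):
    """Return the canonical name, or None if the variant is not in the dictionary."""
    key = "".join(name.split()).lower()  # strip all whitespace, ignore case
    return ALIASES.get(key)

# Sample names taken from the original post.
names = ["John Cash", "John cash", "Johnney Cash", "John Cash #1",
         "Capital One", "C1", "Capitalone"]

cleaned = {n: canonical(n) for n in names}
# Anything that maps to None is not covered yet and needs a new dictionary entry.
unmatched = [n for n, c in cleaned.items() if c is None]
print(cleaned)
print("still to review:", unmatched)
```

Normalizing the key before the lookup keeps the dictionary small, since spacing and case variants share one entry.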

D******n
Posts: 2836
4
There are ways to handle Capitalone vs. Capital One.
It is hopeless if you are talking about C1 vs. Capitalone, though, unless you create your
own match table.
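One possible way to combine the two ideas, sketched in Python with the standard-library `difflib` module. This is an assumption about how it could be done, not the poster's method: `CANONICAL`, `MATCH_TABLE`, `THRESHOLD`, `squash`, `similarity` and `resolve` are all hypothetical names, and the 0.8 cutoff is a guess that would need tuning on real data.

```python
from difflib import SequenceMatcher

CANONICAL = ["John Cash", "Capital One"]   # assumed list of clean names
MATCH_TABLE = {"c1": "Capital One"}        # hand-made entries for abbreviations
THRESHOLD = 0.8                            # assumed similarity cutoff

def squash(name: str) -> str:
    """Strip all whitespace and lowercase, so spacing and case do not matter."""
    return "".join(name.split()).lower()

def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1 on the squashed strings."""
    return SequenceMatcher(None, squash(a), squash(b)).ratio()

def resolve(name: str):
    """Try the match table first, then the closest canonical name above the cutoff."""
    k = squash(name)
    if k in MATCH_TABLE:
        return MATCH_TABLE[k]
    best = max(CANONICAL, key=lambda c: similarity(name, c))
    return best if similarity(name, best) >= THRESHOLD else None

for n in ["Capitalone", "Johnney Cash", "John Cash #1", "C1"]:
    print(n, "->", resolve(n))
```

String similarity is enough for "Capitalone" and for near-misses such as "Johnney Cash", while "C1" shares almost no characters with "Capital One" and is only resolved through the match table.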

P****D
Posts: 11146
5
When the data varies as wildly as this, correcting it by hand is probably faster than building a dictionary.