p***r Posts: 920 | 1 I have about 3000 names in a column and want to clean them up, correcting those
names that are supposed to be the same, such as
John Cash
John cash
Johnney Cash
John Cash #1
Capital One
C1
Capitalone
Is there a best practice or an existing technique for grouping all of these together and
cleaning them as much as possible before I check them manually? |
h*********1 Posts: 102 | 2 Since there are only 3000 names, in Excel you can combine SUBSTITUTE with LOWER or
UPPER to remove all spaces and standardize the case,
then sort the result and check it manually.
For example: =UPPER(SUBSTITUTE(A1," ",""))
Hope this will help. |
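For anyone working outside Excel, the same idea can be sketched in Python. This is only a minimal illustration, not a complete cleaner: `normalize_key` and `group_by_key` are hypothetical helper names, and the key mirrors the formula above (uppercase, spaces removed, nothing else).

```python
from collections import defaultdict


def normalize_key(name: str) -> str:
    """Mirror =UPPER(SUBSTITUTE(A1," ","")): uppercase and drop spaces."""
    return name.upper().replace(" ", "")


def group_by_key(names):
    """Group the raw names that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_key(name)].append(name)
    return dict(groups)


names = [
    "John Cash", "John cash", "Johnney Cash", "John Cash #1",
    "Capital One", "C1", "Capitalone",
]
groups = group_by_key(names)

# Print each key with its variants, sorted, for manual review.
for key in sorted(groups):
    print(key, groups[key])
```

On the sample names this merges "John Cash" with "John cash" and "Capital One" with "Capitalone", but it leaves "Johnney Cash", "John Cash #1" and "C1" in groups of their own, so those still need a manual pass.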
l***a Posts: 12410 | 3 You would have to build your own dictionary for this.
|
D******n Posts: 2836 | 4 There are ways to handle Capitalone vs. Capital One,
but C1 vs. Capitalone is hopeless unless you create your
own match table.
|
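A rough Python sketch of the two cases discussed above: near-duplicate spellings are merged with a string-similarity score from the standard library's `difflib`, while abbreviations such as C1 only match through a hand-built match table. The `ALIASES` contents, the 0.8 threshold, and the helper names `make_key` and `cluster` are all assumptions made for illustration, not a recommended configuration.

```python
from difflib import SequenceMatcher

# Hypothetical hand-built match table: normalized abbreviation -> canonical key.
ALIASES = {"C1": "CAPITALONE"}


def make_key(name, aliases):
    """Uppercase, keep only letters and digits, then apply the match table."""
    key = "".join(ch for ch in name.upper() if ch.isalnum())
    return aliases.get(key, key)


def cluster(names, aliases=None, threshold=0.8):
    """Greedy clustering: each name joins the first cluster whose
    representative key is at least `threshold` similar, else starts a new one."""
    aliases = aliases or {}
    clusters = []  # list of (representative key, member names)
    for name in names:
        key = make_key(name, aliases)
        for rep, members in clusters:
            if SequenceMatcher(None, rep, key).ratio() >= threshold:
                members.append(name)
                break
        else:
            clusters.append((key, [name]))
    return [members for _, members in clusters]


names = [
    "John Cash", "John cash", "Johnney Cash", "John Cash #1",
    "Capital One", "C1", "Capitalone",
]
with_table = cluster(names, ALIASES)
without_table = cluster(names)

for members in with_table:
    print(members)
```

With the match table, the sample names fall into two clusters, one for the John Cash variants and one for the Capital One variants. Without it, "C1" ends up in a cluster of its own, because no similarity score can connect it to "Capitalone". A threshold this loose can also merge names that are genuinely different, so the clusters are a starting point for the manual check rather than a replacement for it.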
P****D Posts: 11146 | 5 When the data varies this wildly, correcting it by hand is probably faster than building a dictionary. |