All topics - Topic: soundex
A*******n
Posts: 625
1
From thread: Database board - SOUNDEX function in sql server
Returns a four-character (SOUNDEX) code to evaluate the similarity of two
strings.
What exactly is this "four-character" code, and which string is it compared against?
Returning the SOUNDEX for Smith and Smythe returns the same SOUNDEX result
because all vowels, the letter y, doubled letters, and the letter h, are not
included.
-- Using SOUNDEX
SELECT SOUNDEX ('Smith'), SOUNDEX ('Smythe');
----- -----
S530 S530
(1 row(s) affected)
How is this S530 derived?
The stated reason is also odd: "because all vowels, the letter y, doubled
letters, and the letter h, are not included". By this...
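To show where S530 comes from, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, drop vowels and H/W/Y, collapse adjacent repeats, and pad with zeros to four characters. It is an assumption that SQL Server follows these rules closely; its SOUNDEX may differ in edge cases, and the names `CODES` and `soundex` are only illustrative.

```python
# Digit groups of classic American Soundex (letters not listed get no digit).
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return a four-character Soundex code (sketch, not SQL Server's exact code)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # its digit still suppresses a repeat
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:       # skip uncoded letters and adjacent repeats
            result += code
        if ch not in "HW":              # vowels and Y reset the repeat check; H and W do not
            prev = code
    return (result + "000")[:4]         # pad or truncate to four characters


print(soundex("Smith"), soundex("Smythe"))  # S530 S530
```

For Smith: S is kept, M maps to 5, I is dropped, T maps to 3, H is dropped, and one zero pads the code to S530. Smythe differs only in letters that carry no digit, so it produces the same code.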
c****s
Posts: 10
g**e
Posts: 6127
3
From thread: JobHunting board - Contributing a design question
soundex can only match strings that sound alike. Just preprocess: hash the targets by soundex.
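One way to read "preprocess/hash soundex" is sketched below in Python: compute a Soundex key for every target string once, bucket the targets in a hash table, and answer each query with a single lookup. The helper names (`build_index`, `lookup`) and the compact `soundex` variant are assumptions for illustration, not the poster's actual design.

```python
from collections import defaultdict

_GROUPS = ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"]
_CODE = {ch: str(i) for i, group in enumerate(_GROUPS, start=1) for ch in group}


def soundex(word):
    """Compact American Soundex (an assumed variant, not a specific library's)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out, prev = letters[0], _CODE.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODE.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "HW":
            prev = code
    return (out + "000")[:4]


def build_index(targets):
    """Preprocess once: bucket every target string under its Soundex key."""
    index = defaultdict(list)
    for t in targets:
        index[soundex(t)].append(t)
    return index


def lookup(index, query):
    """One hash lookup per query instead of a scan over all targets."""
    return index.get(soundex(query), [])


index = build_index(["Smith", "Smythe", "Robert", "Rupert", "Jones"])
print(lookup(index, "Smyth"))  # candidates that share the query's key
```

The candidates in a bucket can still be ranked afterwards with a more expensive measure, since each bucket is normally far smaller than the full target set.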
w*r
Posts: 2421
4
From thread: Apple board - Damn it, is ATT that desperate for money?
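Poor fellow, calm down. Just tell CS directly that you did not use it and that
they made a mistake, and that will be the end of it. The tethering analysis
does not work the way you all imagine, by capturing headers or the like. All I
can say is that I think it is purely a predictive model, and one of its
variables is the soundex of last name and first name. The soundex of Chinese,
Japanese, and Indian names comes out with relatively high significance, and
your traffic is already in the top 3 percentile, so...
So far I have not seen any tables related to traffic content being sent to the DW...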
n*****s
Posts: 10232
5
From thread: Statistics board - how to fuzzy match in SAS?
I am looking at soundex/spedis/compged now. It sounds like compged is more robust than soundex.
a*****y
Posts: 405
6
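I don't know. The dealer said I must have the actual learner's permit card and photocopy it before I can be put on the title.
What I have is only a test score sheet. It has the soundex no. on it, but maybe the problem is that it has no photo.
Right now I am rather worried that I cannot even drive on the road like this...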
g**e
Posts: 6127
7
From thread: JobHunting board - Contributing a design question
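Fuzzy/approximate string matching is widely used in practice.
There is a set of target strings. Given a new string, find the target strings
with similar spelling/pronunciation. How do you scale this?
Answering "compute the Levenshtein distance every time" fails.
Answering "soundex" passes.
Answering "use lucene" gets you thrown right out.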
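For reference, the failing answer looks roughly like the Python sketch below: a standard dynamic-programming Levenshtein distance applied to every target on every query. The function names and the `max_dist` threshold are assumptions for illustration. The cost is what makes it scale poorly: each comparison takes time proportional to the product of the two string lengths, and a query has to repeat it for all targets.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string as the row
    prev = list(range(len(b) + 1))       # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                       # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def brute_force_match(query, targets, max_dist=1):
    """Linear scan: one full distance computation per target, per query."""
    return [t for t in targets if levenshtein(query, t) <= max_dist]


print(brute_force_match("Smith", ["Smyth", "Smythe", "Jones"]))
```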
y*******g
Posts: 6599
8
From thread: JobHunting board - Contributing a design question
I have never even heard of soundex. Too specialized.
f*****e
Posts: 2992
9
From thread: JobHunting board - Contributing a design question
How do you compare with soundex? How is it used?
m*****l
Posts: 1292
10
I do see the term "APPLICANT’S SOUNDEX/MARYLAND DRIVER LICENSE NO." on the
registration form, but my labmate registered his car before he got his DL in
2006.
To title and register your newly purchased used vehicle, you will need to
submit the following documents (along with payment for taxes and fees):
Proof of ownership - You must submit the vehicle's current title that has
been properly assigned to you. Note that if the title was issued in
Maryland, it can be used as your application form for
B*****g
Posts: 34098
11
From thread: Database board - m(- -)m Seeking an algorithm
Addresses and personal names are much easier.
Addresses: buy software that can standardize addresses based on the USPS db; my company uses
code1
Personal names: soundex is enough
m********5
Posts: 619
12
From thread: Database board - m(- -)m Seeking an algorithm
Is soundex what all those CSRs use?
It also requires accurate pronunciation.
It fails to match for me every time -_-
w*r
Posts: 2421
13
From thread: Database board - m(- -)m Seeking an algorithm
I have written something similar: I used Java stored procedures in Oracle to
implement comparison by string distance, soundex (phonetic index), and
synonyms, then took a best-fitting linear regression for each, and that solved it.
w*r
Posts: 2421
14
From thread: Database board - m(- -)m Seeking an algorithm
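There is no shortcut for synonyms: no algorithm, however clever, will know that
William and James are the same name, so the synonym dictionary has to be built
well. One advantage when processing personal names is that male and female
names can be flagged, so someone who changed their last name after marriage
will not pop out of the string distance. The match is suppressed directly by
the gender of the name.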
quick reference for soundex algorithms in java:
http://commons.apache.org/codec/userguide.html
check wikipedia, it should have implementation in C++ and
other languages
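A hand-built synonym dictionary with gender flags could look like the Python sketch below. Every entry, the `GENDER` table, and the function names are hypothetical; the poster's actual dictionary and suppression rule are not shown in the thread, so this is only one possible reading of the idea.

```python
# Hypothetical nickname table: variant -> canonical given name.
NICKNAMES = {
    "bill": "william",
    "will": "william",
    "jim": "james",
    "jimmy": "james",
    "liz": "elizabeth",
    "beth": "elizabeth",
}

# Hypothetical gender flags on canonical given names.
GENDER = {"william": "M", "james": "M", "elizabeth": "F"}


def canonical(first_name: str) -> str:
    """Map a given name to its canonical form, or keep it if unknown."""
    key = first_name.strip().lower()
    return NICKNAMES.get(key, key)


def same_given_name(a: str, b: str) -> bool:
    """True when both names resolve to the same canonical given name."""
    return canonical(a) == canonical(b)


def gender_conflict(a: str, b: str) -> bool:
    """True only when both names have known, different gender flags."""
    ga, gb = GENDER.get(canonical(a)), GENDER.get(canonical(b))
    return ga is not None and gb is not None and ga != gb


print(same_given_name("Bill", "William"), gender_conflict("Liz", "Jim"))
```

A matcher could call `gender_conflict` before computing any string distance and drop a candidate pair when it returns True.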
s*********e
Posts: 1051
15
From thread: Statistics board - how to fuzzy match in SAS?
soundex().
l****g
Posts: 304
16
Thank you again.
So for names that are not exactly the same but belong to the same person (one
letter different), how do we treat those two records as the same record? Using
soundex or some other method?
D******n
Posts: 2836
17
From thread: Statistics board - A beginner's question (repost)
Create a .vim directory under your home directory (there is a dot before
vim),
then create a syntax directory under it,
and then create a sas.vim file under the syntax directory.
==============sas.vim======================
if version < 600
  syntax clear
elseif exists("b:current_syntax")
  finish
endif
syn case ignore
syn region sasString start=+"+ skip=+\\|\"+ end=+"+
syn region sasString start=+'+ skip=+\\|\"+ end=+'+
" Want region from 'cards;' to ';' to be captured (Bob Heckel)
sy...
m******u
Posts: 277
18
From thread: Statistics board - [Question] About text mining
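For example, some medical record databases contain very messy records, with different spellings and even
typos. Does anyone have experience with text mining? Could you give me some pointers? Many thanks~~~
SAS seems to have a soundex function. I wonder how well it works?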
c**d
Posts: 104
19
From thread: Statistics board - Standardize city names in SAS
/* get unique city list from your data */
ods output onewayfreqs = a;
proc freq data = yourdata;
table city;
run;
/* create a look-up table from sas library */
ods output onewayfreqs = b;
proc freq data = sashelp.zipcode;
table city;
run;
/* full join two data sets */
/* you have two ways to do it*/
/* 1: use Perl Regular Expressions functions in sas to match two strings */
/* 2: sas has Functions That Compare Strings (Exact and "Fuzzy" Comparisons)
*/
/* for example: COMPARE COMPLEV CALL COMPCO...
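The second approach, fuzzy comparison against a look-up list, can be sketched outside SAS as well. The Python example below uses the standard library's `difflib.get_close_matches` as a stand-in for the SAS comparison functions; it is not equivalent to COMPLEV or COMPARE, and the `standardize_city` helper, the 0.8 cutoff, and the sample reference list are assumptions for illustration.

```python
import difflib


def standardize_city(raw, reference, cutoff=0.8):
    """Return the closest reference city name, or None if nothing is close enough."""
    by_lower = {name.lower(): name for name in reference}   # case-insensitive keys
    hits = difflib.get_close_matches(
        raw.strip().lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None


# Hypothetical look-up list standing in for the cities in sashelp.zipcode.
reference = ["Boston", "Baltimore", "San Francisco"]

for city in ["Bostn", "san fransisco", "Zzzz"]:
    print(city, "->", standardize_city(city, reference))
```

Raising the cutoff trades recall for precision. Names that return None can be set aside for manual review instead of being forced onto a wrong match.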