java - Phonetic search for Indian languages
I want to compare strings phonetically in an Android app. The special case here is that I want to compare Indian-language words written in English. For example, I want to check whether "edhu", "adhu", and "yethu" are phonetically equal, since they mean the same thing in Tamil. People who use English script to write Indian languages use different spellings for the same word. How can I compare words in this case?
I tried Levenshtein distance, but I am not sure how to convert the number it returns into an equality check.
I also tried Soundex, but the Soundex codes are not the same when the first letter of the word changes. It is able to figure out the similar-sounding parts, but I don't understand how it works.
soundex.encode("yethu")  // Y300
soundex.encode("edhu")   // E300
soundex.encode("adhu")   // A300
As I understand it, you want to take words written in English, decompose them phonetically, and group words that are spelled differently but have the same phonetic representation.
For this, Soundex is a 90% solution, provided people use the correct consonants when transliterating words from Tamil to English.
You should be able to drop the first value from the Soundex representation and compare only the rest of the encoding when the first letter is a vowel.
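A minimal sketch of that idea, assuming you already have the four-character codes from whatever Soundex encoder you are using. The class and method names (`VowelInsensitiveKey`, `key`, `soundAlike`) are made up for illustration. Treating a leading `Y` like a vowel is my own assumption, added so that "yethu" lines up with "edhu" and "adhu"; drop it from the constant if it merges words you want to keep apart.

```java
import java.util.Locale;

final class VowelInsensitiveKey {
    // Initial letters that are ignored when comparing codes.
    // Including 'Y' is an assumption for this example, not part of Soundex.
    private static final String IGNORED_INITIALS = "AEIOUY";

    /** Returns the code without its leading letter when that letter is vowel-like. */
    static String key(String soundexCode) {
        if (soundexCode == null || soundexCode.isEmpty()) {
            return "";
        }
        String code = soundexCode.toUpperCase(Locale.ROOT);
        char first = code.charAt(0);
        return IGNORED_INITIALS.indexOf(first) >= 0 ? code.substring(1) : code;
    }

    /** Two codes match when their keys are identical. */
    static boolean soundAlike(String codeA, String codeB) {
        return key(codeA).equals(key(codeB));
    }

    public static void main(String[] args) {
        System.out.println(soundAlike("E300", "A300")); // true
        System.out.println(soundAlike("Y300", "E300")); // true (only because of the 'Y' assumption)
        System.out.println(soundAlike("T300", "E300")); // false
    }
}
```

A word that starts with a consonant keeps its full four-character code, so it can never match a vowel-initial word whose key is only the three digits.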
The reason is that Soundex ( https://en.wikipedia.org/wiki/soundex ) only encodes the consonants in the words it is presented with. It throws away vowels, plus h and w, unless the vowel is the first letter of the word. That explains why the values differ only in the first letter's encoding.
As for the zeros: a Soundex encoding is by definition 1 letter and 3 numbers (1 through 6 only). You have only 1 coded consonant in each word (d or t; the h is dropped), and Soundex maps both of them to the number 3. Since there are no more consonants, I believe it adds 2 zeros for compliance, which gives you <letter>300.
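To make the mechanics concrete, here is a small self-contained sketch of the standard American Soundex rules as I understand them. It is not the library you are calling, and the class name `SimpleSoundex` is made up for this example. It keeps the first letter, maps the remaining consonants to digits, skips vowels, h, w, and y, collapses repeated digits, and pads with zeros.

```java
import java.util.Locale;

final class SimpleSoundex {
    // Digit for each letter A..Z; '0' marks letters that are not coded
    // (the vowels plus H, W and Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char lastDigit = '0';
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain Latin letter
            }
            char digit = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                lastDigit = digit;
            } else {
                // Emit a digit only for coded consonants that do not repeat the previous code.
                if (digit != '0' && digit != lastDigit) {
                    out.append(digit);
                }
                // H and W do not separate equal codes; vowels do.
                if (c != 'H' && c != 'W') {
                    lastDigit = digit;
                }
            }
            if (out.length() == 4) {
                break;
            }
        }
        // Pad short codes with zeros so every code is one letter plus three digits.
        while (out.length() > 0 && out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("yethu")); // Y300
        System.out.println(encode("edhu"));  // E300
        System.out.println(encode("adhu"));  // A300
    }
}
```

In all three words the only coded consonant is t or d, which both map to 3, so the digits are identical and only the retained first letter differs.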
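If you are going to continue to use Soundex in your app, you should bear in mind that it can give only about 26*6*6*6 = 5616 unique encodings based on the letter, number (1-6), number (1-6), number (1-6) scheme (slightly more if you also count the zero-padded codes). That means the phonetic encodings are not unique, and words that are radically different can have Soundex encodings that collide.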