Check the similarity between two words with NLTK in Python
I have two lists and want to check the similarity between each word in the first list and each word in the second list, then find the maximum similarity. Here is my code:
from nltk.corpus import wordnet

list1 = ['compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
list = []

for word1 in list1:
    for word2 in list2:
        wordfromlist1 = wordnet.synsets(word1)[0]
        wordfromlist2 = wordnet.synsets(word2)[0]
        s = wordfromlist1.wup_similarity(wordfromlist2)
        list.append(s)

print(max(list))
But the result is an error:
    wordfromlist2 = wordnet.synsets(word2)[0]
IndexError: list index out of range
Please help me fix this.

Thank you.
You're getting the error because a synset list is empty and you try to get the element at the (non-existent) index zero. But why check only the zeroth element? If you want to check everything, try all pairs of elements in the returned synsets. You can use itertools.product() to save yourself two for-loops:
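from itertools import product

sims = []

for word1, word2 in product(list1, list2):
    syns1 = wordnet.synsets(word1)
    syns2 = wordnet.synsets(word2)
    # An empty syns1 or syns2 just means no pairs, not an IndexError.
    for sense1, sense2 in product(syns1, syns2):
        d = wordnet.wup_similarity(sense1, sense2)
        # Store the two senses that were compared, not the whole synset lists.
        sims.append((d, sense1, sense2))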
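To make both points concrete without needing NLTK or the WordNet data, here's a small sketch that uses made-up sense names in place of real synsets (the dictionary senses is a hypothetical stand-in for what wordnet.synsets() returns). It shows that indexing [0] fails on an empty list, that product() simply yields nothing when one input is empty, and that it visits the same pairs as two nested loops.

```python
from itertools import product

# Hypothetical stand-in data: each word maps to its list of senses.
# "how" has no senses here, mimicking wordnet.synsets() returning [].
senses = {
    "compare": ["compare.v.01", "compare.v.02"],
    "how": [],
    "copy": ["copy.v.01"],
}

# Taking the first sense fails for a word with no senses:
try:
    senses["how"][0]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

# product() over an empty list yields no pairs, so nothing fails:
pairs = list(product(senses["compare"], senses["how"]))
print(pairs)  # []

# product(a, b) visits the same pairs, in the same order, as two nested loops:
nested = [(x, y) for x in senses["compare"] for y in senses["copy"]]
assert list(product(senses["compare"], senses["copy"])) == nested
```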
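The product-based loop is inefficient because the same synsets are looked up again and again, but it is closest to the logic of your code. If you have enough data for speed to be an issue, you can speed it up by collecting the synsets for all words in list1 and list2 once, and then taking the product of the synsets: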
>>> allsyns1 = set(ss for word in list1 for ss in wordnet.synsets(word))
>>> allsyns2 = set(ss for word in list2 for ss in wordnet.synsets(word))
>>> best = max((wordnet.wup_similarity(s1, s2) or 0, s1, s2) for s1, s2 in product(allsyns1, allsyns2))
>>> print(best)
(0.9411764705882353, Synset('command.v.02'), Synset('order.v.01'))
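If you want the collect-once approach as a reusable function, here's a sketch (the name best_similarity and its parameters are hypothetical, chosen for illustration). It takes the sense lookup and the scoring function as arguments so the search logic can be tried without NLTK, and it skips pairs whose score is None. That is what the "or 0" in the one-liner works around: wup_similarity() can return None for pairs it can't score, and in Python 3 comparing None with a float raises a TypeError.

```python
from itertools import product


def best_similarity(words1, words2, synsets, similarity):
    """Return (score, sense1, sense2) for the most similar pair of senses.

    synsets maps a word to a (possibly empty) list of senses, and
    similarity scores two senses, returning None when it can't.
    Both are passed in so this works with NLTK or with test data.
    """
    # Collect each side's senses once; words with no senses add nothing,
    # so there is no [0] index that can fail.
    senses1 = {s for w in words1 for s in synsets(w)}
    senses2 = {s for w in words2 for s in synsets(w)}

    best = None
    for s1, s2 in product(senses1, senses2):
        score = similarity(s1, s2)
        if score is None:
            continue  # skip unscorable pairs instead of comparing None
        if best is None or score > best[0]:
            best = (score, s1, s2)
    return best  # None if no pair could be scored
```

With NLTK and the WordNet corpus installed, the call for the question's lists would be best_similarity(list1, list2, wordnet.synsets, wordnet.wup_similarity). Unlike max() over tuples, ties go to whichever pair is seen first, and since set iteration order is arbitrary, which tied pair wins is not fixed.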