python - Semantic similarity measure between 2 sentences

Author: Internet

This question already has answers here: How to compute the similarity between two text documents? (8 answers)

I need to measure the similarity between two sentences. For example:

s1 = "she is good a dog "
s2 = "she is nice a heel"

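I need to show that "good" is similar to "nice". For nouns and verbs, a path-based similarity measure works, along the lines of this code:

from nltk.corpus import wordnet as wn

def get_max(w1, w2):
    # best path_similarity over all synset pairs; None scores are skipped
    sims = (a.path_similarity(b) for a in wn.synsets(w1) for b in wn.synsets(w2))
    return max((s for s in sims if s is not None), default=None)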

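Result for "dog" and "animal": .33, which is a high value, so the words are related and I can call them similar. But for adverbs ("nice" and "good"), the value is only .09, which is very low!

Any ideas?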

Answer:

You can compute path_similarity for every synset of "good" and then take the maximum:

>>> from nltk.corpus import wordnet as wn
>>> n=wn.synsets('nice')
>>> g=wn.synsets('good')
>>> [i.path_similarity(n[0]) for i in g]
[0.0625, 0.06666666666666667, 0.07142857142857142, 0.09090909090909091, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

>>> # skip None scores: Python 3 cannot compare None with a float
>>> max(s for s in (i.path_similarity(n[0]) for i in g) if s is not None)
0.09090909090909091
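
The scores above are consistent with how NLTK's path_similarity is defined: 1 / (1 + d), where d is the number of edges on the shortest path between two synsets in the hypernym hierarchy. A small sketch of that arithmetic (the helper names path_score and implied_distance are made up for illustration and are not part of NLTK):

```python
def path_score(distance):
    """Path similarity for a shortest path of `distance` edges: 1 / (1 + d)."""
    return 1.0 / (1 + distance)


def implied_distance(score):
    """Invert the formula: how many edges a given score corresponds to."""
    return round(1.0 / score - 1)


# The four numeric scores printed in the transcript above
for score in (0.0625, 0.06666666666666667, 0.07142857142857142, 0.09090909090909091):
    print(score, "->", implied_distance(score), "edges")
```

Read this way, the best score of 0.0909... means the closest pair of senses is still 10 edges apart, while the .33 reported for "dog" and "animal" corresponds to only 2 edges.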

Note that the synsets of a word cover several parts of speech (verb, noun, adjective, and so on), so you need to pick the appropriate ones!
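
One way to act on this is to restrict the lookup to a single part of speech and to compare every sense of one word with every sense of the other, instead of only n[0]. The following is a minimal sketch, not the answer's own code: best_similarity and wordnet_best are hypothetical helper names, and the sketch assumes only the documented wn.synsets(word, pos=...) lookup and the Synset similarity methods.

```python
def best_similarity(synsets_a, synsets_b, measure="path_similarity"):
    """Return the highest non-None score over all synset pairs, or None."""
    scores = (
        getattr(a, measure)(b)  # e.g. a.path_similarity(b) or a.wup_similarity(b)
        for a in synsets_a
        for b in synsets_b
    )
    # Pairs with no connecting path yield None, which max() cannot compare.
    return max((s for s in scores if s is not None), default=None)


def wordnet_best(word_a, word_b, pos=None, measure="path_similarity"):
    """Look both words up in WordNet, optionally restricted to one part of speech."""
    # Imported lazily so best_similarity stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn

    return best_similarity(
        wn.synsets(word_a, pos=pos),
        wn.synsets(word_b, pos=pos),
        measure,
    )
```

With NLTK and its WordNet data installed, wordnet_best('dog', 'animal', pos='n') would compare noun senses only. For adjectives such as "good" and "nice" (pos='a'), path-based scores are likely to stay low or be None, since WordNet does not arrange adjectives in a hypernym hierarchy the way it does nouns and verbs.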

Alternatively, you can use wup_similarity:

>>> round(max(s for s in (i.wup_similarity(n[0]) for i in g) if s is not None), 1)
0.4

Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
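
To make the definition concrete, here is a sketch of the Wu-Palmer formula, 2 * depth(LCS) / (depth(a) + depth(b)), on a tiny hand-made taxonomy. The taxonomy and the function names are hypothetical and are not WordNet data or NLTK API; the sketch also assumes depth is counted in nodes, so the root has depth 1.

```python
# Hypothetical toy taxonomy (child -> parent); not real WordNet data.
PARENT = {
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "hammer": "artifact",
}


def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1 (assumption of this sketch)."""
    return len(ancestors(node))


def wup(a, b):
    """Wu-Palmer score: 2 * depth(LCS) / (depth(a) + depth(b))."""
    seen = set(ancestors(a))
    # Walking up from b, the first node that is also above a is the deepest shared ancestor.
    lcs = next(n for n in ancestors(b) if n in seen)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wup("dog", "cat"))     # share "animal"
print(wup("dog", "hammer"))  # share only the root "entity"
```

Two nodes that share a deep ancestor score close to 1, while nodes that meet only at the root score low no matter how many edges separate them, which is why the Wu-Palmer number (0.4) is higher than the path-based one (0.09) for the same pair of words.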

Read more about Synsets at http://www.nltk.org/howto/wordnet.html

Tags: wordnet, nlp, semantics, python
Source: https://codeday.me/bug/20191120/2045246.html