python – Check whether a string can be segmented into words
Author: Internet
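This is a follow-up to this response and the pseudocode algorithm the user posted there. Because that question is old, I did not comment on it. I only want to verify whether a string can be split into words. The algorithm does not need to actually split the string. Here is the response from the related question: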
Let S[1..length(w)] be a table with Boolean entries. S[i] is true if
the word w[1..i] can be split. Then set S[1] = isWord(w[1]) and for
i=2 to length(w) calculate S[i] = (isWord(w[1..i]) or for any j in {2..i}: S[j-1] and
isWord(w[j..i])).
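A common way to transcribe this recurrence into Python without off-by-one trouble is to give the table one extra slot, S[0] = True for the empty prefix. Then S[i] keeps the pseudocode's meaning (the first i characters can be split), and the separate isWord(w[1..i]) term simply becomes the j = 1 case. This is a minimal sketch, not the quoted author's code; it assumes the dictionary is any iterable of words, and the name can_segment is hypothetical.

```python
def can_segment(w, words):
    """Pseudocode recurrence with an extra sentinel entry S[0] = True.

    S[i] is True when the prefix w[:i] (the pseudocode's w[1..i]) can be split.
    """
    words = set(words)  # average O(1) membership tests instead of scanning a list
    n = len(w)
    S = [False] * (n + 1)
    S[0] = True  # empty prefix: turns "isWord(w[1..i])" into the j = 1 case
    for i in range(1, n + 1):
        # Pseudocode j in {1..i} is the 1-based start of the last word,
        # so the pseudocode's w[j..i] is the Python slice w[j - 1:i].
        for j in range(1, i + 1):
            if S[j - 1] and w[j - 1:i] in words:
                S[i] = True
                break
    return S
```

With this layout, S[len(w)] (equivalently S[-1]) answers whether the whole string splits; for example, can_segment("car", ["car"]) returns [True, False, False, True].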
I am translating this algorithm into simple Python code, but I am not sure I understand it correctly. Code:
```python
def is_all_words(a_string, dictionary):
    str_len = len(a_string)
    S = [False] * str_len
    S[0] = is_word(a_string[0], dictionary)
    for i in range(1, str_len):
        check = is_word(a_string[0:i], dictionary)
        if (check):
            S[i] = check
        else:
            for j in range(1, str_len):
                check = (S[j - 1] and is_word(a_string[j:i], dictionary))
                if (check):
                    S[i] = True
                    break
    return S
```
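I have two related questions: 1) Is this code a correct translation of the linked algorithm into Python? 2) If so, now that I have S, how do I use it to tell whether the string consists only of words? Here, is_word is a function that simply looks up the given word in a list; I have not implemented it as a trie yet.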
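On question 2: whichever indexing is used, the entry that covers the whole string (the last element of S) is the answer. Since a trie is mentioned as the next step for is_word, here is a minimal sketch of how a trie can replace the per-substring lookup: from each position already known to be reachable, walk the trie forward and mark every position where a dictionary word ends. The names build_trie, can_segment_trie and the END marker key are hypothetical, not part of the original code.

```python
END = "__end__"  # hypothetical marker key; real trie keys are single characters


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # a dictionary word ends at this node
    return root


def can_segment_trie(a_string, words):
    """Return True if a_string is a concatenation of words from `words`."""
    trie = build_trie(words)
    n = len(a_string)
    # reachable[k] is True when a_string[:k] can be split into words.
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix is trivially splittable
    for start in range(n):
        if not reachable[start]:
            continue
        node = trie
        # Follow the trie along a_string[start:], recording each word end.
        for pos in range(start, n):
            node = node.get(a_string[pos])
            if node is None:
                break  # no dictionary word continues with this character
            if END in node:
                reachable[pos + 1] = True
    # The whole string is splittable iff its full-length prefix is reachable.
    return reachable[n]


print(can_segment_trie("carrotforever", ["a", "car", "carrot", "for", "eve", "forever"]))  # prints True
```

Each start position costs one trie walk that stops as soon as no word can continue, instead of one list lookup per candidate substring.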
Update: After changing the code to include the suggested changes, it still does not work. Here is the updated code:
```python
def is_all_words(a_string, dictionary):
    str_len = len(a_string)
    S = [False] * str_len
    S[0] = is_word(a_string[0], dictionary)
    for i in range(1, str_len):
        check = is_word(a_string[0:i], dictionary)
        if (check):
            S[i] = check
        else:
            for j in range(1, i):  # THIS LINE WAS UPDATED
                check = (S[j - 1] and is_word(a_string[j:i], dictionary))
                if (check):
                    S[i] = True
                    break
    return S

# dictionary: the asker's word list (not shown)
a_string = "carrotforever"
S = is_all_words(a_string, dictionary)
print(S[len(S) - 1])  # prints False

a_string = "hello"
S = is_all_words(a_string, dictionary)
print(S[len(S) - 1])  # prints True
```
Both of these should print True.
Solution:
Here is a modified version of your code that should return correct results.
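Note that your error is simply the conversion from the pseudocode's array indexing (starting at 1) to Python's array indexing (starting at 0): S[0] and S[1] are filled with the same value, while the value for the full string, which should end up in S[L-1], is never actually computed. You can easily track down this error by printing the whole S table. You will find that S[3] is set to True in the first example, when the entry for the word "car" should be S[2].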
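Printing the table is easiest to read when each True entry is shown next to the prefix it covers. A small sketch, assuming the 0-based convention of the corrected code in this answer (S[i] covers a_string[:i + 1]); true_prefixes is a hypothetical helper name.

```python
def true_prefixes(a_string, S):
    """List (index, prefix) for every True entry of a 0-based table S.

    Assumes S[i] describes a_string[:i + 1], the convention of the corrected code.
    """
    return [(i, a_string[:i + 1]) for i, flag in enumerate(S) if flag]


# A correct 0-based table marks "car" at index 2 and "carrot" at index 5.
print(true_prefixes("carrot", [False, False, True, False, False, True]))
# prints [(2, 'car'), (5, 'carrot')]
```

Fed the question's table, the same helper pairs index 3 with "carr" rather than "car", which is the one-position shift described above.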
Also, instead of testing every position, you can speed up the process by storing the end indices of the splittable prefixes found so far.
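One way to read that suggestion: keep a list of the positions where a splittable prefix ends, and only try those positions as start points for the last word, instead of scanning every j. This is a minimal sketch of one possible reading, not the answerer's code; it assumes the dictionary fits in a Python set, and can_split_fast is a hypothetical name.

```python
def can_split_fast(a_string, dictionary):
    """Return True if a_string can be split into words from dictionary."""
    words = set(dictionary)  # average O(1) membership tests
    n = len(a_string)
    # Sorted positions k where a_string[:k] is splittable; 0 is the empty prefix.
    ends = [0]
    for i in range(1, n + 1):
        # Only known word boundaries are tried as the start of the last word.
        if any(a_string[j:i] in words for j in ends):
            ends.append(i)
    return ends[-1] == n


print(can_split_fast("carrotforever", ["a", "car", "carrot", "for", "eve", "forever"]))  # prints True
```

The inner check now runs over len(ends) boundaries rather than all i positions, which helps when few prefixes are splittable; in the worst case it performs the same number of substring checks as the plain version. An empty string returns True here, since the empty prefix counts as splittable.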
```python
def is_all_words(a_string, dictionary):
    str_len = len(a_string)
    S = [False] * (str_len)
    # I replaced the is_word function with a simple list lookup;
    # feel free to replace it with whatever function you use.
    # Tries or suffix trees are best for this.
    S[0] = (a_string[0] in dictionary)
    for i in range(1, str_len):
        check = a_string[0:i+1] in dictionary  # i+1 instead of i
        if (check):
            S[i] = check
        else:
            # i+1 instead of i; j starts at 1 because j = 0 (the whole prefix)
            # is checked above, and S[0-1] would wrap around to S[-1]
            for j in range(1, i+1):
                if (S[j-1] and (a_string[j:i+1] in dictionary)):  # i+1 instead of i
                    S[i] = True
                    break
    return S

a_string = "carrotforever"
S = is_all_words(a_string, ["a","car","carrot","for","eve","forever"])
print(S[len(a_string)-1])  # prints True

a_string = "helloworld"
S = is_all_words(a_string, ["hello","world"])
print(S[len(a_string)-1])  # prints True
```
Tags: python, algorithm, nlp, dynamic-programming, text-segmentation. Source: https://codeday.me/bug/20190621/1250121.html