String search in Python that returns the matching lines

Author: Internet

I am new to Python. I want to match strings in certain lines of a file. Let's say
I have these strings:

british    7
German     8
France     90

I have some lines in a file, such as:

<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>
<s id="15-8">...And Then There Were Three... is the ninth studio album by the german band Genesis 8 and was released in 1978.</s>
<s id="1990-2">Magnum Nitro Express is a France centerfire fire rifle cartridge 90.</s>

I would like the output to look like this:

<s id="69-7">...Meanwhile is the studio <w2>7</w2> album by <w1>British</w1> pop band 10cc.</s>
<s id="15-8">...And Then There Were Three... is the ninth studio album by the <w1>german</w1> band Genesis <w2>8</w2> and was released in 1978.</s>
<s id="1990-2">Magnum Nitro Express is a <w1>France</w1> centerfire fire rifle cartridge <w2>90</w2>.</s>

I tried the following code:

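# Wrapped in a function so that `return` is valid; the name tag_lines is
# only illustrative, and left/right hold one pair such as 'British' and '7'.
def tag_lines(file, left, right):
    text = ''
    for i in file:
        if left in i and right in i:
            line = i.replace(left, '<w1>' + left + '</w1>')
            lineR = line.replace(right, '<w2>' + right + '</w2>')
            text = text + lineR + "\n"
    return text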

However, it also matches the string inside the id, e.g.:

<s id="69-<w2>7</w2>">...Meanwhile is the studio <w2>7</w2> album by <w1>British</w1> pop band 10cc.</s>

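So, is there a way to search for the string as a whole word rather than as characters, so that I can avoid <s id="69-<w2>7</w2>">?

Thanks in advance for any help.

Solution:

You should use a regular expression that replaces whole words only, not parts of words.

Something like: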

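import re

left = 'british'
right = '7'
# Sample input line from the question (the original snippet left `i` undefined).
i = '<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>'
# Match the word only when it has whitespace on both sides; (?i) ignores case.
i1 = re.sub(r'(?i)(\s+)(%s)(\s+)' % left, r'\1<w1>\2</w1>\3', i)
i2 = re.sub(r'(?i)(\s+)(%s)(\s+)' % right, r'\1<w2>\2</w2>\3', i1)
print(i2)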

This gives us '<s id="69-7">...Meanwhile is the studio <w2>7</w2> album by <w1>British</w1> pop band 10cc.</s>'
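One caveat worth checking: the whitespace-delimited pattern only matches a word that has whitespace on both sides, so a word followed directly by punctuation is skipped. The short check below uses two sample lines from the question to illustrate this. The `\b` word-boundary pattern is shown only as a possible alternative (it is not part of the original answer), and the variable names are illustrative.

```python
import re

line_90 = '<s id="1990-2">Magnum Nitro Express is a France centerfire fire rifle cartridge 90.</s>'
line_7 = '<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>'

# "90." is followed by a full stop, not whitespace, so this finds nothing.
spaced = re.search(r'(?i)(\s+)(90)(\s+)', line_90)

# A word boundary (\b) accepts punctuation after the word, and it does not
# match the "90" inside "1990" because a digit precedes it there.
bounded = re.findall(r'\b90\b', line_90)

# On its own, however, \b also matches the 7 in id="69-7", because "-" and
# the closing quote both count as boundaries.
in_id = re.findall(r'\b7\b', line_7)

print(spaced, bounded, in_id)
```

So `\b` handles trailing punctuation, but it has to be combined with some way of skipping the tag attributes.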

If this approach gives wrong results, you can try more precise code, for example:

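import re

def do(left, right, line):
    # Split the line into tags and text; the capturing group keeps the tags.
    parts = [x for x in re.split(r'(<[^>]+>)', line) if x]
    for idx, l in enumerate(parts):
        lu = l.upper()
        # Only touch text segments (not the <s ...> / </s> tags) that
        # contain both search strings, compared case-insensitively.
        if (not ('<s' in l or 's>' in l) and
            (left.upper() in lu and right.upper() in lu)):
            l = re.sub(r'(?i)(\s+)(%s)(\s+)' % left, r'\1<w1>\2</w1>\3', l)
            l = re.sub(r'(?i)(\s+)(%s)(\s+)' % right, r'\1<w2>\2</w2>\3', l)
            parts[idx] = l

    return ''.join(parts)


line = '<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>'
print(do('british', '7', line))   # tags both words in the text
print(do('british', '-7', line))  # '-7' occurs only in the id, so the line is unchanged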
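
As a variation on the tag-splitting idea above, here is a minimal sketch that uses `\b` word boundaries instead of surrounding whitespace, so that words followed by punctuation (such as the `90.` in the third sample line) are also tagged. The helper name `tag_words` is hypothetical and not from the original answer. The sketch assumes both search strings start and end with a letter or digit, and that anything between `<` and `>` is markup to be left alone.

```python
import re

def tag_words(line, left, right):
    """Wrap whole-word matches of left/right in <w1>/<w2>, leaving tags alone."""
    # One alternation, so both words are tagged in a single pass and the
    # freshly inserted <w1>/<w2> markup is never searched again.
    pattern = re.compile(
        r'\b(?:(%s)|(%s))\b' % (re.escape(left), re.escape(right)),
        re.IGNORECASE,
    )

    def wrap(match):
        # Group 1 is the left word, group 2 the right word.
        tag = 'w1' if match.group(1) is not None else 'w2'
        return '<%s>%s</%s>' % (tag, match.group(0), tag)

    # With one capturing group, re.split puts text at even indexes and the
    # captured tags at odd indexes, so only even indexes are rewritten.
    parts = re.split(r'(<[^>]+>)', line)
    for idx in range(0, len(parts), 2):
        parts[idx] = pattern.sub(wrap, parts[idx])
    return ''.join(parts)


samples = [
    ('british', '7', '<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>'),
    ('German', '8', '<s id="15-8">...And Then There Were Three... is the ninth studio album by the german band Genesis 8 and was released in 1978.</s>'),
    ('France', '90', '<s id="1990-2">Magnum Nitro Express is a France centerfire fire rifle cartridge 90.</s>'),
]
for left, right, line in samples:
    print(tag_words(line, left, right))
```

Unlike the loop in the question, this sketch tags a line even when only one of the two words is present; if both must appear, a check such as the `left.upper() in lu and right.upper() in lu` test from `do` could be added before substituting.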

Tags: string-matching, python
Source: https://codeday.me/bug/20190726/1542316.html