
Python - Web Scraping - XPath Locating: Basic Use of the starts-with() and string() Functions

2021-05-21 18:59:42 Author: Internet


starts-with()

1. Function prototype

Function	Description
`fn:starts-with(string1,string2)`	Returns true if string1 starts with string2, otherwise false. Example: starts-with('XML','X') Result: true
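
As a quick check of the prototype above, the function can be evaluated on string literals directly. This is only a minimal sketch, assuming lxml is installed; the empty `root` element is a hypothetical placeholder that exists purely because xpath() needs a context node.

```python
from lxml import etree

# Hypothetical empty element: xpath() needs a context node, but the
# expressions below only use string literals.
root = etree.Element("root")

matches = root.xpath('starts-with("XML", "X")')     # "X" is a prefix of "XML"
no_match = root.xpath('starts-with("XML", "ML")')   # "ML" occurs, but not at the start
wrong_case = root.xpath('starts-with("XML", "x")')  # the comparison is case-sensitive

print(matches, no_match, wrong_case)
```

lxml converts the XPath boolean result into a Python boolean, so the values can be used directly in an if statement.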

2. Using starts-with() to select multiple tags whose attribute starts with the same characters

Test HTML

<li class="tag-1">列表项_1</li>
<li class="tag-2">列表项_2</li>
<li class="tag-3">列表项_3</li>
<li class="item-4">列表项_4</li>

Use starts-with() to get the text of the <li> tags whose class attribute starts with "tag"

from lxml import etree

html_txt = """
<li class="tag-1">列表项_1</li>
<li class="tag-2">列表项_2</li>
<li class="tag-3">列表项_3</li>
<li class="item-4">列表项_4</li>
"""

# Build an etree selector from the HTML string
selector = etree.HTML(html_txt)
# Use starts-with() to get the text of the <li> tags whose class attribute starts with "tag"
contents = selector.xpath('//li[starts-with(@class, "tag")]/text()')

# Print the extracted text
for content in contents:
    print(content)

Output

列表项_1
列表项_2
列表项_3

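Code explanation

The result does not include the item <li class="item-4">列表项_4</li>.
starts-with(string1,string2)
This function compares two strings. @class first yields the class attribute value of each <li> tag, and that value is then checked against the prefix string2. If the function returns true, the <li> tag is selected; if it returns false, it is not.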
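
To make the true/false decision described above visible, the same predicate can be evaluated on each <li> separately. This is a sketch under assumptions: the <ul> wrapper and the third item with two class names are hypothetical additions, not part of the original test HTML.

```python
from lxml import etree

# Hypothetical variant of the test HTML; the last item has two class names.
html_txt = """
<ul>
<li class="tag-1">列表项_1</li>
<li class="item-4">列表项_4</li>
<li class="big tag-5">列表项_5</li>
</ul>
"""

selector = etree.HTML(html_txt)

# Map each class attribute value to the boolean returned by starts-with()
results = {}
for li in selector.xpath('//li'):
    results[li.get("class")] = li.xpath('starts-with(@class, "tag")')

print(results)
```

The item with class "big tag-5" is not matched, because starts-with() compares against the whole attribute value rather than against individual class names. If such items should also be selected, contains(@class, "tag") is a looser alternative.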

string()

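1. Function prototype

Function	Description
`fn:string(arg)`	Returns the string value of the argument. The argument can be a number, a boolean, or a node-set. Example: string(314) Result: "314"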
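
The three argument types in the table can be tried out in a few lines. This is a minimal sketch, assuming lxml is installed; the `root` element and its text "hello" are hypothetical and only provide a context node.

```python
from lxml import etree

# Hypothetical context node with a short text, used only for illustration
root = etree.Element("root")
root.text = "hello"

from_number = root.xpath('string(314)')      # number argument
from_boolean = root.xpath('string(true())')  # boolean argument
from_node = root.xpath('string(.)')          # node-set argument (the current node)

print(from_number, from_boolean, from_node)
```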

2. Using string() to get the text content of nested tags

Test HTML

<div class="red">
    内容1
    <div>
        内容2
        <div>
            内容3
        </div>
    </div>
</div>

Use string() to get the text content of nested tags

from lxml import etree

html_text = """
<div class="red">
    内容1
    <div>
        内容2
        <div>
            内容3
        </div>
    </div>
</div>
"""

selector = etree.HTML(html_text)

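# xpath() returns a list of the divs whose class attribute is "red"; take item 0 (only one div matches)
content1 = selector.xpath('//div[@class="red"]')[0]
# Convert the current node to a string
content2 = content1.xpath('string(.)')

# Print the resulting string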
print(content2)

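Output

The printed string is the text of the outer div and of every div nested inside it: 内容1, 内容2 and 内容3 in document order, with the line breaks and indentation between them preserved.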

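Code explanation

The argument of string() can be a number, a boolean, or a node-set.

string(.) converts the current node (.) to a string. The string value of an element is the concatenation of all text nodes inside it, which is why the text of the nested divs is included.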
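
The behaviour of string(.) can be compared with a few related calls. This is a sketch rather than part of the original article: `outer`, `raw`, `joined`, `tidy` and `direct` are names introduced here, and itertext() and normalize-space() are alternatives the article itself does not use.

```python
from lxml import etree

html_text = """
<div class="red">
    内容1
    <div>
        内容2
        <div>
            内容3
        </div>
    </div>
</div>
"""

selector = etree.HTML(html_text)
outer = selector.xpath('//div[@class="red"]')[0]

# string(.) concatenates every text node below the current node
raw = outer.xpath('string(.)')
# lxml's itertext() walks the same text nodes from the Python side
joined = "".join(outer.itertext())
# normalize-space() trims the string and collapses runs of whitespace
tidy = outer.xpath('normalize-space(string(.))')
# text() only selects the direct text children, so nested text is missing
direct = [t.strip() for t in outer.xpath('text()') if t.strip()]

print(tidy)
print(direct)
```

normalize-space() is convenient when the indentation of the HTML source should not end up in the scraped text.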

Trying string(div), which converts the child div of the current node instead

content2 = content1.xpath('string(div)')
print(content2)

Output


      内容2

          内容3
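
One detail worth knowing about string(div): in XPath 1.0, when the argument is a node-set with several nodes, only the first node in document order is converted. The sketch below uses hypothetical HTML with two sibling child divs (not the article's test HTML) to illustrate this.

```python
from lxml import etree

# Hypothetical HTML: the outer div has two child divs side by side
html_text = '<div class="red">内容1<div>内容2</div><div>内容3</div></div>'

selector = etree.HTML(html_text)
outer = selector.xpath('//div[@class="red"]')[0]

# string(div) only uses the first child div of the node-set
first_only = outer.xpath('string(div)')
# To get every child, select the nodes first and convert each one separately
each_child = [child.xpath('string(.)') for child in outer.xpath('div')]

print(first_only)
print(each_child)
```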

References

菜鸟教程 (Runoob) reference manual for XPath, XQuery and XSLT functions: https://www.runoob.com/xpath/xpath-functions.html
《从零开始学Python网络爬虫》, by 罗攀 and 蒋仟

Source: https://blog.csdn.net/weixin_42490414/article/details/117127905