# Easy Web Scraping: Clever Uses of the scrapy Framework 7: Monkey Steals the Peach (3)
Author: 互联网 (Internet)
In the previous lesson we covered part of how to use bs4. Today we continue, using the same sample data as last time.
```python
html_doc = """
The Dormouse's story
<body>
The Dormouse's story
<body>
text2
The Dormouse's story
Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well.
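...
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
# prettify() returns the indented markup as a string; it does not change soup itself
print(soup.prettify())
# This prints the following structured HTML:
"""The Dormouse's story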
Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well.
...
"""
```

## .parent

Use the `.parent` attribute to get an element's parent node.

```python
b_tag = soup.b
print(b_tag.parent)
# Prints the parent <p> element, whose text is: The Dormouse's story
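```

## .parents

An element's `.parents` attribute walks up the tree and yields all of its ancestors, from the immediate parent to the document root.

```python
b_tag = soup.b
for parent in b_tag.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)
# Output:
# p
# body
# html
# [document]
```

## .next_sibling and .previous_sibling

Sibling tags are tags at the same level of the tree. For example, the sample structure contains several p tags, and they are all siblings of one another.

```
sibling_soup = BeautifulSoup("text1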