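# Easy Web Scraping, Scrapy Framework Tips 6: Monkey Steals the Peach (2)

In the last lesson we covered part of how to use bs4; today we continue. We will use the same data as last time.

```python
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
# prettify() returns the formatted markup as a string; it does not modify soup
print(soup.prettify())
# Prints the document as indented, structured HTML, one tag or string per line
```
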
## Attributes

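A tag can have any number of attributes. The `<a>` tag has a "class" attribute whose value is "sister". You work with a tag's attributes the same way you work with a dictionary: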

```python
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
# Get an attribute
tag = soup.a
print(tag['class'])
# This can also be written as
print(soup.a['class'])
# Printed output:
# ['sister']
```
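
Because attributes behave like dictionary entries, they can also be read safely, changed, and deleted. The following is a minimal sketch on a one-tag snippet made up for this illustration (the variable names `snippet` and `link` are assumptions, not part of the lesson data):

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document, used only for this illustration
snippet = '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
link = BeautifulSoup(snippet, 'html.parser').a

# Read: .get() returns None for a missing attribute instead of raising KeyError
missing = link.get('title')

# Write: assignment adds or overwrites an attribute
link['id'] = 'first-link'

# Delete: del removes an attribute from the tag
del link['href']

has_href = link.has_attr('href')
print(missing, link['id'], has_href)
# Printed output:
# None first-link False
```

Using `.get()` is usually the safer choice when scraping pages where an attribute may be absent on some tags.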

You can also get all of a tag's attributes at once, as a dictionary, through `.attrs`:

```python
# Get all attributes
print(tag.attrs)
# Printed output:
# {'href': 'http://example.com/elsie', 'class': ['sister'], 'id': 'link1'}
```
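
Note that `class` comes back as a list while `id` is a plain string. This is because `class` is a multi-valued attribute in HTML. A small sketch, assuming a made-up `<p>` tag with two classes:

```python
from bs4 import BeautifulSoup

# Hypothetical tag with two class names, used only for this illustration
snippet = '<p class="body strikeout" id="intro">text</p>'
p = BeautifulSoup(snippet, 'html.parser').p

classes = p['class']   # multi-valued attribute: parsed into a list
tag_id = p['id']       # single-valued attribute: stays a string
print(classes, tag_id)
# Printed output:
# ['body', 'strikeout'] intro
```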

## .contents and .children

A tag's `.contents` attribute returns the tag's direct children as a list:

```python
p_tag = soup.p
print(p_tag.contents)
# Printed output:
# [<b>The Dormouse's story</b>]
```

With a tag's `.children` generator you can loop over its direct children:

```python
for child in p_tag.children:
    print(child)
# Printed output:
# <b>The Dormouse's story</b>
```
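
The practical difference between the two is that `.contents` is a real list (it supports `len()` and indexing), while `.children` is meant to be iterated. A short sketch on a made-up paragraph (the snippet and variable names are assumptions for illustration):

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical paragraph mixing text and one nested tag
snippet = '<p class="story">Hello <b>big</b> world</p>'
p = BeautifulSoup(snippet, 'html.parser').p

children_list = p.contents        # a list: "Hello ", <b>big</b>, " world"
first_child = children_list[0]    # indexing works on .contents

# Iterate .children and record which children are tags rather than strings
is_tag = [isinstance(child, Tag) for child in p.children]
print(len(children_list), repr(str(first_child)), is_tag)
# Printed output:
# 3 'Hello ' [False, True, False]
```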

## .descendants

The `.descendants` attribute iterates recursively over all of a tag's descendants: its children, its children's children, and so on.

```python
for i in p_tag.descendants:
    print(i)
# Printed output (the <b> tag first, then the string inside it):
# <b>The Dormouse's story</b>
# The Dormouse's story
```
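
To see how `.descendants` differs from `.children`, it helps to count both on the same tag. A minimal sketch, assuming a small made-up `<div>`:

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup, used only for this illustration
snippet = '<div><p><b>one</b></p><p>two</p></div>'
div = BeautifulSoup(snippet, 'html.parser').div

direct = list(div.children)       # only the two <p> tags
nested = list(div.descendants)    # <p>, <b>, "one", <p>, "two"
print(len(direct), len(nested))
# Printed output:
# 2 5
```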

## .string

If a tag has exactly one child and that child is a `NavigableString`, you can use `.string` on the tag to get the child's content:

```python
for child in p_tag.children:
    print(child.string)
# Printed output:
# The Dormouse's story
```
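
The "exactly one child" condition matters: when a tag has several children, `.string` is `None`. A short sketch with two made-up paragraphs (the `id` values are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <p> with a single child, one with several
snippet = '<p id="one"><b>only child</b></p><p id="many">text <b>and</b> more</p>'
soup = BeautifulSoup(snippet, 'html.parser')

# A single nested child: .string passes through <b> to its text
single = soup.find(id='one').string

# Several children: there is no single string to return, so the result is None
many = soup.find(id='many').string
print(single, many)
# Printed output:
# only child None
```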

## .strings and .stripped_strings

If a tag contains more than one string, you can loop over them with `.strings`:

```python
for string in soup.strings:
    print(string)
# Printed output:
# The Dormouse's story
#
#
# The Dormouse's story
#
#
# Once upon a time there were three little sisters; and their names were
# Elsie
# ,
# Lacie
# and
# Tillie
# ;
# and they lived at the bottom of a well.
#
#
# ...
```

The strings in the output may contain a lot of extra spaces and blank lines. Use `.stripped_strings` to remove the surplus whitespace:

```python
for string in soup.stripped_strings:
    print(string)
# Printed output:
# The Dormouse's story
# The Dormouse's story
# Once upon a time there were three little sisters; and their names were
# Elsie
# ,
# Lacie
# and
# Tillie
# ;
# and they lived at the bottom of a well.
# ...
```
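
Since `.stripped_strings` is a generator, a common pattern is to collect the pieces into a list or join them into one clean line of text. A minimal sketch on a made-up snippet with messy whitespace:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with extra newlines and spaces around the text
snippet = '<p>\n  Elsie,\n  <b> Lacie </b>\n  and Tillie\n</p>'
soup = BeautifulSoup(snippet, 'html.parser')

pieces = list(soup.stripped_strings)   # each string with surrounding whitespace removed
sentence = ' '.join(pieces)            # rejoin with single spaces
print(pieces)
print(sentence)
# Printed output:
# ['Elsie,', 'Lacie', 'and Tillie']
# Elsie, Lacie and Tillie
```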

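That is quite a lot for one lesson, so take some time to digest it. There is plenty more to come, and we will work through it bit by bit.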

