Scraping Douban Book Short Reviews (the First Ten Pages of Reviews for One Book)

Author: Internet (互联网)

The original article is on my CSDN blog: https://blog.csdn.net/Thefreelittle/article/details/117574096

```python
import time

import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

# Star count for each title Douban puts on the rating span.
# Any other title falls back to 1 star, as in the original if/elif chain.
STARS_BY_TITLE = {"力荐": "5 stars", "推荐": "4 stars", "还行": "3 stars", "较差": "2 stars"}

print("Douban book scraping: The Wandering Earth (流浪地球).")
num = 1  # current page number
# Ten pages of 20 comments each: start = 0, 20, ..., 180
for i in range(0, 200, 20):
    time.sleep(3)  # pause between requests so the site is not hammered
    if i == 0:
        url = 'https://book.douban.com/subject/3266609/comments/?limit=20&status=P&sort=new_score'
    else:
        url = 'https://book.douban.com/subject/3266609/comments/?start=' + str(i) + '&limit=20&status=P&sort=new_score'
    resp = requests.get(url, headers=headers, timeout=10)
    bs = BeautifulSoup(resp.text, 'html.parser')
    grid_view = bs.find_all('li', class_="comment-item")  # each li holds one short review
    print(f"------------------Page {num} of comments. Output fields: likes, username, comment time, rating, comment text------------------")
    cishu = 1  # index of the comment within the current page
    for item in grid_view:
        piaoshu = item.find('span', class_="vote-count").text
        tzuozhe = item.find('span', class_="comment-info")
        zuozhe = tzuozhe.find('a').text
        shijian = item.find('span', class_="comment-time").text
        comment = item.find('span', class_="short").text

        # Look for the rating span by class instead of by the length of its HTML.
        # Assumes the span carries the class "rating"; reviewers who gave no
        # stars have no such span and are reported as unrated.
        ping = tzuozhe.find('span', class_="rating")
        if ping is None:
            pingfen = "unrated"
        else:
            pingfen = STARS_BY_TITLE.get(ping.get('title'), "1 star")

        print(f"Page {num}, comment {cishu} --- likes: {piaoshu} author: {zuozhe} time: {shijian} rating: {pingfen} comment: {comment}\n")
        cishu += 1
    num += 1
```
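
Two parts of the script need no network access: building the page URLs and looking up the star rating. They can be pulled out into small functions and checked on their own. The following is a minimal sketch using only the standard library. The names `BASE_URL`, `page_url` and `stars_for` are hypothetical helpers introduced here for illustration, and returning `None` for a missing rating is a choice made for this sketch.

```python
from urllib.parse import urlencode

BASE_URL = 'https://book.douban.com/subject/3266609/comments/'

# Rating titles matched by the script; any other title falls back to 1 star.
STARS_BY_TITLE = {"力荐": 5, "推荐": 4, "还行": 3, "较差": 2}


def page_url(page, per_page=20):
    """Return the comments URL for a 1-based page number (hypothetical helper)."""
    query = {'limit': per_page, 'status': 'P', 'sort': 'new_score'}
    if page > 1:
        # Only pages after the first carry an explicit start offset.
        query = {'start': (page - 1) * per_page, **query}
    return BASE_URL + '?' + urlencode(query)


def stars_for(title):
    """Map a rating title to a star count; None means no rating was given."""
    if title is None:
        return None
    return STARS_BY_TITLE.get(title, 1)


# The ten pages requested by the script: start = 0, 20, ..., 180
urls = [page_url(p) for p in range(1, 11)]
print(urls[0])
print(urls[-1])
print(stars_for("推荐"))
```

Keeping these pieces free of `requests` and `BeautifulSoup` makes it possible to confirm the offsets and the rating table before sending any traffic to the site.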

Source: https://www.cnblogs.com/dazhi151/p/14911220.html