12 - Hands-On: Scraping a Novel Site
Author: Internet
import requests
import re

url = "http://m.pinsuu.com/paihang/nanpindushi/"
headers = {
    # Mobile User-Agent so the m. subdomain serves the mobile page this regex targets
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Mobile Safari/537.36 Edg/96.0.1054.43"
}

resp = requests.get(url, headers=headers)
resp.encoding = 'gb2312'   # the page is GB2312-encoded
html = resp.text           # page source

# Named groups: title (name), serialization status, male lead (nanzhu), female lead (nvzhu)
obj = re.compile(
    r'.*?<span class="nm">(?P<name>.*?) <font size="0.5rem" color="#999999">(?P<status>.*?)</font></span>'
    r'.*?<span><font color="#3ca5f6">(?P<nanzhu>.*?)</font></span>'
    r'.*?<span><font color="#FF00D2">(?P<nvzhu>.*?)</font></span>',
    re.S)

result = obj.finditer(html)
for item in result:
    print(item.group("name") + "---" + item.group("status") + "---"
          + item.group("nanzhu") + "---" + item.group("nvzhu"))

resp.close()
We can also save the scraped data to a CSV file with the csv module, which makes it easier to analyze later.
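A minimal sketch of that step, reusing the obj and html variables from the script above and replacing the print loop with a csv.writer (the filename novels.csv is just an example):

import csv

with open("novels.csv", mode="w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "status", "nanzhu", "nvzhu"])  # header row
    for item in obj.finditer(html):  # obj and html come from the script above
        writer.writerow([item.group("name"), item.group("status"),
                         item.group("nanzhu"), item.group("nvzhu")])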