Python Web Scraping with Coroutines
1. Set the request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/81.0.4044.138 Safari/537.36"
}
2. Declare the scraping function with async
import asyncio
import aiohttp
import requests
from bs4 import BeautifulSoup

async def job(url, Year, Month, Day, Hour):
    async with aiohttp.ClientSession() as session:
        content = await fetch(session, url)
        soup = BeautifulSoup(content, 'lxml')
        # strip the leading "../" from the first link's relative href
        page_url = soup.select('a')[0]['href'][3:]
        # request_url is the site's base URL, defined elsewhere
        txt_url = request_url + page_url
        try:
            txt_response = requests.get(txt_url, headers=headers)
        except requests.RequestException:
            # retry once on a transient network error
            txt_response = requests.get(txt_url, headers=headers)
        txt_response.encoding = 'utf-8'
        soup = BeautifulSoup(txt_response.content, 'lxml')
        # ...parsing logic...
The fetch helper must be declared async as well:
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()
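Note that job above still issues its second request with the blocking requests.get, which stalls the event loop for the duration of that download. A fully non-blocking variant would route that request through fetch as well. A minimal sketch; job_async is a hypothetical name, and the parsing step is elided as in the original:

async def job_async(url, Year, Month, Day, Hour):
    async with aiohttp.ClientSession(headers=headers) as session:
        content = await fetch(session, url)
        soup = BeautifulSoup(content, 'lxml')
        page_url = soup.select('a')[0]['href'][3:]
        # download the text page through the same session
        # instead of a blocking requests.get call
        txt = await fetch(session, request_url + page_url)
        soup = BeautifulSoup(txt, 'lxml')
        # ...parsing logic...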
3. Start scraping
First, collect all the URLs to scrape into a list:
urls = []
for alt in alts:
    print(alt)
    url = get_url(Year, month, day, hour, alt)
    urls.append(url)
Then launch the coroutines:
loop = asyncio.get_event_loop()
tasks = [job(url, Year, month, day, hour) for url in urls]
# gather accepts coroutines directly; on Python 3.8+ asyncio.wait
# expects Task objects and will no longer accept bare coroutines
loop.run_until_complete(asyncio.gather(*tasks))
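When the URL list is large, it is usually worth capping concurrency so the target server is not flooded with simultaneous connections. A minimal sketch using asyncio.Semaphore; bounded_job and the limit of 10 are illustrative assumptions, and asyncio.run (Python 3.7+) replaces the manual loop management above:

async def main():
    sem = asyncio.Semaphore(10)  # allow at most 10 jobs in flight

    async def bounded_job(url):
        # each job waits here until a semaphore slot is free
        async with sem:
            await job(url, Year, month, day, hour)

    await asyncio.gather(*(bounded_job(url) for url in urls))

asyncio.run(main())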
Source: https://blog.csdn.net/weixin_38828673/article/details/113356867