Python Web Scraping with Coroutines
1. Set the request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/81.0.4044.138 Safari/537.36"
}
2. Declare the scraping function with async
import asyncio
import aiohttp
import requests
from bs4 import BeautifulSoup

async def job(url, Year, Month, Day, Hour):
    async with aiohttp.ClientSession() as session:
        content = await fetch(session, url)
        soup = BeautifulSoup(content, 'lxml')
        # strip the leading "../" from the first link's relative href
        page_url = soup.select('a')[0]['href'][3:]
        # request_url is the site's base URL, defined elsewhere
        txt_url = request_url + page_url
        try:
            txt_response = requests.get(txt_url, headers=headers)
        except requests.RequestException:
            # retry once on a transient network error
            txt_response = requests.get(txt_url, headers=headers)
        txt_response.encoding = 'utf-8'
        soup = BeautifulSoup(txt_response.content, 'lxml')
        # ...parsing logic...
The fetch helper must be declared async as well:
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()
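Note that job above still issues its second request with the blocking requests.get, which stalls the event loop for the duration of that download. A fully non-blocking variant would route that request through fetch as well. A minimal sketch; job_async is a hypothetical name, and the parsing step is elided as in the original:

async def job_async(url, Year, Month, Day, Hour):
    async with aiohttp.ClientSession(headers=headers) as session:
        content = await fetch(session, url)
        soup = BeautifulSoup(content, 'lxml')
        page_url = soup.select('a')[0]['href'][3:]
        # download the text page through the same session
        # instead of a blocking requests.get call
        txt = await fetch(session, request_url + page_url)
        soup = BeautifulSoup(txt, 'lxml')
        # ...parsing logic...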
3. Start scraping
First, collect all the URLs to scrape into a list:
urls = []
for alt in alts:
    print(alt)
    url = get_url(Year, month, day, hour, alt)
    urls.append(url)
Then launch the coroutines:
loop = asyncio.get_event_loop()
tasks = [job(url, Year, month, day, hour) for url in urls]
# gather accepts coroutines directly; on Python 3.8+ asyncio.wait
# expects Task objects and will no longer accept bare coroutines
loop.run_until_complete(asyncio.gather(*tasks))
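When the URL list is large, it is usually worth capping concurrency so the target server is not flooded with simultaneous connections. A minimal sketch using asyncio.Semaphore; bounded_job and the limit of 10 are illustrative assumptions, and asyncio.run (Python 3.7+) replaces the manual loop management above:

async def main():
    sem = asyncio.Semaphore(10)  # allow at most 10 jobs in flight

    async def bounded_job(url):
        # each job waits here until a semaphore slot is free
        async with sem:
            await job(url, Year, month, day, hour)

    await asyncio.gather(*(bounded_job(url) for url in urls))

asyncio.run(main())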
Source: https://blog.csdn.net/weixin_38828673/article/details/113356867