Web Scraping with a Thread Pool
Author: Internet
Synchronous code:

# Flask server code. Save it as its own script and start it first;
# app.run() blocks, so it cannot share a script with the client below.
from flask import Flask
from time import sleep

app = Flask(__name__)

@app.route('/bobo')
def index1():
    sleep(2)  # simulate a slow response
    return 'hello bobo!'

@app.route('/jay')
def index2():
    sleep(2)
    return 'hello jay!'

@app.route('/tom')
def index3():
    sleep(2)
    return 'hello tom!'

app.run()

# Client code. Run it separately once the server is listening on port 5000.
import requests
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}

start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]
# Each request blocks until its response arrives, so the delays add up.
for url in urls:
    page_text = requests.get(url, headers=headers).text
    print(page_text)
print(time.time() - start)
hello bobo!
hello jay!
hello tom!
6.016878366470337
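Asynchronous code
Asynchronous scraping with a thread pool. The synchronous loop takes about 6 seconds because the three 2-second requests run one after another; a thread pool lets the waits overlap.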
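import time
import requests
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing.Pool API

# headers is the same dict defined in the synchronous example.

# The worker function must accept exactly one argument,
# because map passes it one element per call.
def my_requests(url):
    return requests.get(url=url, headers=headers).text

start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]
pool = Pool(3)  # three worker threads, one per URL
# map takes two arguments:
#   1. a function that accepts exactly one argument
#   2. an iterable such as a list (iterating a dict would pass its keys)
# map runs the function on the elements concurrently in the worker threads,
# blocks until every call has finished, and returns the results in input order.
page_texts = pool.map(my_requests, urls)
print(page_texts)
print(time.time() - start)
pool.close()  # no more work will be submitted
pool.join()   # wait for the worker threads to exit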
['hello bobo!', 'hello jay!', 'hello tom!']
2.0126171112060547
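The same effect can be reproduced without a running server. The following is a minimal sketch, not the original author's code: it swaps in the standard library's concurrent.futures.ThreadPoolExecutor for multiprocessing.dummy.Pool, and the names fake_fetch and DELAY are hypothetical stand-ins that replace the HTTP request with a short time.sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # hypothetical stand-in for the 2-second server delay


def fake_fetch(name):
    """Pretend to download a page: block for DELAY seconds, then return text."""
    time.sleep(DELAY)
    return f"hello {name}!"


names = ["bobo", "jay", "tom"]

# Sequential: each call waits for the previous one to finish.
start = time.time()
sequential = [fake_fetch(n) for n in names]
sequential_elapsed = time.time() - start

# Thread pool: the three blocking calls overlap in separate worker threads.
start = time.time()
with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map yields results in the same order as the inputs.
    pooled = list(executor.map(fake_fetch, names))
pooled_elapsed = time.time() - start

print(sequential, round(sequential_elapsed, 2))
print(pooled, round(pooled_elapsed, 2))
```

The sequential run should take roughly three times DELAY, while the pooled run should take roughly one DELAY, which mirrors the 6-second versus 2-second results above. Threads help here because the work is blocking I/O (or sleep), not CPU-bound computation.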
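- asyncio
- How is a coroutine object created?
- What is a task object?
- What is the difference between a task object and a coroutine object?
- How do you bind a callback to a task object?
- What is the event loop?
- aiohttp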
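The questions above can be illustrated with a short asyncio sketch. It is an illustration under stated assumptions, not code from the original post: fake_request, on_done, and callback_log are hypothetical names, and asyncio.sleep stands in for the network call that a library such as aiohttp would perform, so that the example needs only the standard library (Python 3.7+).

```python
import asyncio


async def fake_request(name):
    # asyncio.sleep stands in for awaiting a real network response.
    await asyncio.sleep(0.1)
    return f"hello {name}!"


callback_log = []


def on_done(task):
    # Invoked by the event loop after the task finishes; receives the task.
    callback_log.append(task.result())


async def main():
    # Calling an async function only creates a coroutine object; nothing runs yet.
    coro = fake_request("bobo")
    # Wrapping it in a task schedules it on the running event loop.
    task = asyncio.create_task(coro)
    # A task (unlike a bare coroutine) can have callbacks bound to it.
    task.add_done_callback(on_done)
    others = [asyncio.create_task(fake_request(n)) for n in ("jay", "tom")]
    # gather waits for all tasks and returns results in argument order.
    return await asyncio.gather(task, *others)


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
results = asyncio.run(main())
print(results)
print(callback_log)
```

In short: a coroutine object is an unstarted unit of work, a task is that work scheduled on the event loop (with state, a result, and optional callbacks), and the event loop is what switches between tasks whenever one of them awaits.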
Source: https://www.cnblogs.com/lulin9501/p/11303970.html