Web Scraping Utility | Automatically Format a Browser Header String into a Dictionary
Author: the internet
Background
When scraping web resources, we usually need to copy the request header out of the browser's developer tools and drop it into our script.
The usual way is to wrap every key and every value in quotes by hand, which is tedious and time-consuming. Below is a function that takes a header string copied from the browser and automatically formats it into a dictionary.
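For reference, the hand-written version looks like the small sketch below, with every key and value quoted manually (the header names are taken from the docstring example further down):

# Manual approach: quote every key and value copied from DevTools by hand
headers = {
    'Host': 'zhan.qq.com',
    'Proxy-Connection': 'keep-alive',
    'Pragma': 'no-cache',
}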
Implementation
import json


def get_headers(input_headers_string):
    '''
    Format a raw request-header string copied from the browser into a dictionary,
    and print it as JSON so it can be pasted straight into a script as headers.
    :param input_headers_string: str, request headers copied from the browser, e.g.:
        headers = """
        Host: zhan.qq.com
        Proxy-Connection: keep-alive
        Content-Length: 799432
        Pragma: no-cache
        Cache-Control: no-cache
        """
    :return: dict, the formatted headers
    '''
    # Make sure we are working with a plain string
    headers = str(input_headers_string)
    # Strip leading/trailing whitespace and split into one line per header field
    headers = headers.strip().split('\n')
    # Rebuild each line as a key/value pair; split on the first colon only so that
    # values containing ':' (URLs, ports, and so on) are kept intact
    headers = {
        key.strip(): value.strip()
        for key, value in (line.split(':', 1) for line in headers if ':' in line)
    }
    # Dump the dictionary as JSON so it can be copied and pasted directly
    return_headers = json.dumps(headers, indent=1)
    print('headers={}'.format(return_headers))
    return headers
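Splitting on the first colon only, via split(':', 1), is what keeps values with embedded colons intact. A quick sanity check with a made-up header line (not taken from the examples in this post):

# Hypothetical header line whose value contains extra colons (scheme and port)
line = 'referer: https://example.com:8080/so/1.html'
key, value = line.split(':', 1)
print(key.strip(), '->', value.strip())
# prints: referer -> https://example.com:8080/so/1.html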
Usage
if __name__ == '__main__':
    headers = """
    accept-encoding: gzip, deflate, br
    accept-language: zh-CN,zh;q=0.9
    content-length: 14
    content-type: application/x-www-form-urlencoded; charset=UTF-8
    origin: https://www.2ppt.com
    referer: https://www.2ppt.com/so/1.html
    sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
    sec-ch-ua-mobile: ?0
    sec-ch-ua-platform: "Windows"
    sec-fetch-dest: empty
    sec-fetch-mode: cors
    sec-fetch-site: same-origin
    user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36
    x-requested-with: XMLHttpRequest
    """
    get_headers(headers)
Output
headers={
"accept-encoding": "gzip, deflate, br",
"accept-language": "zh-CN,zh;q=0.9",
"content-length": "14",
"content-type": "application/x-www-form-urlencoded; charset=UTF-8",
"origin": "https://www.2ppt.com",
"referer": "https://www.2ppt.com/so/1.html",
"sec-ch-ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"Windows\"",
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
"x-requested-with": "XMLHttpRequest"
}
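Because get_headers also returns the dictionary, the result can be used directly instead of copy-pasting the printed JSON. A minimal sketch, assuming the requests library is installed, the function above is in scope, and reusing the example URL from the call above:

import requests

raw = """
referer: https://www.2ppt.com/so/1.html
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36
"""
# Format the copied header string and pass it straight to the request
headers = get_headers(raw)
response = requests.get('https://www.2ppt.com/so/1.html', headers=headers)
print(response.status_code)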
Source: https://blog.csdn.net/zh6526157/article/details/121947884