
Python: an unconventional way to extract comment content from web page source

2020-01-22 09:39:11 Author: Internet

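The approach is as follows.

Suppose we need to extract content that sits inside an HTML comment, <!-- ... -->. BeautifulSoup parses a comment as a Comment node rather than as tags, so ordinary tag lookups skip any markup inside it.

In that case we can preprocess the page source before parsing it, replacing every occurrence of the string '<!-- ' (the comment opener followed by a space) with an empty string: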

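import requests
from bs4 import BeautifulSoup

# url and headers are assumed to be defined elsewhere
res3 = requests.get(url, headers=headers, timeout=(10, 60)).content

# Strip the comment opener from the raw bytes; same effect as eval(repr(...).replace(...)), without eval
html = res3.replace(b'<!-- ', b'')
soup = BeautifulSoup(html, 'html.parser')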
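Note that replacing '<!-- ' only matches an opener followed by a space, and it leaves the closing --> in the page as stray text. A minimal sketch of a slightly more tolerant variant is shown here; the helper name strip_comment_markers is hypothetical and not part of the original post, and the sketch assumes the page has no scripts or other content where removing these markers would matter.

```python
import re

# Matches either comment delimiter: the opener "<!--" or the closer "-->".
_COMMENT_MARKERS = re.compile(rb"<!--|-->")


def strip_comment_markers(html: bytes) -> bytes:
    """Remove HTML comment delimiters but keep the markup between them.

    Hypothetical helper for illustration; it works on raw bytes, as
    returned by requests' Response.content.
    """
    return _COMMENT_MARKERS.sub(b"", html)


sample = b'<div><!-- <span class="flag">hidden</span> --></div>'
print(strip_comment_markers(sample))
# b'<div> <span class="flag">hidden</span> </div>'
```

The result can be passed to BeautifulSoup in the same way as html in the code above.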

After this, BeautifulSoup can find the target element, for example span class="flag", with its usual search methods.
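The effect can be checked offline on a small inline document. This is only a sketch: the page bytes below are made up for illustration, and only the span class="flag" target comes from the text above.

```python
from bs4 import BeautifulSoup

# Made-up sample page: the span we want is wrapped in an HTML comment.
page = b'<div><!-- <span class="flag">hidden text</span> --></div>'

# Parsed as-is, the span is part of a Comment node, so find() does not see it.
before = BeautifulSoup(page, "html.parser").find("span", class_="flag")

# After dropping the comment opener, the same markup is parsed as a real tag.
uncommented = page.replace(b"<!-- ", b"")
after = BeautifulSoup(uncommented, "html.parser").find("span", class_="flag")

print(before)            # None
print(after.get_text())  # hidden text
```

The leftover --> ends up as plain text inside the div, which is harmless when only the span is of interest.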

Original author: stkj007

Tags: unconventional, web page, headers, python, html, BeautifulSoup, html1, source code
Source: https://blog.csdn.net/stkj007/article/details/104067626