python - BeautifulSoup cannot find a tag

Author: Internet

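I am trying to scrape this page (the URL is in the code below) and all other pages like it. I have been using BeautifulSoup (I also tried lxml, but ran into installation problems). This is the code I am using: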

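import urllib2
from bs4 import BeautifulSoup

value = "http://www.presidency.ucsb.edu/ws/index.php?pid=99556"
desiredTag = "span"
r = urllib2.urlopen(value)
# Parse the response body with the html5lib parser
data = BeautifulSoup(r.read(), 'html5lib')
displayText = data.find_all(desiredTag)  # list of matching Tag objects
print displayText
# Join the tags themselves; str(displayText) would be joined character by character
displayText = " ".join(str(tag) for tag in displayText)
displayText = BeautifulSoup(displayText, 'html5lib')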

For some reason, this does not pull back the <span class="displaytext"> element. I have also tried setting desiredTag to p.

Am I missing something?

Solution:

You are definitely experiencing the differences between the different parsers that BeautifulSoup can use. Both html.parser and lxml worked for me:

data = BeautifulSoup(urllib2.urlopen(value), 'html.parser') 
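If you want to check how the parsers behave on your own machine, something like the following may help. It is only a sketch in Python 3 syntax: the helper name find_with_each_parser and the SAMPLE_HTML string are made up for illustration, the sample is not the real page (so it may not reproduce the html5lib failure), and any parser that is not installed is simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in markup; the real page is larger and may be malformed.
SAMPLE_HTML = (
    "<html><body><p>intro</p>"
    '<span class="displaytext">PARTICIPANTS: ...</span>'
    "</body></html>"
)


def find_with_each_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Map each installed parser name to the span's text, or None if not found."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser is not installed.
            continue
        tag = soup.find("span", class_="displaytext")
        results[name] = tag.get_text() if tag is not None else None
    return results


if __name__ == "__main__":
    for parser, text in find_with_each_parser(SAMPLE_HTML).items():
        print(parser, "->", repr(text))
```

Passing the downloaded page instead of SAMPLE_HTML should show which parsers find the span and which return None for that particular document.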

Proof:

>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> 
>>> url = "http://www.presidency.ucsb.edu/ws/index.php?pid=99556"
>>> 
>>> data = BeautifulSoup(urllib2.urlopen(url), 'html.parser')
>>> data.find("span", class_="displaytext").text
u'PARTICIPANTS:Former Speaker of the House Newt Gingrich (GA);
...
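
The session above is Python 2 (urllib2 does not exist in Python 3). A rough Python 3 equivalent might look like this, assuming bs4 is installed. The function names extract_displaytext and fetch_displaytext are my own, not part of BeautifulSoup, and the None check is there because find returns None when no tag matches.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_displaytext(html, parser="html.parser"):
    """Return the text of the first <span class="displaytext">, or None."""
    soup = BeautifulSoup(html, parser)
    tag = soup.find("span", class_="displaytext")
    return tag.get_text() if tag is not None else None


def fetch_displaytext(url, parser="html.parser"):
    """Download url and extract the displaytext span (needs network access)."""
    with urlopen(url) as response:
        return extract_displaytext(response.read(), parser)
```

Calling fetch_displaytext("http://www.presidency.ucsb.edu/ws/index.php?pid=99556") would mirror the session above, though I cannot promise the URL still serves the same markup.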

Tags: beautifulsoup, web-scraping, python
Source: https://codeday.me/bug/20191120/2044944.html