Python: iterating over questions and answers in XML
Author: Internet
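I store survey responses in XML, but unfortunately the XML is not built consistently.
See the XML below.
I want to iterate over the divs and treat every <b> element as a question, but I am not sure how to handle the answers, because sometimes they are wrapped in a child <div> and sometimes they are not. I was considering ElementTree's itertext() or BeautifulSoup. With BeautifulSoup, however, soup.find_all('div') returns all divs, including the nested ones. tree.itertext() would work, but I would rather not end up with too many nested loops. Any suggestions on how best to handle this?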
<html>
<body>
<div>
<b>Question 1: What is your name?</b>
My name is Peter.
</div>
<div>
<b>Question 2: What is your native language?</b>
<div>Esperanto</div>
</div>
</body>
</html>
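Solution:
Iterate over the top-level divs, take the question text from the b tag, and take the answer from the text of its next sibling or, if that text is empty, from the sibling after that: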
from bs4 import BeautifulSoup
soup = BeautifulSoup("""
<html>
<body>
<div>
<b>Question 1: What is your name?</b>
My name is Peter.
</div>
<div>
<b>Question 2: What is your native language?</b>
<div>Esperanto</div>
</div>
</body>
</html>
""")
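# Visit only the <div> elements directly under <body>, skipping nested ones
for div in soup.find('body').find_all('div', recursive=False):
    question = div.find('b')
    print(question.get_text())
    # Answer: the text right after <b>, or the next element's text if that is only whitespace
    print(question.next_sibling.strip() or question.next_sibling.next_sibling.get_text().strip())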
Output:
Question 1: What is your name?
My name is Peter.
Question 2: What is your native language?
Esperanto
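The question also mentions ElementTree, so here is a rough sketch of the same idea using only the standard library. It assumes the markup is well-formed XML (real-world HTML often is not, in which case BeautifulSoup is the safer choice). The names DOC and extract_pairs are made up for this illustration. The text right after </b> is available as the tail of the <b> element, and findall('div') only matches direct children, so no extra nested loop over inner divs is needed.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the question.
DOC = """
<html>
<body>
<div>
<b>Question 1: What is your name?</b>
My name is Peter.
</div>
<div>
<b>Question 2: What is your native language?</b>
<div>Esperanto</div>
</div>
</body>
</html>
"""


def extract_pairs(markup):
    """Return a list of (question, answer) tuples (hypothetical helper)."""
    root = ET.fromstring(markup.strip())
    pairs = []
    # findall('div') matches direct children of <body> only.
    for div in root.find('body').findall('div'):
        b = div.find('b')
        if b is None:
            continue  # this div holds no question
        question = ''.join(b.itertext()).strip()
        # Case 1: the answer is loose text right after </b>, stored in b.tail.
        answer = (b.tail or '').strip()
        if not answer:
            # Case 2: the answer sits inside the element(s) following <b>.
            children = list(div)
            following = children[children.index(b) + 1:]
            answer = ' '.join(''.join(el.itertext()).strip() for el in following)
        pairs.append((question, answer))
    return pairs


for question, answer in extract_pairs(DOC):
    print(question)
    print(answer)
```

Returning the pairs as a list instead of printing inside the loop makes it easier to write them to a CSV file or a database later.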
Tags: beautifulsoup, elementtree, xml, python, xml-parsing Source: https://codeday.me/bug/20191030/1967441.html