Python: traversing XML to extract questions and answers

Author: Internet

I store survey responses in XML, but unfortunately the XML is not structured consistently.
See the XML below.

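I want to iterate over the divs and treat every <b> element as a question, but I am not sure how to handle the answers, because sometimes they are wrapped in a child <div> and sometimes they are not. I was considering ElementTree's itertext() or BeautifulSoup. However, if I call soup.find_all('div'), BeautifulSoup returns all divs, including the inner ones. tree.itertext() would work, but I would rather avoid a lot of nested loops. Any suggestions on how best to handle this?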

<html>
 <body>
  <div>
   <b>Question 1: What is your name?</b>
   My name is Peter.
  </div>
  <div>
   <b>Question 2: What is your native language?</b>
   <div>Esperanto</div>
  </div>
 </body>
</html>

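Solution:

Iterate over the top-level divs, take the question text from the b tag, and take the answer from the text of its next sibling or, if that sibling is only whitespace, from the text of the sibling after it: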

from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<html>
 <body>
  <div>
   <b>Question 1: What is your name?</b>
   My name is Peter.
  </div>
  <div>
   <b>Question 2: What is your native language?</b>
   <div>Esperanto</div>
  </div>
 </body>
</html>
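""", "html.parser")

# recursive=False restricts the search to the direct div children of body
for div in soup.find('body').find_all('div', recursive=False):
    question = div.find('b')
    print(question.text)
    # The answer is the text node right after <b>, or the next element if that text is only whitespace
    print(question.next_sibling.strip() or question.next_sibling.next_sibling.text.strip())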

Output:

Question 1: What is your name?
My name is Peter.
Question 2: What is your native language?
Esperanto
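
Since the question also mentions ElementTree, here is a rough sketch of the same idea using only the standard library. It assumes the document is well-formed XML (as the sample is) and that each top-level div holds exactly one b element. The names DOC and extract_pairs are made up for this illustration. In ElementTree, text that follows a closing tag is stored in that element's tail attribute, so no nested loops are needed:

```python
import xml.etree.ElementTree as ET

# Sample document from the question (DOC is a hypothetical name)
DOC = """
<html>
 <body>
  <div>
   <b>Question 1: What is your name?</b>
   My name is Peter.
  </div>
  <div>
   <b>Question 2: What is your native language?</b>
   <div>Esperanto</div>
  </div>
 </body>
</html>
"""


def extract_pairs(xml_text):
    """Return a list of (question, answer) tuples (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    # "./body/div" matches only the direct div children of body,
    # so the inner answer divs are not visited as questions
    for div in root.findall("./body/div"):
        b = div.find("b")
        if b is None:
            continue
        question = (b.text or "").strip()
        # Text placed right after </b> is stored in b.tail
        answer = (b.tail or "").strip()
        if not answer:
            # Fall back to the nested div when the tail is only whitespace
            inner = div.find("div")
            if inner is not None:
                answer = "".join(inner.itertext()).strip()
        pairs.append((question, answer))
    return pairs
```

Calling extract_pairs(DOC) and printing each tuple should give the same question and answer pairs as the BeautifulSoup version. Note that ElementTree will raise a ParseError on HTML that is not well-formed, in which case BeautifulSoup is likely the safer choice.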

Tags: beautifulsoup, elementtree, xml, python, xml-parsing
Source: https://codeday.me/bug/20191030/1967441.html