Python - Convert XML to CSV using Python pandas

Author: Internet

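I'm new here, and I've been trying to write a small Python script to convert XML to CSV. Based on various posts I read on Stack Overflow, I managed to put together sample code that works. However, the data I'm working with has multiple levels of nesting, and I'm not sure how to extract the data at the leaf level.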

Here is what the data looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Transmission>
    <TransmissionBody>
        <level1>
            <level2>
                <level3>
                    <level4>
                        <level5>
                            <level6>
                                <ColA>ABC</ColA>
                                <ColB>123</ColB>
                            </level6>
                        </level5>
                    </level4>
                </level3>
            </level2>
        </level1>
    </TransmissionBody>
</Transmission>

I'm trying to convert the XML to CSV with the following code:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
final = {}
# Only looks at the root's children and grandchildren, no deeper.
for elem in root:
    if len(elem):
        for c in elem:  # getchildren() was removed in Python 3.9
            final[c.tag] = c.text
    else:
        final[elem.tag] = elem.text

df = pd.DataFrame([final])
df.to_csv('file.csv')

However, this code only extracts the upper levels (such as level2) and never reaches ColA inside level6.
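The reason the loop stops short is that iterating over an Element yields only its direct children. The following is a minimal sketch of that behaviour, assuming a shortened inline XML string as a stand-in for file.xml (the string and the variable names are illustrative, not from the original post):

```python
import xml.etree.ElementTree as ET

# Inline stand-in for file.xml (an assumption, shortened to three levels).
XML = (
    "<Transmission><TransmissionBody><level1>"
    "<ColA>ABC</ColA>"
    "</level1></TransmissionBody></Transmission>"
)

root = ET.fromstring(XML)

# Iterating over an Element visits its direct children only.
direct_children = [child.tag for child in root]

# Element.iter() visits the element itself and every descendant, depth first.
all_tags = [elem.tag for elem in root.iter()]

print(direct_children)  # ['TransmissionBody']
print(all_tags)         # ['Transmission', 'TransmissionBody', 'level1', 'ColA']
```

Reaching ColA therefore needs either one nested loop per level or a traversal that descends to arbitrary depth.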

Expected output:

Transmission,TransmissionBody,level1,level2,level3,level4,level5,level6,ColA,ColB
,,,,,,,,ABC,123
,,,,,,,,DEF,456

Updated code:

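import glob
import xml.etree.ElementTree as ET

import pandas as pd

allFiles = glob.glob(folder)  # folder: glob pattern for the XML files, defined elsewhere
for file in allFiles:
    xmllist = [file]
    for xmlfile in xmllist:
        tree = ET.parse(xmlfile)
        root = tree.getroot()

        def f(elem, result):
            # Record this element's text, then recurse into its children.
            result[elem.tag] = elem.text
            for c in elem:  # getchildren() was removed in Python 3.9
                result = f(c, result)
            return result

        d = f(root, {})
        df = pd.DataFrame(d, index=['values'])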

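Solution:

If I understand your question correctly, you need to walk the whole XML tree, so you probably want a recursive function that does this. Something like the following: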

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()

def f(elem, result):
    # Record this element's text, then recurse into each child.
    result[elem.tag] = elem.text
    for c in elem:  # getchildren() was removed in Python 3.9
        result = f(c, result)
    return result

d = f(root, {})
df = pd.DataFrame(d, index=['values']).T
df

Out:

                 values
Transmission         \n
TransmissionBody     \n
level1               \n
level2               \n
level3               \n
level4               \n
level5               \n
level6               \n
ColA                ABC
ColB                123
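The \n values above are the whitespace that sits between a container element's opening tag and its first child. A small sketch of one way to blank them out, assuming it is acceptable to strip all text; the function name flatten, the inline XML, and the Empty element are hypothetical and only serve the illustration:

```python
import xml.etree.ElementTree as ET


def flatten(elem, result=None):
    """Recursively map each tag to its stripped text ('' when there is none)."""
    if result is None:
        result = {}
    # elem.text is None for elements such as <Empty/>, hence the "or".
    result[elem.tag] = (elem.text or "").strip()
    for child in elem:
        flatten(child, result)
    return result


# Shortened, hypothetical stand-in for file.xml.
XML = """<Transmission>
    <TransmissionBody>
        <level1>
            <ColA>ABC</ColA>
            <ColB>123</ColB>
            <Empty/>
        </level1>
    </TransmissionBody>
</Transmission>"""

flat = flatten(ET.fromstring(XML))
print(flat)
```

Container tags then map to empty strings, which pandas writes as empty CSV fields, in line with the expected output in the question.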

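Update:
This is for the case where multiple XML files need to be processed. I added a second file, similar to the original, in which the ColA and ColB lines are replaced by: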

<ColA>DEF</ColA>
<ColB>456</ColB>

Here is the code:

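import glob
import xml.etree.ElementTree as ET

import pandas as pd

def f(elem, result):
    result[elem.tag] = elem.text
    for c in elem:  # getchildren() was removed in Python 3.9
        result = f(c, result)
    return result

# Collect one dict per file; reusing a single dict would overwrite earlier files.
results = []
for file in sorted(glob.glob('*.xml')):
    tree = ET.parse(file)
    root = tree.getroot()
    results.append(f(root, {}))

# One column per file (0, 1, ...), one row per tag.
df = pd.DataFrame(results).T
df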

And the output:

                    0    1
Transmission       \n   \n
TransmissionBody   \n   \n
level1             \n   \n
level2             \n   \n
level3             \n   \n
level4             \n   \n
level5             \n   \n
level6             \n   \n
ColA              ABC  DEF
ColB              123  456
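The table above has one column per file, while the question asked for a CSV with one row per file. A hedged sketch of that last step follows, using only the standard library's csv.DictWriter rather than pandas so that it runs without extra dependencies. The helper names xml_to_row and rows_to_csv, the shortened template, and the in-memory documents are all assumptions made for the example:

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_row(source):
    """Return {tag: stripped text} for every element of one XML document.

    source may be a filename or a file-like object, as accepted by ET.parse.
    If a tag occurs more than once, the last occurrence wins.
    """
    root = ET.parse(source).getroot()
    return {elem.tag: (elem.text or "").strip() for elem in root.iter()}


def rows_to_csv(rows):
    """Serialise a list of row dicts to CSV text, one line per row."""
    # The union of all tags, in first-seen order, becomes the header.
    fieldnames = list(dict.fromkeys(tag for row in rows for tag in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical in-memory documents standing in for the XML files on disk.
TEMPLATE = (
    "<Transmission><TransmissionBody><level1>"
    "<ColA>{a}</ColA><ColB>{b}</ColB>"
    "</level1></TransmissionBody></Transmission>"
)
docs = [
    io.StringIO(TEMPLATE.format(a="ABC", b="123")),
    io.StringIO(TEMPLATE.format(a="DEF", b="456")),
]

csv_text = rows_to_csv([xml_to_row(doc) for doc in docs])
print(csv_text)
```

With pandas, the same list of row dicts can be passed to pd.DataFrame(rows) and written with to_csv('file.csv', index=False), which likewise produces one line per file.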

Tags: export-to-csv, pandas, xml, python
Source: https://codeday.me/bug/20191108/2006296.html