php - How do I load a 50 MB zip file containing a 600 MB XML file into a MySQL table?

Author: Internet

How do I load a 50 MB zip file containing a 600 MB XML file (more than 300,000 <abc:ABCRecord> elements) into a MySQL table? The XML file itself has the following structure:

<?xml version='1.0' encoding='UTF-8'?>
<abc:ABCData xmlns:abc="http://www.abc-example.com" xmlns:xyz="http:/www.xyz-example.com">
<abc:ABCHeader>
<abc:ContentDate>2015-08-15T09:03:29.379055+00:00</abc:ContentDate>
<abc:FileContent>PUBLISHED</abc:FileContent>
<abc:RecordCount>310598</abc:RecordCount>
<abc:Extension>
  <xyz:Sources>
    <xyz:Source>
      <xyz:ABC>5967007LIEEXZX4LPK21</xyz:ABC>
      <xyz:Name>Bornheim Register Centre</xyz:Name>
      <xyz:ROCSponsorCountry>NO</xyz:ROCSponsorCountry>
      <xyz:RecordCount>398</xyz:RecordCount>
      <xyz:ContentDate>2015-08-15T05:00:02.952+02:00</xyz:ContentDate>
      <xyz:LastAttemptedDownloadDate>2015-08-15T09:00:01.885686+00:00</xyz:LastAttemptedDownloadDate>
      <xyz:LastSuccessfulDownloadDate>2015-08-15T09:00:02.555222+00:00</xyz:LastSuccessfulDownloadDate>
      <xyz:LastValidDownloadDate>2015-08-15T09:00:02.555222+00:00</xyz:LastValidDownloadDate>
    </xyz:Source>
  </xyz:Sources>
</abc:Extension>
</abc:ABCHeader>
<abc:ABCRecords>
<abc:ABCRecord>
  <abc:ABC>5967007LIEEXZX4LPK21</abc:ABC>
  <abc:Entity>
    <abc:LegalName>REGISTERENHETEN I Bornheim</abc:LegalName>
    <abc:LegalAddress>
      <abc:Line1>Havnegata 48</abc:Line1>
      <abc:City>Bornheim</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>8900</abc:PostalCode>
    </abc:LegalAddress>
    <abc:HeadquartersAddress>
      <abc:Line1>Havnegata 48</abc:Line1>
      <abc:City>Bornheim</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>8900</abc:PostalCode>
    </abc:HeadquartersAddress>
    <abc:BusinessRegisterEntityID register="Enhetsregisteret">974757873</abc:BusinessRegisterEntityID>
    <abc:LegalForm>Organisasjonsledd</abc:LegalForm>
    <abc:EntityStatus>Active</abc:EntityStatus>
  </abc:Entity>
  <abc:Registration>
    <abc:InitialRegistrationDate>2014-06-15T12:03:33.000+02:00</abc:InitialRegistrationDate>
    <abc:LastUpdateDate>2015-06-15T20:45:32.000+02:00</abc:LastUpdateDate>
    <abc:RegistrationStatus>ISSUED</abc:RegistrationStatus>
    <abc:NextRenewalDate>2016-06-15T12:03:33.000+02:00</abc:NextRenewalDate>
    <abc:ManagingLOU>59670054IEEXZX44PK21</abc:ManagingLOU>
  </abc:Registration>
</abc:ABCRecord>
<abc:ABCRecord>
  <abc:ABC>5967007LIE45ZX4MHC90</abc:ABC>
  <abc:Entity>
    <abc:LegalName>SUNNDAL HOSTBANK</abc:LegalName>
    <abc:LegalAddress>
      <abc:Line1>Sunfsalsvegen 15</abc:Line1>
      <abc:City>SUNNDALSPRA</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>6600</abc:PostalCode>
    </abc:LegalAddress>
    <abc:HeadquartersAddress>
      <abc:Line1>Sunndalsvegen 15</abc:Line1>
      <abc:City>SUNNDALSPRA</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>6600</abc:PostalCode>
    </abc:HeadquartersAddress>
    <abc:BusinessRegisterEntityID register="Foretaksregisteret">9373245963</abc:BusinessRegisterEntityID>
    <abc:LegalForm>Hostbank</abc:LegalForm>
    <abc:EntityStatus>Active</abc:EntityStatus>
  </abc:Entity>
  <abc:Registration>
    <abc:InitialRegistrationDate>2014-06-26T15:01:02.000+02:00</abc:InitialRegistrationDate>
    <abc:LastUpdateDate>2015-06-27T15:02:39.000+02:00</abc:LastUpdateDate>
    <abc:RegistrationStatus>ISSUED</abc:RegistrationStatus>
    <abc:NextRenewalDate>2016-06-26T15:01:02.000+02:00</abc:NextRenewalDate>
    <abc:ManagingLOU>5967007LIEEXZX4LPK21</abc:ManagingLOU>
  </abc:Registration>
</abc:ABCRecord>
</abc:ABCRecords>
</abc:ABCData>

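What should the MySQL table look like, and how do I implement this? The goal is to have the content of all abc tags in the table. In addition, a new zip file is made available through a download link every day, and it should update the table daily. The zip files are named according to this pattern: "20150815-XYZ-concatenated-file.zip". Step-by-step hints would be great. I have tried the approach from "Importing XML file with special tags & namespaces <abc:xyz> in mysql", but it did not get the job done!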

Following ThW's explanation below, I have now done the following:

<?php

// open input
$reader = new XMLReader();
$reader->open('./xmlreader.xml');

// open output
$output = fopen('./xmlreader.csv', 'w');
fputcsv($output, ['id', 'name']);

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

// prepare DOM
$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

// look for the first record element
while (
  $reader->read() && 
  (
    $reader->localName !== 'ABCRecord' || 
    $reader->namespaceURI !== $xmlns['a']
  )
) {
  continue;
}

// while you have a record element
while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === $xmlns['a']) {
    // expand record element node
    $node = $reader->expand($dom);
    // fetch data and write it to output
    fputcsv(
      $output, 
      [
        $xpath->evaluate('string(a:ABC)', $node),
        $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
      ]
    );
  }

  // move to the next record sibling
  $reader->next('ABCRecord');
} 

Is this correct? Where can I find the output? And how do I get the output into MySQL? Sorry for the newbie questions, this is my first time doing this...

$dbHost = "localhost";
$dbUser = "root";
$dbPass = "password";
$dbName = "new_xml_extract";

$dbConn = mysqli_connect($dbHost, $dbUser, $dbPass, $dbName);

$delete = $dbConn->query("TRUNCATE TABLE `test_xml`");

....

$sql = "INSERT INTO `test_xml` (`.....`, `.....`)" . "VALUES ('". $dbConn->real_escape_string($.....) ."', '".$dbConn->real_escape_string($.....)."')";

$result = $dbConn->query($sql);
}

Solution:

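MySQL does not know your XML structure. It can import simple, well-formed XML structures directly, but you need to convert more complex structures yourself. You can generate CSV, SQL, or XML in a format that MySQL supports.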

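For large files, XMLReader is the best API, because it streams through the document instead of loading it into memory as a whole. First create an instance and open the file: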

$reader = new XMLReader();
$reader->open('php://stdin');
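
Because the daily download is a zip archive, one option is to let XMLReader read the XML entry directly from the archive through PHP's `zip://` stream wrapper, so the 600 MB file does not have to be unpacked to disk first. The following is only a minimal sketch: it assumes the zip extension is installed, and the helper name `openXmlFromZip` and the entry name passed to it are hypothetical.

```php
<?php
// Hypothetical helper: open one XML entry inside a zip archive for streaming.
// Assumes ext-zip is loaded, which provides the zip:// stream wrapper.
function openXmlFromZip(string $zipPath, string $entryName): XMLReader
{
    $reader = new XMLReader();
    // zip://<archive>#<entry> addresses a single file inside the archive
    if (!$reader->open('zip://' . $zipPath . '#' . $entryName)) {
        throw new RuntimeException("Cannot open $entryName in $zipPath");
    }
    return $reader;
}
```

The returned reader can then be used exactly like the one opened on `php://stdin` above. The name of the XML entry inside the archive is not given in the question, so it has to be looked up first (for example with `ZipArchive::getNameIndex()`).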

You are using namespaces, so I suggest defining a mapping array for them:

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

You can use the same prefixes/aliases as in the XML file, but you are free to use your own as well.
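
To illustrate why the alias is free to choose, here is a small self-contained sketch. The document binds the namespace to the prefix `abc`, while the XPath expression uses the alias `a`; both resolve to the same namespace URI. The XML string is a shortened excerpt of the sample data, and the variable names are only illustrative.

```php
<?php
// The document binds the namespace to "abc"; the XPath side uses its own alias "a".
$xml = '<abc:ABCRecord xmlns:abc="http://www.abc-example.com">'
     . '<abc:ABC>5967007LIEEXZX4LPK21</abc:ABC>'
     . '</abc:ABCRecord>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('a', 'http://www.abc-example.com');

// Matching is done on the namespace URI, so "a:" resolves the "abc:" elements.
$id = $xpath->evaluate('string(/a:ABCRecord/a:ABC)');
echo $id, PHP_EOL;
```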

Next, traverse the XML nodes until you find the first record element node:

while (
  $reader->read() && 
  ($reader->localName !== 'ABCRecord' ||  $reader->namespaceURI !== $xmlns['a'])
) {
  continue;
}

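You need to compare the local name (the tag name without the namespace prefix) and the namespace URI. This way your program does not depend on the actual prefixes used in the XML file.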
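
The difference between the prefixed name and the local name can be seen in a short sketch. It assumes a document that uses a different prefix (`foo`, chosen only for illustration) for the same namespace URI; the check on `localName` and `namespaceURI` still finds the record.

```php
<?php
// Same namespace URI, but this document happens to use the prefix "foo".
$xml = '<foo:ABCData xmlns:foo="http://www.abc-example.com">'
     . '<foo:ABCRecords><foo:ABCRecord/></foo:ABCRecords>'
     . '</foo:ABCData>';

$reader = new XMLReader();
$reader->XML($xml);

$found = null;
while ($reader->read()) {
    if (
        $reader->nodeType === XMLReader::ELEMENT &&
        $reader->localName === 'ABCRecord' &&
        $reader->namespaceURI === 'http://www.abc-example.com'
    ) {
        // name includes the prefix; localName and namespaceURI do not depend on it
        $found = [$reader->name, $reader->localName, $reader->namespaceURI];
        break;
    }
}
print_r($found);
```

A comparison against `$reader->name` would have required the literal string `foo:ABCRecord` here and would break as soon as the producer of the file changed its prefix.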

Once you have found the first node, you can move on to the next sibling node with the same local name.

while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === $xmlns['a']) {
    // read data for the record ...
  }      
  // move to the next record sibling
  $reader->next('ABCRecord');
}

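You could use XMLReader to read the record data, but it is easier with DOM and XPath expressions. XMLReader can expand the current node into a DOM node. So prepare a DOM document, create an XPath object for it, and register the namespaces. Expanding a node loads that node and all of its descendants into memory, but not its parent nodes or siblings.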

$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === $xmlns['a']) {
    $node = $reader->expand($dom);
    var_dump(
      $xpath->evaluate('string(a:ABC)', $node),
      $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
    );
  }
  $reader->next('ABCRecord');
}

DOMXPath::evaluate() lets you use XPath expressions to fetch scalar values or node lists from a DOM.
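
What evaluate() returns depends on the expression. The sketch below uses a shortened record from the sample data to show the three most common cases; the element `a:OtherName` does not exist in the sample and is only there to show what happens for a missing node.

```php
<?php
$xml = <<<'XML'
<abc:ABCRecord xmlns:abc="http://www.abc-example.com">
  <abc:Entity>
    <abc:LegalAddress><abc:City>Bornheim</abc:City></abc:LegalAddress>
    <abc:HeadquartersAddress><abc:City>Bornheim</abc:City></abc:HeadquartersAddress>
  </abc:Entity>
</abc:ABCRecord>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('a', 'http://www.abc-example.com');

// string() casts the first matching node to a PHP string
$city = $xpath->evaluate('string(//a:LegalAddress/a:City)');
// count() yields a number, returned as a PHP float
$cityCount = $xpath->evaluate('count(//a:City)');
// a location path without a cast yields a DOMNodeList
$cities = $xpath->evaluate('//a:City');
// a missing element gives an empty string rather than an error
$missing = $xpath->evaluate('string(//a:Entity/a:OtherName)');
```

The string() cast is convenient for an export, because optional elements simply become empty columns.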

fputcsv() makes it easy to write the data to CSV.

Putting it all together:

// open input
$reader = new XMLReader();
$reader->open('php://stdin');

// open output
$output = fopen('php://stdout', 'w');
fputcsv($output, ['id', 'name']);

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

// prepare DOM
$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

// look for the first record element
while (
  $reader->read() && 
  (
    $reader->localName !== 'ABCRecord' || 
    $reader->namespaceURI !== $xmlns['a']
  )
) {
  continue;
}

// while you have a record element
while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === $xmlns['a']) {
    // expand record element node
    $node = $reader->expand($dom);
    // fetch data and write it to output
    fputcsv(
      $output, 
      [
        $xpath->evaluate('string(a:ABC)', $node),
        $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
      ]
    );
  }

  // move to the next record sibling
  $reader->next('ABCRecord');
} 

Output:

id,name
5967007LIEEXZX4LPK21,"REGISTERENHETEN I Bornheim"
5967007LIE45ZX4MHC90,"SUNNDAL HOSTBANK"
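
To get such rows into MySQL instead of a CSV file, one possible approach is a prepared statement via PDO. This is a sketch under assumptions rather than a definitive implementation: the table name `test_xml` is taken from the question, while the column names `id` and `name`, the helper name `importRecords`, and the batch size are made up for illustration, and `id` is assumed to be the primary key.

```php
<?php
// Hypothetical helper: write [id, name] rows into the table `test_xml`
// (table name from the question; the columns `id` and `name` are assumptions,
// with `id` expected to be the primary key).
function importRecords(PDO $pdo, iterable $rows, int $batchSize = 1000): int
{
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // REPLACE overwrites a row with the same primary key, so re-running the
    // import with the next day's file updates existing records.
    $stmt = $pdo->prepare('REPLACE INTO test_xml (id, name) VALUES (?, ?)');

    $count = 0;
    $pdo->beginTransaction();
    foreach ($rows as [$id, $name]) {
        $stmt->execute([$id, $name]);
        // commit in batches to keep each transaction reasonably small
        if (++$count % $batchSize === 0) {
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
    $pdo->commit();
    return $count;
}
```

In the record loop above, the two `$xpath->evaluate()` results would be collected (or yielded from a generator) and passed in as `$rows`. A connection could be created with something like `new PDO('mysql:host=localhost;dbname=new_xml_extract;charset=utf8mb4', $dbUser, $dbPass)`, reusing the names from the question. Because REPLACE updates rows by primary key, the TRUNCATE from the question is only needed if records that disappeared from the new file should also be removed from the table.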

Tags: php, mysql, xml, zip, xml-namespaces
Source: https://codeday.me/bug/20191003/1849710.html