Regular expressions and DOMDocument in PHP
Author: Internet
Consider the following PHP snippet:
<?php
$html = <<<DATA
<p>Lorem Ipsum is simply dummy text</p> <p>Lorem Ipsum is <a href="http://www.google.com">simply</a> dummy text</p><a href="http://www.youtube.com/watch?v=DUQi_R4SgWo" target="_blank" rel="noopener">Check out the video here!</a>. <p>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p> <a href="http://www.youtube.com/watch?v=A_6gNZCkajU" target="_blank" rel="noopener">Video here</a> <p>It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
DATA;
# set up the DOM
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
# set up the xpath
$xpath = new DOMXPath($dom);
# set up the regex: capture everything after "?v=" up to the next "&"
$regex = '~\?v=([^&]+)~';
# //a selects anchors at any depth in the parsed fragment
foreach ($xpath->query("//a[contains(@href, 'youtube')]/@href") as $link) {
    if (preg_match($regex, $link->nodeValue, $matches) === 1) {
        $id = $matches[1];
        echo "$id\n";
    }
}
?>
This sets up the DOM on the HTML string, then uses an XPath query and a regular expression to extract the video ID from each YouTube link.
Output:
DUQi_R4SgWo
A_6gNZCkajU
Now I would like to replace the foreach loop with:
$regex = '~\?v=([^&]+)~';
$xpath->registerPHPFunctions();
$xpath->registerNamespace("php", "http://php.net/xpath");
$links = $xpath->query("//a[php:functionString('preg_match', '$regex', @href, '$matches')]/@href");
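This finds the same links, but nothing is saved into $matches. Why?
Solution:
A quick scan of the underlying engine code shows that it does not support passing arguments by reference.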
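To make the difference concrete, here is a minimal sketch contrasting a direct preg_match call with what the XPath engine actually receives. The sample URL and the variable names $url, $found and $query are assumptions for illustration only. In a direct call, the third argument is bound by reference; in the query, PHP interpolates $matches into the string before the XPath engine runs, so the fourth argument arrives as a plain string literal.

```php
<?php
$regex = '~\?v=([^&]+)~';
$url   = 'http://www.youtube.com/watch?v=DUQi_R4SgWo&feature=share'; // hypothetical sample URL

// Direct call: $found is bound by reference, so preg_match can fill it in.
$rc = preg_match($regex, $url, $found);
echo $rc, ' ', $found[1], "\n"; // prints "1 DUQi_R4SgWo"

// Inside the double-quoted query string, PHP interpolates the variable
// before the XPath engine runs; '$matches' ends up as a string literal.
$matches = ''; // defined only to avoid an undefined-variable warning here
$query = "//a[php:functionString('preg_match', '$regex', @href, '$matches')]/@href";
echo $query, "\n";
// prints: //a[php:functionString('preg_match', '~\?v=([^&]+)~', @href, '')]/@href
```

Whatever the engine does with that fourth argument, it has no connection to the PHP variable $matches in the calling scope.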
To work around this, use your own wrapper function:
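$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('match_youtube_id');
$links = $xpath->query("//a[php:functionString('match_youtube_id', @href)]/@href");

// Named match_youtube_id rather than match, because match is a reserved keyword as of PHP 8.0.
function match_youtube_id($href) {
    $regex = '~\?v=([^&]+)~';
    $rc = preg_match($regex, $href, $matches);
    if ($rc === 1) {
        var_dump($matches[1]); // store this somewhere
    }
    // Return a boolean so the XPath predicate filters on match / no match.
    return $rc === 1;
}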
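The "store this somewhere" step can be filled in several ways. The following is a minimal, self-contained sketch, not the original author's code: the function name collect_youtube_id, the $GLOBALS['youtube_ids'] array and the sample markup are all assumptions made for illustration. It collects each captured ID while the XPath query runs.

```php
<?php
// Hypothetical sample markup, not taken from the original post.
$html = '<div>'
      . '<a href="http://www.google.com">search</a>'
      . '<a href="http://www.youtube.com/watch?v=DUQi_R4SgWo">video 1</a>'
      . '<a href="http://www.youtube.com/watch?v=A_6gNZCkajU&amp;t=10">video 2</a>'
      . '</div>';

// Collected IDs; a global is used only to keep the sketch short.
$GLOBALS['youtube_ids'] = [];

// Hypothetical wrapper: runs the regex and records the capture group.
function collect_youtube_id($href)
{
    if (preg_match('~\?v=([^&]+)~', $href, $matches) === 1) {
        $GLOBALS['youtube_ids'][] = $matches[1];
        return true;  // keep this <a> in the result set
    }
    return false;     // drop it
}

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('collect_youtube_id');

$links = $xpath->query("//a[php:functionString('collect_youtube_id', @href)]/@href");

foreach ($links as $link) {
    echo $link->nodeValue, "\n";   // the matching href values
}
print_r($GLOBALS['youtube_ids']);  // the IDs gathered by the wrapper
```

A global keeps the example short; in real code a static class property or an object that owns both the query and the results would probably be easier to maintain.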
Tags: domdocument, php, regex. Source: https://codeday.me/bug/20191110/2014840.html