PHP DOMDocument: how can I use getElementById with a partial match?
Author: Internet
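Is there a way to get all elements whose ID partially matches a string? For example, I want to capture every HTML element on a web page whose id attribute starts with msg_ but can be followed by anything.
This is what I have so far: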
$doc = new DomDocument;
// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('{URL IS HERE}'));
foreach($doc->getElementById('msg_') as $element) {
foreach($element->getElementsByTagName('a') as $link)
{
echo $link->nodeValue . "\n";
}
}
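But I need to figure out how to do a partial ID match with this bit: $doc->getElementById('msg_'), or whether there is some other way to achieve this.
Basically, I need to grab all the "a" tags among the children of every element whose ID starts with msg_. Technically there will always be only one such tag, but I don't know how to grab just the first child, which is why I am using foreach on it as well.
Is this possible with the PHP DomDocument class?
Here is the code I am using now, which also doesn't work: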
$str = '';
$filename = 'http://dream-portal.net/index.php/board,65.0.html';
@set_time_limit(0);
$fp = fopen($filename, 'rb');
while (!feof($fp))
{
$str .= fgets($fp, 16384);
}
fclose($fp);
$doc = new DOMDocument();
$doc->loadXML($str);
$selector = new DOMXPath($doc);
$elements = $selector->query('//row[starts-with(@id, "msg_")]');
foreach ($elements as $node) {
var_dump($node->nodeValue) . PHP_EOL;
}
The HTML looks like this (the ID is on the span tag):
<td class="subject windowbg2">
<div>
<span id="msg_6555">
<a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a>
</span>
<p>
Started by
<a href="http://dream-portal.net/index.php?action=profile;u=1" title="View the profile of SoLoGHoST">SoLoGHoST</a>
<small id="pages6555">
«
<a class="navPages" href="http://dream-portal.net/index.php?topic=834.0">1</a>
<a class="navPages" href="http://dream-portal.net/index.php?topic=834.15">2</a>
»
</small>
with 963 Views
</p>
</div>
</td>
It is the <span id="msg_ part that I am after, and there are many of them (at least 15 on the HTML page).
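Solution:
Load the page with loadHTML() rather than loadXML(), since real-world HTML is rarely well-formed XML, and match any element (*) whose id starts with msg_ instead of looking for a row element (in the posted HTML the IDs sit on span elements):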
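$str = file_get_contents('http://dream-portal.net/index.php/board,65.0.html');
$doc = new DOMDocument();
// Real-world HTML is often malformed; collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($str);
libxml_clear_errors();
$selector = new DOMXPath($doc);
// "*" matches any element; starts-with() does the partial match on the id attribute
foreach ($selector->query('//*[starts-with(@id, "msg_")]') as $node) {
var_dump($node->nodeValue);
}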
This gives you:
string(8) "Poll 1.0"
string(12) "Shoutbox 2.2"
string(24) "Polaroid Attachments 1.6"
string(24) "Featured News Slider 1.3"
string(17) "Image Resizer 1.0"
string(8) "Blog 2.2"
string(13) "RSS Feeds 1.0"
string(19) "Adspace Manager 1.2"
string(21) "Facebook Like Box 1.0"
string(15) "Price Table 1.0"
string(13) "SMF Links 1.0"
string(19) "Download System 1.2"
string(16) "[*]Site News 1.0"
string(12) "Calendar 1.3"
string(16) "Page Peel Ad 1.1"
string(20) "Sexy Bookmarks 1.0.1"
string(15) "Forum Staff 1.2"
string(21) "Facebook Comments 1.0"
string(15) "Attachments 1.4"
string(25) "YouTube Channels 0.9 Beta"
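The question also asks how to grab only the first "a" tag inside each matching element. One possible approach, sketched below, is to run a second XPath query relative to each matched node and take item(0). The inline $html string is an assumption: it is a trimmed-down copy of the snippet from the question so the example runs without fetching the page, and the $links array is just an illustrative name.

```php
<?php
// Sample markup modeled on the HTML in the question (trimmed; an assumption for illustration).
$html = <<<'HTML'
<table><tr><td class="subject windowbg2"><div>
<span id="msg_6555"><a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a></span>
<p>Started by <a href="http://dream-portal.net/index.php?action=profile;u=1">SoLoGHoST</a></p>
</div></td></tr></table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//*[starts-with(@id, "msg_")]') as $element) {
    // Relative query: only <a> descendants of this element; item(0) is the first, or null.
    $link = $xpath->query('.//a', $element)->item(0);
    if ($link !== null) {
        $links[] = [
            'text' => trim($link->nodeValue),
            'href' => $link->getAttribute('href'),
        ];
    }
}
print_r($links);
```

Passing the matched element as the second argument to query() keeps the search inside that element, so the profile link in the neighbouring p tag is not picked up.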
Tags: getelementbyid, domdocument, php. Source: https://codeday.me/bug/20191031/1971928.html