php – Wrap each letter of a string in a tag, avoiding HTML tags

Author: Internet

I want to build a function that takes a string and wraps each of its letters in a <span>, except for spaces and HTML tags (in my case, <br> tags).

So:

"Hi <br> there."

...should become

"<span>H</span><span>i</span> <br> <span>t</span><span>h</span><span>e</span><span>r</span><span>e</span><span>.</span>"

I had no luck coming up with a solution of my own, so I looked around, and it turned out to be surprisingly hard to find what I need.

The closest thing I found is Neverever's answer here.

However, it does not work very well: every character of the <br> tag gets wrapped in a <span> too, and it does not match accented characters such as éèàï.
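
The accented-character problem mentioned above usually comes from splitting a UTF-8 string byte by byte. The following is only a small sketch of that effect, not code from the linked answer; the variable names are illustrative. It compares str_split(), which works on bytes, with preg_split() and the /u modifier, which works on whole characters.

```php
<?php
// Illustrative input: "é" (U+00E9, two bytes in UTF-8) followed by "!".
$text = "\u{E9}!";

// str_split() cuts the string into single bytes, so "é" is torn in half.
$bytes = str_split($text);

// With the /u modifier PCRE treats the subject as UTF-8, so an empty
// pattern splits between whole characters instead of bytes.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), ' ', count($chars), PHP_EOL; // prints "3 2"
```

Wrapping each element of $bytes in a <span> would put half of "é" in one tag and half in the next, which is why the character-aware split matters here.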

How should I go about this?
Why does parsing HTML tags with regular expressions seem to be wrong?

Solution:

You could parse the HTML with DOMDocument and wrap only the characters found inside the values of DOMText nodes. See the comments in the code.

// Define source
$source = 'H&iuml; <br/> thérè.';

// Create DOM document and load HTML string, hinting that it is UTF-8 encoded.
// We need a root element for this so we wrap the source in a temporary <div>.
$hint = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
$dom = new DOMDocument();
$dom->loadHTML($hint . "<div>" . $source . "</div>");

// Get contents of temporary root node
$root = $dom->getElementsByTagName('div')->item(0);

// Loop through children
$next = $root->firstChild;
while ($node = $next) {
    $next = $node->nextSibling; // Save for next while iteration

    // We are only interested in text nodes (not <br/> etc)
    if ($node->nodeType == XML_TEXT_NODE) {
        // Wrap each character of the text node (e.g. "Hi ") in a <span> of
        // its own, e.g. "<span>H</span><span>i</span><span> </span>"
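        foreach (preg_split('//u', $node->nodeValue, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            // Use createTextNode() so that characters such as "&" are escaped;
            // the value argument of createElement() does not escape "&".
            $span = $dom->createElement('span');
            $span->appendChild($dom->createTextNode($char));
            $root->insertBefore($span, $node);
        }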
        // Drop text node (e.g. "Hi ") leaving only <span> wrapped chars
        $root->removeChild($node);
    }
}

// Back to string via SimpleXMLElement (so that the output is more similar to
// the source than would be the case with $root->C14N() etc), removing temporary
// root <div> element and space-only spans as well.
$withSpans = simplexml_import_dom($root)->asXML();
$withSpans = preg_replace('#^<div>|</div>$#', '', $withSpans);
$withSpans = preg_replace('#<span> </span>#', ' ', $withSpans);

echo $withSpans, PHP_EOL;

Output:

<span>H</span><span>ï</span> <br/> <span>t</span><span>h</span><span>é</span><span>r</span><span>è</span><span>.</span>
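
Since the question asks for a function, the same idea could be packaged roughly as follows. This is a minimal sketch and not part of the original answer: the names wrapLetters and wrapTextNodes are hypothetical, the recursion into nested elements is an added assumption about what is wanted, and the result is serialized with saveHTML() instead of SimpleXML, so <br> comes back in HTML form rather than as <br/>.

```php
<?php
// Hypothetical helper: wraps every non-whitespace character of every text
// node in a <span>, leaving tags and whitespace untouched.
function wrapLetters(string $html): string
{
    $hint = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
    $dom = new DOMDocument();
    // The @ silences parser warnings about imperfect markup (an assumption;
    // remove it while debugging).
    @$dom->loadHTML($hint . '<div>' . $html . '</div>');
    $root = $dom->getElementsByTagName('div')->item(0);

    wrapTextNodes($dom, $root);

    // Serialize the children only, which leaves out the temporary <div>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

function wrapTextNodes(DOMDocument $dom, DOMNode $parent): void
{
    // Copy the live child list first, because the loop modifies the tree.
    foreach (iterator_to_array($parent->childNodes) as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            $chars = preg_split('//u', $node->nodeValue, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($chars as $char) {
                if (trim($char) === '') {
                    // Whitespace stays a plain text node.
                    $new = $dom->createTextNode($char);
                } else {
                    $new = $dom->createElement('span');
                    $new->appendChild($dom->createTextNode($char));
                }
                $parent->insertBefore($new, $node);
            }
            $parent->removeChild($node);
        } elseif ($node->nodeType === XML_ELEMENT_NODE) {
            wrapTextNodes($dom, $node); // Recurse into nested elements
        }
    }
}

echo wrapLetters('Hi <br> there.'), PHP_EOL;
```

For the input from the question this should give back the string the question asks for, with each letter and the final period in its own <span> and the <br> left alone. Because whitespace is never wrapped in the first place, the later preg_replace() step that strips space-only spans is not needed in this variant.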

Tags: html, php, regex, word-wrap, preg-replace
Source: https://codeday.me/bug/20190829/1764718.html