java – Wrapping tags around plain HTML text

Author: Internet

My HTML document has this structure:

<p>
"<em>You</em> began the evening well, Charlotte," said Mrs.&nbsp;Bennet with civil self–command to Miss Lucas. "<em>You</em> were Mr.&nbsp;Bingley's first choice."
</p>

But I need my "plain text" wrapped in tags so that I can process it :)

<p>
    <text>"</text>
    <em>You</em>
    <text> began the evening well, Charlotte," said Mrs.&nbsp;Bennet with civil self–command to Miss Lucas. "</text>
    <em>You</em>
    <text> were Mr.&nbsp;Bingley's first choice."</text>
</p>

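Any ideas how to achieve this? I've looked at TagSoup and jsoup, but there doesn't seem to be an easy way to solve this with them. Maybe with some fancy regex?

Thanks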

Solution:

Here's a suggestion:

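import java.util.ArrayList;

import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Tag;

// Builds a <text> element holding the given string.
public static Node toTextElement(String str) {
    Element e = new Element(Tag.valueOf("text"), "");
    e.appendText(str);
    return e;
}

// Recursively replaces every text node below root with a <text> element.
public static void replaceTextNodes(Node root) {
    if (root instanceof TextNode) {
        // getWholeText() copies the text without whitespace normalisation
        root.replaceWith(toTextElement(((TextNode) root).getWholeText()));
    } else {
        // Iterate over a copy because replaceWith() edits the parent's child list
        for (Node child : new ArrayList<>(root.childNodes()))
            replaceTextNodes(child);
    }
}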

Test code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

String html = "<p>\"<em>You</em> began the evening well, Charlotte,\" " +
         "said Mrs.&nbsp;Bennet with civil self–command to Miss Lucas." +
         " \"<em>You</em> were Mr.&nbsp;Bingley's first choice.\"</p>";

Document doc = Jsoup.parse(html);

// Wrap the text nodes inside each top-level element of <body>
for (Node n : doc.body().children())
    replaceTextNodes(n);

System.out.println(doc);

Output (the exact pretty-printing may differ between jsoup versions):

<html>
 <head></head>
 <body>
  <p>
   <text>
    &quot;
   </text><em>
    <text>
     You
    </text></em>
   <text>
     began the evening well, Charlotte,&quot; said Mrs.&nbsp;Bennet with civil self–command to Miss Lucas. &quot;
   </text><em>
    <text>
     You
    </text></em>
   <text>
     were Mr.&nbsp;Bingley's first choice.&quot;
   </text></p>
 </body>
</html>
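The output above also wraps the "You" inside each <em>, which differs from the structure requested in the question, where only the text sitting directly under <p> is wrapped. A minimal sketch of a non-recursive variant follows, assuming jsoup is on the classpath; the class name WrapDirectText and the method wrapDirectTextNodes are hypothetical. It only looks at the direct children of an element, skips whitespace-only text nodes (such as the newlines inside <p> in the pretty-printed HTML at the top of the question), and copies text with getWholeText() rather than text() so no whitespace normalisation is applied.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Tag;

// Hypothetical helper class: only direct text children of an element are wrapped.
public class WrapDirectText {

    // Wraps each non-blank text node that is a direct child of `parent` in a
    // <text> element. Text nested inside child elements (e.g. <em>) is untouched.
    public static void wrapDirectTextNodes(Element parent) {
        // Copy the child list so replacing nodes cannot disturb the iteration.
        List<Node> children = new ArrayList<>(parent.childNodes());
        for (Node child : children) {
            if (!(child instanceof TextNode)) {
                continue; // leave elements such as <em> as they are
            }
            TextNode textNode = (TextNode) child;
            if (textNode.isBlank()) {
                continue; // skip whitespace-only nodes (indentation, newlines)
            }
            Element wrapper = new Element(Tag.valueOf("text"), "");
            // getWholeText() copies the text without whitespace normalisation
            wrapper.appendText(textNode.getWholeText());
            textNode.replaceWith(wrapper);
        }
    }

    public static void main(String[] args) {
        String html = "<p>\"<em>You</em> began the evening well, Charlotte,\" "
                + "said Mrs.&nbsp;Bennet with civil self\u2013command to Miss Lucas."
                + " \"<em>You</em> were Mr.&nbsp;Bingley's first choice.\"</p>";
        Document doc = Jsoup.parse(html);
        for (Element p : doc.select("p")) {
            wrapDirectTextNodes(p);
        }
        System.out.println(doc.body().html());
    }
}
```

For the sample paragraph, the <p> should end up with the children <text>, <em>, <text>, <em>, <text>, matching the layout asked for in the question. The recursive replaceTextNodes remains the better fit if text at every nesting depth has to be wrapped.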

Tags: text-parsing, java, regex, jsoup, tag-soup
Source: https://codeday.me/bug/20190902/1790187.html