java - Parse and modify an HTML file using Java

Question

I need to parse a given HTML document, modify its content, and save the modified version.

My HTML input:

<div>
<div class="post-text"><p>@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at <a href="https://stackoverflow.com/a/6594828/1861357">https://stackoverflow.com/a/6594828/1861357</a> and I only very slightly modified his method which replaces a node (a set of tags) with the data in the node plus whatever information you would like to add.</p>

<p>To store a String in memory I used a static <code>StringBuilder</code> to save the HTML in memory. </p>

<p>First we read in the HTML file (that is manually specified, this can be changed), then we make a series of checks to change whatever nodes with any data that we want.</p>

<p>The one problem that I didn't fix in the solution by MarcoS was that it split each individual word, instead of looking at a line. However I just used '-' for multiple words, because otherwise it places the string directly after that word.</p>

<p>So a full implementation: </p>
</div>
<div>
<div class="post-text" itemprop="description">

        <p>Recently I was recommended to use JSoup to parse and modify HTML documents. </p>

<p>However what if I have a HTML document that I want to modify (to send, store somewhere else, etc.), how might I go about doing that without changing the original document? </p>
</div>

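My problem is that I need to wrap the text "@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at https://stackoverflow.com/a/6594828/1861357 and I only" in a div tag (or any other tag) inside the HTML. The wrapper should go around just that text, not around the parent tag or the whole paragraph. Note that the text I am searching for has HTML tags in the middle of it.

I need output like the following: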

 <div class="post-text"><p><div id="myDiv">@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at <a href="https://stackoverflow.com/a/6594828/1861357">https://stackoverflow.com/a/6594828/1861357</a> and I only</div>......</div>

Is a regular expression the only solution, or can this be done with an HTML parser?

score 1 · Accepted Answer

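If you don't want to use an XML parser, you can try a regular expression.

// Remove the whitespace between adjacent tags, then trim both ends.
String xmlStr = "some_xml";
xmlStr = xmlStr.replaceAll(">\\s+<", "><").trim();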
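
The snippet above only normalizes whitespace between tags. As a rough sketch of how the same regular-expression approach might be extended to the wrapping asked about in the question, the example below uses only `java.util.regex`. The class name `WrapFragment`, the method `wrapBetween`, and the shortened sample HTML are hypothetical names chosen for illustration; they are not part of the original answer. It assumes the start and end markers appear as literal text and that any tags between them are balanced.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WrapFragment {

    // Hypothetical helper: wraps the first span that starts with `start` and
    // ends with `end` in <div id="divId">...</div>. Any tags between the two
    // markers are copied through unchanged.
    public static String wrapBetween(String html, String start, String end, String divId) {
        // Pattern.quote makes the markers literal; DOTALL lets '.' match line
        // breaks; the lazy '.*?' stops at the first occurrence of `end`.
        Pattern p = Pattern.compile(
                Pattern.quote(start) + ".*?" + Pattern.quote(end), Pattern.DOTALL);
        Matcher m = p.matcher(html);
        if (!m.find()) {
            return html; // markers not found: return the input unchanged
        }
        return html.substring(0, m.start())
                + "<div id=\"" + divId + "\">" + m.group() + "</div>"
                + html.substring(m.end());
    }

    public static void main(String[] args) {
        // Shortened stand-in for the input in the question.
        String html = "<div class=\"post-text\"><p>@MarcoS had an excellent solution at "
                + "<a href=\"https://stackoverflow.com/a/6594828/1861357\">"
                + "https://stackoverflow.com/a/6594828/1861357</a>"
                + " and I only very slightly modified his method.</p></div>";
        System.out.println(wrapBetween(html, "@MarcoS", "and I only", "myDiv"));
    }
}
```

Because the method returns a new string, the original input is left untouched and the result can be written wherever it is needed. This kind of matching is fragile: if the end marker also occurs inside an attribute value, or if the matched span opens a tag that it does not close, the output will be malformed, and in those cases an HTML parser is likely the safer choice.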
