java - Using a wildcard (or regex) in the HasAttributeFilter parameter of HTMLparser

Question

I'm using org.htmlparser. How can I get a node list by class mask? For example:

<span class="selection-link normal coeff816128@Result.draw">....</span>
<span class="selection normal coefd816154@Result.draw">....</span>

I want to get all tags that have "normal" as one of their classes. Unfortunately,

new HasAttributeFilter("class", "normal")

doesn't work. Does HTMLparser allow something like new HasAttributeFilter("class", "*normal*")?

score 0 · Accepted Answer

If switching libraries is an option, you could try jsoup, a very capable open-source HTML library.

Here is an example of how to get (and print) every element that has normal as a class.

Input HTML:

<span class="selection-link normal coeff816128@Result.draw">....</span>
<span class="selection-link coeff816128@Result.draw">....</span>
<span class="selection coefd816154@Result.draw">....</span>
<span class="selection normal coefd816154@Result.draw">....</span>

(This is your HTML, plus two extra span elements that do not have the class normal.)

Jsoup:

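import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/* Input file containing the HTML listed above. */
final File f = new File("test.html");

/*
 * Parse the HTML into a jsoup document. In this example I read it from
 * a file, but it's also possible to parse a string or connect to a
 * website. Passing null as the charset lets jsoup detect the encoding.
 * Note that Jsoup.parse(File, String) can throw an IOException.
 */
Document doc = Jsoup.parse(f, null);

/* Iterate over each element that has the class "normal" */
for( Element element : doc.select("*.normal") )
{
    System.out.println(element);
}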

*.normal selects every element that has the class normal. If you only want span tags, use span.normal instead.

For documentation on the jsoup selector API, see http://jsoup.org/cookbook/extracting-data/selector-syntax

By the way, if you prefer the DOM-style methods to select(), you can use doc.getElementsByClass("normal") instead.
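
For context on why the filter in the question finds nothing while the .normal selector does: a class attribute holds a whitespace-separated list of class names, so a comparison against the whole attribute value does not match "selection normal ...". The sketch below uses only the Java standard library to show the token-based check. It assumes, without confirming it here, that HasAttributeFilter compares the complete attribute value. The class ClassTokens and the method hasClassToken are hypothetical names made up for this illustration and are not part of HTMLparser or jsoup.

```java
import java.util.Arrays;
import java.util.List;

class ClassTokens {

    /**
     * Returns true if the given class attribute value contains the
     * given class name as one of its whitespace-separated tokens.
     * (Hypothetical helper, not a library method.)
     */
    static boolean hasClassToken(String classAttr, String token) {
        if (classAttr == null || classAttr.trim().isEmpty()) {
            return false;
        }
        // Split on runs of whitespace and look for an exact token match.
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(token);
    }

    public static void main(String[] args) {
        // The class attribute values from the input HTML above.
        List<String> attrs = Arrays.asList(
            "selection-link normal coeff816128@Result.draw",
            "selection-link coeff816128@Result.draw",
            "selection coefd816154@Result.draw",
            "selection normal coefd816154@Result.draw");

        for (String attr : attrs) {
            // Comparing the whole value, as an exact-match filter would.
            boolean wholeValue = "normal".equals(attr);
            // Comparing token by token, as a class selector does.
            boolean byToken = hasClassToken(attr, "normal");
            System.out.println(wholeValue + " " + byToken + " " + attr);
        }
    }
}
```

Matching whole tokens rather than substrings also avoids false positives: a substring pattern such as *normal* would accept a class like abnormal as well.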
