jsoup - Parsing HTML with JSoup

Question

I want to parse the description on this NASA page, i.e. the text at the bottom of the page.

How can I do this?

score 0 · Accepted Answer

First connect to the page and parse it into a Document (make sure to import Jsoup); then you can use the Selector API to select what you need.

Here is an example:

// Connect to page and parse html into a 'Document'
Document doc = Jsoup.connect("http://photojournal.jpl.nasa.gov/catalog/PIA16465").get();


for( Element element : doc.select("p") )    // Select all 'p'-Tags and loop over them
{
    if( element.hasText() )                 // Check if the element has text (since there are some empty too)
    {
        System.out.println(element.text()); // print the element's text
    }
}
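
For reference, here is a self-contained version of the same loop with the imports it needs. It is only a sketch: to avoid depending on the live page it parses a small inline HTML string (the markup, the class name `ParagraphText` and the helper `nonEmptyParagraphs` are made up for illustration). Swapping `Jsoup.parse(html)` for the `Jsoup.connect(...).get()` call above should give the same kind of Document.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParagraphText {

    // Collect the text of every non-empty <p> element in the given HTML.
    static List<String> nonEmptyParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element element : doc.select("p")) {
            if (element.hasText()) {        // skip <p> elements without visible text
                texts.add(element.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical markup, not the real NASA page
        String html = "<p></p><p>First paragraph.</p><p> </p><p>Second paragraph.</p>";
        for (String text : nonEmptyParagraphs(html)) {
            System.out.println(text);
        }
    }
}
```

Returning a list instead of printing inside the loop makes it easier to pick out a single paragraph afterwards.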

- Edit -

for( Element element : doc.select("dd p") ) // Or: "dd > p"
{
    if( element.hasText() )
    {
        System.out.println(element.text());
        break;
    }
}
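
The two selectors mentioned in the comment are not always interchangeable: `dd p` matches a `p` nested anywhere inside a `dd`, while `dd > p` matches only direct children. A small sketch on made-up markup (the HTML, the class name `SelectorCombinators` and the helper `countMatches` are assumptions for illustration, not taken from the NASA page):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorCombinators {

    // Count how many elements in the HTML match the CSS query.
    static int countMatches(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Hypothetical markup: one <p> directly in <dd>, one nested in a <div>
        String html = "<dl><dd><p>direct child</p><div><p>nested deeper</p></div></dd></dl>";
        System.out.println(countMatches(html, "dd p"));    // descendant combinator
        System.out.println(countMatches(html, "dd > p"));  // child combinator
    }
}
```

Which of the two fits the real page depends on how its `p` tags are nested, so it is worth checking the page source first.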

Instead of a loop, you could use something like this:

Element firstTag = doc.select("dd p").first();

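This gives you the first p tag inside a dd tag. However, it does not work here, because many empty p tags match this selector and the first match may be one of them. You could solve this with a regex selector (see the link above), but the loop is much easier to understand.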
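
One possible form of the regex selector approach, as a sketch: jsoup's `:matches(regex)` pseudo-selector keeps only elements whose text matches the pattern, so `\S` (any non-whitespace character) filters out the empty p tags. The markup and the names `FirstNonEmpty` and `firstDescription` are invented for illustration, and this has not been checked against the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstNonEmpty {

    // Return the text of the first <p> inside a <dd> that contains at least
    // one non-whitespace character, or null if no such element exists.
    static String firstDescription(String html) {
        Document doc = Jsoup.parse(html);
        Element first = doc.select("dd p:matches(\\S)").first();
        return first != null ? first.text() : null;   // first() is null when nothing matches
    }

    public static void main(String[] args) {
        // Hypothetical markup with empty <p> tags before the description
        String html = "<dl><dd><p></p><p> </p><p>Description text.</p><p>More text.</p></dd></dl>";
        System.out.println(firstDescription(html));
    }
}
```

The null check matters because `first()` returns `null` when the selection is empty.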
