java - Extract a specific line from a web page using JSoup for Java

Question

Hi, I want to scrape some text from a website using the JSoup library. I tried the following code, but it prints the whole web page, and I only want to extract a specific line. Here is the code I'm using:

Document doc = null;
try {
    doc = Jsoup.connect("http://www.example.com").get();
} catch (IOException e) {
    e.printStackTrace();
}
String text = doc.html();

System.out.println(text);

It outputs the following:

<html>
 <head></head>
 <body>
  Martin,James,28,London,20k
  <br /> Sarah,Jackson,43,Glasgow,32k
  <br /> Alex,Cook,22,Liverpool,18k
  <br /> Jessica,Adams,34,London,27k
  <br /> 
 </body>
</html>

How can I extract only line 6, which reads Alex,Cook,22,Liverpool,18k, and put it into an array where each element is one of the comma-separated words (e.g. [0] = Alex, [1] = Cook, etc.)?

score 1 · Accepted Answer

Maybe you have to format (?) the result a bit:

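    // Needs org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Node,
    // org.jsoup.nodes.TextNode and java.util.Arrays
    Document doc = Jsoup.connect("http://www.example.com").get();
    int count = 0; // Counts the text nodes seen so far

    for( Node n : doc.body().childNodes() )
    {
        if( n instanceof TextNode )
        {
            if( count == 2 ) // Third text node: the 'Alex' line
            {
                // text() returns the unescaped text; trim() drops the surrounding whitespace
                String[] t = ((TextNode) n).text().trim().split(","); // one array element per word

                System.out.println(Arrays.toString(t)); // e.g. print the array
            }
            count++;
        }
    }

Output:

[Alex, Cook, 22, Liverpool, 18k]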
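
Text taken from an HTML page often carries leading whitespace or line breaks, and there may be spaces around the commas, so the fields can need some cleanup before they are usable. Below is a minimal sketch in plain Java (no jsoup involved) of one way to do that. The class `LineFields` and the method `parseFields` are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.Arrays;

final class LineFields {
    // Hypothetical helper: trims the raw text, then splits on commas while
    // discarding any whitespace directly around each comma.
    static String[] parseFields(String rawLine) {
        String line = rawLine.trim();
        if (line.isEmpty()) {
            return new String[0]; // nothing to split
        }
        return line.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] fields = parseFields("\n   Alex,Cook,22,Liverpool,18k ");
        System.out.println(Arrays.toString(fields)); // prints [Alex, Cook, 22, Liverpool, 18k]
        System.out.println(fields[0]);               // prints Alex
    }
}
```

The raw text of a text node could be passed to `parseFields` in place of the hard-coded string.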

Edit:

Since you can't select TextNodes by their content (that is only possible with Elements), you need a small workaround:

for( Node n : doc.body().childNodes() )
{
    if( n instanceof TextNode )
    {
        // Unescaped text of the node, without surrounding whitespace
        String str = ((TextNode) n).text().trim();

        if( str.toLowerCase().startsWith("alex") ) // Node 'Alex'
        {
            String[] t = str.split(","); // one array element per word

            System.out.println(Arrays.toString(t)); // e.g. print the array
        }
    }
}
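
For a page this simple, a different approach that skips the node traversal may also work: split the markup inside `<body>` on its `<br>` tags and pick the record by its first word. This is only a sketch of that alternative, not the approach used above. It assumes the records are separated by `<br>` tags and contain no other markup, and `RecordFinder` and `findRecord` are hypothetical names. With jsoup, the input string could come from `doc.body().html()`.

```java
import java.util.Arrays;
import java.util.Optional;

final class RecordFinder {
    // Hypothetical helper: splits the body markup on <br> tags and returns the
    // comma-separated fields of the first record that starts with the given
    // prefix (compared case-insensitively), or an empty Optional if none does.
    static Optional<String[]> findRecord(String bodyHtml, String prefix) {
        String wanted = prefix.toLowerCase();
        return Arrays.stream(bodyHtml.split("(?i)<br\\s*/?>"))
                .map(String::trim)
                .filter(line -> line.toLowerCase().startsWith(wanted))
                .map(line -> line.split("\\s*,\\s*"))
                .findFirst();
    }

    public static void main(String[] args) {
        String body = "Martin,James,28,London,20k<br /> Sarah,Jackson,43,Glasgow,32k"
                + "<br /> Alex,Cook,22,Liverpool,18k<br /> Jessica,Adams,34,London,27k<br />";
        findRecord(body, "alex")
                .ifPresent(fields -> System.out.println(Arrays.toString(fields)));
    }
}
```

Returning an `Optional` makes the "no matching record" case explicit instead of handing back `null`.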
