java - Parsing HTML with JSoup

Question

I am trying to parse the HTML at the following URL:

http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/

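I want to get the text of the <p> tag that contains the instructor's name. The information I need is inside a <p> tag, but I can't get at that tag with JSoup. I'm not sure what I'm doing wrong: when I store the tag in an Element object, call it "b", and call b.getAllElements(), the <p> tag does not show up as one of the elements.

Isn't that what Jsoup's getAllElements() method is for? If not, please explain the hierarchy I'm obviously missing, because the parser can't locate the <p> tag that contains the text I need, in this case "Prof. Zoltan Spakovszky".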

Any help is greatly appreciated.

public void getHomePageLinks()
{
    String html = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
    org.jsoup.nodes.Document doc = Jsoup.parse(html);

    Elements bodies = doc.select("body");

    for(Element body : bodies )
    {
        System.out.println(body.getAllElements());
    }

}

The output is:

http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/

Isn't this supposed to print all the elements inside the document's body tag?

score 3 · Accepted Answer

I don't know anything about JSoup, but if you need the instructor's name, it looks like you can get at it with something like this:

// select() returns an Elements collection; first() picks the first match (null if none)
Element instructor = doc.select("div.chpstaff div p").first();

score 3 · Accepted Answer

You may have solved this already, but since I worked on it, I can't resist posting it:

import java.io.IOException;
import java.util.logging.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JavaApplication17 {

    public static void main(String[] args) {
        try {
            String url = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
            // Fetch the page over HTTP and parse the response
            Document doc = Jsoup.connect(url).get();
            // Print the text of every <p> element
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs) {
                System.out.println(p.text());
            }
        } catch (IOException ex) {
            Logger.getLogger(JavaApplication17.class.getName())
                  .log(Level.SEVERE, null, ex);
        }
    }
}

Is this what you meant?
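
For what it's worth, the reason the code in the question prints only the URL seems to be that Jsoup.parse(String) treats its argument as HTML markup, not as an address to fetch. Here is a minimal offline sketch of that behavior; the class name ParseVsConnect and the method bodyTextOfParsedString are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseVsConnect {

    // Parses the given string as HTML markup (no network access)
    // and returns the text of the resulting <body>.
    static String bodyTextOfParsedString(String input) {
        Document doc = Jsoup.parse(input);
        return doc.body().text();
    }

    public static void main(String[] args) {
        String url = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
        // The string is wrapped in a generated <html><body>,
        // so the body text is just the URL itself.
        System.out.println(bodyTextOfParsedString(url));
        // To download the page, use Jsoup.connect(url).get() instead.
    }
}
```

So getAllElements() is doing its job; there is simply nothing in the document except the URL text until the page is actually fetched.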

score 2 · Accepted Answer

Here is a simple example:

// Connect to the website and parse it into a document
Document doc = Jsoup.connect("http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/").get();

// Select all elements you need (see below for documentation)
Elements elements = doc.select("div[class=chpstaff] p");

// Get the text of the first element
String instructor = elements.first().text();

// eg. print the result
System.out.println(instructor);

Have a look at the documentation of the jsoup selector API here: Jsoup Codebook
It is not very difficult to use, but it is very powerful.
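
To try the selector without hitting the network, here is a small sketch that runs the same query against an inline HTML fragment. The fragment is only a guess at the page's structure (a div with class chpstaff wrapping a p), not a copy of the real markup, and SelectorSketch and firstInstructor are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorSketch {

    // Returns the text of the first <p> under the chpstaff div,
    // or null if nothing matches.
    static String firstInstructor(String html) {
        Document doc = Jsoup.parse(html);
        Element first = doc.select("div[class=chpstaff] p").first();
        return first == null ? null : first.text();
    }

    public static void main(String[] args) {
        // Assumed markup, for illustration only.
        String html = "<div class=\"chpstaff\"><div><h2>Instructors</h2>"
                    + "<p>Prof. Zoltan Spakovszky</p></div></div>";
        System.out.println(firstInstructor(html));
    }
}
```

Note that first() returns null when nothing matches, so the null check avoids a NullPointerException if the page layout changes. Also, [class=chpstaff] matches only when the whole class attribute equals that value, whereas div.chpstaff would also match a div that carries additional classes.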

score 1 · Accepted Answer

Here is the code:

Document document = Jsoup.connect("http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/").get();

Elements elements = document.select("p");
System.out.println(elements.html());

You can select all the <p> tags using Jsoup's selector syntax. Calling html() on the result returns the text and the tags inside each <p>.
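
The difference between html() and text() on the selected Elements can be seen on a small inline fragment. This is only a sketch; the fragment and the names HtmlVsText, innerHtml, and plainText are invented for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class HtmlVsText {

    // Inner HTML of every matched <p>, nested markup included.
    static String innerHtml(String html) {
        Elements paragraphs = Jsoup.parse(html).select("p");
        return paragraphs.html();
    }

    // Text of every matched <p>, with the markup stripped.
    static String plainText(String html) {
        Elements paragraphs = Jsoup.parse(html).select("p");
        return paragraphs.text();
    }

    public static void main(String[] args) {
        String html = "<p>Prof. <b>Zoltan</b> Spakovszky</p>";
        System.out.println(innerHtml(html)); // keeps the nested <b> tag
        System.out.println(plainText(html)); // text only
    }
}
```

If only the instructor's name is needed, text() is probably the more convenient of the two.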

score 0 · Accepted Answer

Elements ele = doc.select("p");
String text = ele.text();
System.out.println(text);

Try this; I think it will work.
