java - How to fetch HTML in Java

Question

Without using any external library, what is the simplest way to fetch a website's HTML content into a String?

score 44 · Accepted Answer

I'm currently using this:

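import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
try {
  URLConnection connection = new URL("http://www.google.com").openConnection();
  // try-with-resources closes the scanner (and its stream) even if reading fails;
  // UTF-8 is an assumption about the page's encoding
  try (Scanner scanner = new Scanner(connection.getInputStream(), "UTF-8")) {
    scanner.useDelimiter("\\Z"); // \Z matches the end of input, so next() returns the response as a single token
    content = scanner.next();
  }
} catch (Exception ex) {
  ex.printStackTrace();
}
System.out.println(content);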

However, I'm not sure whether there is a better way.
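
One possible refinement, offered as a sketch rather than a definitive answer: on Java 9 or later, `InputStream.readAllBytes()` reads the whole body in one call, and passing an explicit charset avoids relying on the platform default. The class name `PageFetcher` and its method names are hypothetical, and UTF-8 is an assumption about the page's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are for illustration only.
final class PageFetcher {
    // Decodes everything remaining in the stream with the given charset (requires Java 9+).
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    // Opens the URL and returns its body as text; UTF-8 is assumed, not detected.
    static String fetch(String address) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demonstration: an in-memory stream stands in for a live site.
        InputStream sample =
                new ByteArrayInputStream("<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(sample, StandardCharsets.UTF_8));
    }
}
```

Calling `PageFetcher.fetch("http://www.google.com")` would perform the same request as the snippet above. A page served in another encoding would need the charset taken from the response's Content-Type header instead.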

score 21 · Accepted Answer

This has worked well for me:

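import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

URL url = new URL(theURL);
StringBuilder buffer = new StringBuilder();
// Decode as UTF-8 (an assumption); casting raw bytes to char garbles non-ASCII text
try (Reader reader = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
    int ptr;
    while ((ptr = reader.read()) != -1) {
        buffer.append((char) ptr);
    }
}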

Not sure whether the other solutions provided are any more efficient.
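
On the efficiency question, a minimal sketch for comparison: reading through a `Reader` into a character array pulls many characters per call instead of one unit at a time, which usually means fewer read calls. The class name `ChunkedReader` and the 8192-character chunk size are illustrative assumptions, and no timing has been measured here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are for illustration only.
final class ChunkedReader {
    // Reads the stream in chunks of up to 8192 characters, decoding with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[8192];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n); // append only the characters actually read
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Offline demonstration; for a live page, pass new URL(theURL).openStream() instead.
        InputStream sample =
                new ByteArrayInputStream("<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(sample, StandardCharsets.UTF_8));
    }
}
```

This version works on Java 7 and later, since it relies only on try-with-resources and `StandardCharsets`.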

score 2 · Accepted Answer

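I left this post in your other thread, though the approaches above may work just as well. I don't think either one is easier than the other. You can access the Apache packages simply by putting import org.apache.commons.HttpClient at the top of your code.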

Edit: Forgot the link ;)

score 1 · Accepted Answer

It's not vanilla Java, but I'll offer a simpler solution. Use Groovy ;-)

String siteContent = new URL("http://www.google.com").text
