java - Specific data mining using Scanner

Question

I am trying to build a program that fetches the page source from a website and stores only a snippet of that code.

package Program;

import java.net.*;
import java.util.*;

public class Program {
    public static void main(String[] args) {
        String site = "http://www.amazon.co.uk/gp/product/B00BE4OUBG/ref=s9_ri_gw_g63_ir01?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=center-5&pf_rd_r=0GJRXWMKNC5559M5W2GB&pf_rd_t=101&pf_rd_p=394918607&pf_rd_i=468294";
        try {
            URL url = new URL(site);
            URLConnection connection = url.openConnection();
            connection.connect();
            Scanner in = new Scanner(connection.getInputStream());
            while (in.hasNextLine()) {
                System.out.println(in.nextLine());
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

So far this only prints the page source to the output. I would like the program to search for a specific string and display only the price. For example:

<tr id="actualPriceRow">
<td id="actualPriceLabel" class="priceBlockLabelPrice">Price:</td>
<td id="actualPriceContent"><span id="actualPriceValue"><b class="priceLarge">£599.99</b></span>
<span id="actualPriceExtraMessaging">

I want to search for class="priceLarge">599.99 and display or store that value.

I know there are similar questions on this site, but I do not understand PHP well, so I would prefer a Java solution. That said, any solution is welcome :)

score 0 · Accepted Answer

The OP wrote in an edit to the question:

Thank you for the answers, they really helped. Here is the solution:

package Project;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Project {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            // Fetch and parse the page; replace the placeholder with the product URL.
            Document doc = Jsoup.connect("url of link").get();
            System.out.println("title : " + doc.title());
            // Combined text of all elements with class "priceLarge", e.g. "£599.99".
            String pricing = doc.getElementsByClass("priceLarge").text();
            // Drop the leading currency symbol, unless nothing was found.
            String str = pricing.isEmpty() ? pricing : pricing.substring(1);
            System.out.println("price : " + str);
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
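
The `substring(1)` step above relies on the price text starting with exactly one currency character. As a hedged alternative, the sketch below strips every character that is not a digit or a decimal point and parses the remainder as a `BigDecimal`. The class name `PriceText` and the method `toAmount` are hypothetical, and the code assumes UK-style formatting (dot as decimal separator, comma as thousands separator).

```java
import java.math.BigDecimal;

class PriceText {

    /**
     * Converts price text such as "£599.99" into a numeric amount.
     * Assumption: '.' is the decimal separator and ',' only groups thousands.
     */
    static BigDecimal toAmount(String priceText) {
        // Keep digits and the decimal point; drop currency symbols, commas, spaces.
        String digits = priceText.replaceAll("[^0-9.]", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("No digits in: " + priceText);
        }
        return new BigDecimal(digits);
    }

    public static void main(String[] args) {
        System.out.println(toAmount("£599.99")); // prints 599.99
    }
}
```

Returning a `BigDecimal` rather than a `String` makes it possible to compare or sum prices later without floating-point rounding.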

score 0 · Accepted Answer

You can use a library for the parsing, for example Jsoup:

Document document = Jsoup.connect("http://www.amazon.co.uk/gp/product/B00BE4OUBG/ref=s9_ri_gw_g63_ir01?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=center-5&pf_rd_r=0GJRXWMKNC5559M5W2GB&pf_rd_t=101&pf_rd_p=394918607&pf_rd_i=468294").get();

Then you can search for the specific element:

Elements el = document.select("b.priceLarge");

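And you can get the content of this element like this (`text()` returns the element's text, whereas `val()` only reads the value of form elements):

String content = el.text();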
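
If adding a dependency is not an option, the same idea can be approximated with the standard library alone, in the spirit of the Scanner loop from the question. This is only a sketch: the class name `PriceExtractor` and the method `extractPrice` are hypothetical, and the regular expression assumes the markup looks exactly like the snippet in the question. Regular expressions are fragile against HTML changes, so a real parser such as Jsoup is usually the safer choice.

```java
import java.util.Optional;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceExtractor {

    // Assumed markup: <b class="priceLarge">£599.99</b>.
    // Skips any non-digit prefix (such as the currency symbol) and captures the number.
    private static final Pattern PRICE =
            Pattern.compile("class=\"priceLarge\">[^0-9<]*([0-9]+(?:[.,][0-9]+)*)");

    /** Returns the first price found in the given page source, if any. */
    static Optional<String> extractPrice(String html) {
        try (Scanner in = new Scanner(html)) {
            while (in.hasNextLine()) {
                Matcher m = PRICE.matcher(in.nextLine());
                if (m.find()) {
                    return Optional.of(m.group(1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String sample =
                "<tr id=\"actualPriceRow\">\n"
              + "<td id=\"actualPriceContent\"><span id=\"actualPriceValue\">"
              + "<b class=\"priceLarge\">£599.99</b></span>\n";
        System.out.println(extractPrice(sample).orElse("not found")); // prints 599.99
    }
}
```

To use it against a live page, the lines read from `connection.getInputStream()` in the question's program could be matched against the same pattern instead of being printed.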
