java - 403 error when fetching Google results with jsoup

Question

I am trying to fetch Google results with the following code:

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").timeout(5000).get();

But I get this exception:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403,URL=http://www.google.com/search?q=lakshman

A 403 error means the server is refusing access, yet I can load this URL in a web browser without any problem. Why does jsoup get a 403 error?

score 36 · Accepted Answer

Just set the User-Agent HTTP header on the request, like this:

Jsoup.connect(itemUrl)
     .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
     .get();
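
To see why the header can matter, here is a minimal sketch that uses only the JDK (Java 11+), not jsoup and not Google. It starts a local stand-in server whose rule is an assumption made up for illustration: answer 403 unless the User-Agent starts with "Mozilla/". How Google actually decides is not known here. The class name UserAgentDemo and the method statusFor are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class UserAgentDemo {

    // Starts a throwaway local server, sends one GET with the given
    // User-Agent, and returns the HTTP status code the server answered with.
    static int statusFor(String userAgent) {
        HttpServer server = null;
        try {
            // Port 0 lets the OS pick a free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            // Hypothetical filtering rule (an assumption, not Google's logic):
            // 200 for browser-like User-Agents, 403 for everything else.
            server.createContext("/search", exchange -> {
                String ua = exchange.getRequestHeaders().getFirst("User-Agent");
                boolean browserLike = ua != null && ua.startsWith("Mozilla/");
                exchange.sendResponseHeaders(browserLike ? 200 : 403, -1); // -1 = no body
                exchange.close();
            });
            server.start();

            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort()
                    + "/search?q=lakshman");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("User-Agent", userAgent)
                    .GET()
                    .build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding())
                    .statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0); // always shut the stand-in server down
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor("Java/17"));                         // prints 403
        System.out.println(statusFor("Mozilla/5.0 (X11; Linux x86_64)")); // prints 200
    }
}
```

The same URL gets a different status depending only on the header, which is consistent with a browser loading the page while a bare HTTP client is refused.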

score 6 · Accepted Answer

Google doesn't allow robots, so you can't use jsoup to connect to Google. You can use the Google Web Search API (deprecated), but the number of requests you can make per day is limited.

score 1 · Accepted Answer

Try this:

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").ignoreHttpErrors(true).timeout(5000).get();

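Use this if setting the userAgent does not work for you, as it did not for me. Note that ignoreHttpErrors(true) only suppresses the exception; the document you get back is whatever body the server sent with the 403.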

score 1 · Accepted Answer

In some cases you need to set the referrer. That helped in my case.

The complete source is here:

    try{

        String strText = 
                Jsoup
                .connect("http://www.whatismyreferer.com")
                .referrer("http://www.google.com")
                .get()
                .text();

        System.out.println(strText);

    }catch(IOException ioe){
        System.out.println("Exception: " + ioe);
    }
