python - GET HTML data from multiple URLs on a website with a single connection

Question

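I have a Python script that takes several URLs as input. The script loops through each of these URLs and outputs the htmltext from each page. Would the website see this as 3 separate GET requests, and therefore 3 "hits" to the site, or would it see the socket connection and count it as 1 "hit" to the page?

From looking at the debug output, I think it is the first option. If so, is it possible to fetch data from multiple URLs on the same site in such a way that the site sees only 1 "hit"? Could I achieve this with urllib3 by using its keep-alive feature?

My script is as follows: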

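import cookielib
import urllib2

cj = cookielib.CookieJar()  # assumed: cj is a cookie jar created earlier
# Build a single opener with cookie handling and debug output.
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1),
                              urllib2.HTTPCookieProcessor(cj))
for u in url:  # url is the list of input URLs
    req = urllib2.Request(u)
    req.add_header('User-Agent', 'Mozilla/5.0')
    resp = opener.open(req)  # one GET request per URL
    htmltext = resp.read()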

score 2 · Accepted Answer

Would the website see this as 3 separate GET requests and therefore 3 "hits" to the site or would it see the socket connection and see it as 1 "hit" to the page?

It would see 3 requests. Even if you reuse the socket connection, these are still 3 distinct requests (sent over one socket). The server's access log will show 3 requests regardless of how many connections you used.

The benefit of reusing connections is that creating a new TCP socket and negotiating the handshake with the server is a relatively expensive procedure. It can sometimes take more time than retrieving the HTTP response body itself. By reusing a connection, you skip that cost on every request after the first.
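To make the distinction between requests and connections concrete, here is a minimal sketch that runs entirely on localhost. It uses Python 3's standard-library `http.client` and `http.server` rather than urllib3, so that it works without extra dependencies; urllib3's `PoolManager` reuses connections per host in a similar way. The handler class, the `/a`, `/b`, `/c` paths, and the counter names are all made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

hits = []             # one entry per request the server handles
client_ports = set()  # one entry per distinct TCP connection seen


class CountingHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 lets the server keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        hits.append(self.path)
        client_ports.add(self.client_address[1])
        body = ("page " + self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default access log on stderr


# Port 0 asks the OS for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One connection object, reused for every request.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
pages = []
for path in ["/a", "/b", "/c"]:
    conn.request("GET", path, headers={"User-Agent": "Mozilla/5.0"})
    resp = conn.getresponse()
    # The body must be read fully before the next request is sent.
    pages.append(resp.read().decode("utf-8"))
conn.close()
server.shutdown()
server.server_close()

print(len(hits), "requests over", len(client_ports), "connection(s)")
```

The server side counts three requests even though only one TCP connection was opened, which is what an access log on a real site would record as well.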
