python - Python script to check whether a web page exists without downloading the whole page?

Question

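I'm trying to write a script that tests whether a web page exists. Ideally it would check without downloading the whole page.

This is my starting point. I've seen several examples use httplib the same way, but every site I check simply returns False.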

import httplib
from httplib import HTTP
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    return h.getreply()[0] == httplib.OK

if __name__=="__main__":
    print checkUrl("http://www.stackoverflow.com") # True
    print checkUrl("http://stackoverflow.com/notarealpage.html") # False

Any ideas?

Edit

Someone suggested this, but their post was deleted. Does urllib2 avoid downloading the whole page?

import urllib2

def check_url(some_url):
    try:
        urllib2.urlopen(some_url)
        return True
    except urllib2.URLError:
        return False

score 24 · Accepted Answer

How about this:

import httplib
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = httplib.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400

if __name__ == '__main__':
    print checkUrl('http://www.stackoverflow.com') # True
    print checkUrl('http://stackoverflow.com/notarealpage.html') # False

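This sends an HTTP HEAD request and returns True if the response status code is less than 400.

Note that StackOverflow's root path returns a redirect (301) rather than 200 OK, so a strict comparison with httplib.OK, as in the question, yields False for it.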
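For readers on Python 3, where httplib became http.client and urlparse moved to urllib.parse, the same idea might be sketched as follows. This is an illustrative adaptation, not the answer's original code: the name check_url, the 5-second timeout, the HTTPS handling, and the decision to treat connection errors as False are all assumptions added here.

```python
import http.client
from urllib.parse import urlsplit


def check_url(url, timeout=5.0):
    """Send a HEAD request and return True when the status code is below 400."""
    parts = urlsplit(url)
    # Pick the connection class from the URL scheme (assumption: http or https only).
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    # An empty path (e.g. "http://example.com") is requested as "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_class(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, malformed response, ...
        return False
    finally:
        conn.close()
```

Because the check is status < 400, a 301 redirect such as the one from the StackOverflow root still counts as existing; the redirect target itself is not followed or checked.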

score 14 · Accepted Answer

With requests, this is as simple as:

import requests

ret = requests.head('http://www.example.com')
print(ret.status_code)

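This loads only the website's headers. To test whether the request succeeded, check the status_code of the result, or call the raise_for_status method, which raises an exception (requests.exceptions.HTTPError) when the response status is 4xx or 5xx.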
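A minimal sketch of the raise_for_status approach, assuming the third-party requests package is installed. The helper names response_ok and page_exists and the 5-second timeout are hypothetical choices for illustration. Note that requests.head does not follow redirects unless allow_redirects=True is passed, so the sketch sets it explicitly.

```python
import requests


def response_ok(response):
    """Return True unless raise_for_status() reports a 4xx/5xx status."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return False
    return True


def page_exists(url, timeout=5.0):
    """HEAD the URL and report whether the server answered without an error status."""
    try:
        # allow_redirects=True follows 301/302 so the final page is what gets checked.
        response = requests.head(url, allow_redirects=True, timeout=timeout)
    except requests.exceptions.RequestException:
        # DNS failure, refused connection, timeout, invalid URL, ...
        return False
    return response_ok(response)
```

Calling page_exists('http://www.example.com') would then return a plain boolean instead of requiring the caller to inspect status_code.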

score 5 · Accepted Answer

How about this:

import requests

def url_check(url):
    """Return True if the site exists, False otherwise.

    Sends a HEAD request (headers only, not the full HTML) and returns
    True when the response status code is less than 400.
    """
    try:
        site_ping = requests.head(url)
        # To inspect the status code, use: print(site_ping.status_code)
        return site_ping.status_code < 400
    except Exception:
        return False

score -2 · Accepted Answer

You can try:

import urllib2

try:
    urllib2.urlopen(url='https://someURL')
except:
    print("page not found")
