python - Python script to check for a tag on a website

Question

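I'm trying to figure out how to write a website-monitoring script (eventually a cron job) that opens a given URL and checks whether a tag exists. If the tag does not exist, or does not contain the expected data, the script should write to a log file or send an email.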

The tags will be similar or relatively similar.

Does anyone have any ideas?

score 5 · Accepted Answer

Your best bet is to check out BeautifulSoup. Something like this:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://yoursite.com")
soup = BeautifulSoup(page)

# See the docs on how to search through the soup. I'm not sure what
# you're looking for so my example stops here :)

After that, emailing the result or logging it is pretty standard fare.
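To make the search step concrete, here is a minimal sketch of a tag check. It is not BeautifulSoup: it swaps in Python 3's standard-library html.parser so that it runs without extra packages, and the names TagTextFinder and find_tag_text are made up for illustration. With BeautifulSoup the same check would go through its own search methods, as the comment in the code above says.

```python
from html.parser import HTMLParser


class TagTextFinder(HTMLParser):
    """Collects the text inside the first occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.found = False     # True once the opening tag has been seen
        self._inside = False   # True while between <tag> and </tag>
        self._done = False     # True after the first occurrence has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self._done:
            self.found = True
            self._inside = True

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.parts.append(data)


def find_tag_text(html, tag):
    """Return the stripped text of the first <tag>, or None if the tag is absent."""
    finder = TagTextFinder(tag)
    finder.feed(html)
    finder.close()
    if not finder.found:
        return None
    return "".join(finder.parts).strip()


sample = "<html><body><h1>Status: OK</h1><p>details</p></body></html>"
print(find_tag_text(sample, "h1"))     # Status: OK
print(find_tag_text(sample, "table"))  # None
```

A None result covers the "tag does not exist" case, and comparing the returned text with the expected value covers the "wrong data" case. The sketch only suits tags that have a closing tag; void elements such as meta would need their attributes inspected instead.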

score 2 · Accepted Answer

Here is some sample code (untested) that logs the result and sends mail:

#!/usr/bin/env python
import logging
import smtplib
import urllib2

# Log config
logging.basicConfig(filename='/tmp/yourscript.log', level=logging.INFO)


def check_content(data):
    # Your BeautifulSoup logic here; set content_found to True or False
    content_found = False
    return content_found


def send_mail(message_body):
    server = 'localhost'
    recipients = ['you@yourdomain.com']
    sender = 'script@yourdomain.com'
    # Header lines start at column 0; a blank line separates them from the body
    message = 'From: %s\r\nSubject: script result\r\n\r\n%s' % (sender, message_body)
    session = smtplib.SMTP(server)
    session.sendmail(sender, recipients, message)
    session.quit()


# Open requested url (the functions above must be defined before they are called)
url = "http://yoursite.com/tags/yourTag"
data = urllib2.urlopen(url)

if check_content(data):
    # Report to log
    logging.info('Content found')
else:
    # Send mail
    send_mail('Content not found')

I would code the check_content() function using BeautifulSoup.
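As a sketch of the reporting side, the standard-library email.message.EmailMessage class can build the headers instead of formatting the message string by hand. This is Python 3 and my own assumption, not part of the sample code; build_report and report are hypothetical names, the addresses are the same placeholders as above, and a mail server listening on localhost is assumed.

```python
import logging
import smtplib
from email.message import EmailMessage


def build_report(sender, recipients, body):
    """Build the notification mail; EmailMessage takes care of the header layout."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "script result"
    msg.set_content(body)
    return msg


def report(found, send=None):
    """Log a hit, or mail a miss. `send` can be injected to try the logic offline."""
    if found:
        logging.info("Content found")
        return None
    msg = build_report("script@yourdomain.com", ["you@yourdomain.com"],
                       "Content not found")
    if send is None:
        # Assumes an SMTP server on localhost, as in the sample above.
        with smtplib.SMTP("localhost") as session:
            session.send_message(msg)
    else:
        send(msg)
    return msg
```

Passing a plain function as send (for example a list's append method) makes it possible to check the message without a mail server; leaving it out sends through localhost.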

score 1 · Accepted Answer

The following (untested) code uses urllib2 to fetch the page and re to search it.

import re
import urllib2

pageString = urllib2.urlopen('**insert url here**').read()
m = re.search(r'**insert regex for the tag you want to find here**', pageString)
if m is None:
    pass  # take action for NOT found here
else:
    pass  # take action for found here

The following (untested) code uses pycurl and StringIO to fetch the page and re to search it.

import pycurl
import re
import StringIO

b = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, '**insert url here**')
c.setopt(pycurl.WRITEFUNCTION, b.write)  # collect the response body in b
c.perform()
c.close()
m = re.search(r'**insert regex for the tag you want to find here**', b.getvalue())
if m is None:
    pass  # take action for NOT found here
else:
    pass  # take action for found here
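
For illustration, here is one way the regex placeholder might be filled in, written for Python 3. The function name tag_matches and the pattern are assumptions, not part of the answer. A regex only approximates HTML parsing and can be fooled by comments, nested tags of the same name, or unusual attribute quoting.

```python
import re


def tag_matches(page, tag, expected=None):
    """Return True if <tag>...</tag> occurs in page and, optionally, contains `expected`."""
    pattern = re.compile(
        r"<%s\b[^>]*>(.*?)</%s\s*>" % (re.escape(tag), re.escape(tag)),
        re.IGNORECASE | re.DOTALL,
    )
    m = pattern.search(page)
    if m is None:
        return False
    # Only the first occurrence of the tag is inspected.
    return expected is None or expected in m.group(1)


page = '<div id="status"><span class="state">running</span></div>'
print(tag_matches(page, "span"))             # True
print(tag_matches(page, "span", "stopped"))  # False
print(tag_matches(page, "h1"))               # False
```

In Python 3 the page text would come from urllib.request.urlopen(url).read(), decoded with whatever encoding the site uses, since urllib2 no longer exists there.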
