python - Importing and processing a .csv containing URLs (Python)

Question

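I am working on a script that imports a list of URLs and checks a few things in each page's source code. I need help with importing and processing the .csv. In case someone can help, here is part of the code: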

from lxml import html
import csv

def main():
    # Read the first column of every row as a URL
    with open('urls.csv', 'r') as csvfile:
        urls = [row[0] for row in csv.reader(csvfile)]

    for url in urls:
        doc = html.parse(url)
        linkziel = 'http://dandydiary.de/de'
        if doc.xpath('//a[@href=$url]', url=linkziel):
            for anchor_node in doc.xpath('//a[@href=$url]', url=linkziel):
                if anchor_node.xpath('./ancestor::div[contains(@class, "sidebar")]'):
                    print 'Sidebar'
                elif anchor_node.xpath('./parent::div[contains(@class, "widget")]'):
                    print 'Sidebar'
                elif anchor_node.xpath('./ancestor::div[contains(@class, "comment")]'):
                    print 'Kommentar'
                elif anchor_node.xpath('./ancestor::div[contains(@id, "comment")]'):
                    print 'Kommentar'
                elif anchor_node.xpath('./ancestor::div[contains(@class, "foot")]'):
                    print "Footer"
                elif anchor_node.xpath('./ancestor::div[contains(@id, "foot")]'):
                    print "Footer"
                elif anchor_node.xpath('./ancestor::div[contains(@class, "post")]'):
                    print "Contextual"
                else:
                    print 'Unidentified Link'
        else:
            print 'Link is Dead'

if __name__ == '__main__':
    main()

Instead of specifying a single URL, I want the script to run over the URLs in a csv file (I am using Python 2).

score 0 · Accepted Answer

Python provides the csv module, which you can use to import the list.
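As a rough illustration of that suggestion, the sketch below reads the first column of each row with csv.reader. The helper name read_urls and the sample rows are made up for this example; with a real file you would pass the object returned by open() instead of the list.

```python
import csv


def read_urls(lines):
    """Return the first column of every non-empty CSV row."""
    urls = []
    for row in csv.reader(lines):
        # Blank lines come back as empty lists; skip them to avoid an IndexError.
        if row and row[0].strip():
            urls.append(row[0].strip())
    return urls


# csv.reader accepts any iterable of lines, so a list stands in for an open file here.
sample = ["http://de.wikipedia.org\n", "\n", "http://spiegel.de\n"]
print(read_urls(sample))
```

Skipping empty rows is a choice made for this sketch; a plain row[0] on a blank line would raise an IndexError.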

Answered 2013-05-02T07:56:44.607

score 0 · Accepted Answer

Suppose you have a file input.csv with one URL on each line:

http://de.wikipedia.org
http://spiegel.de
http://www.vickysmodeblog.com/

Then you can read it into a list via the csv module and iterate over it:

import csv
from lxml import html


with open('input.csv', 'r') as csvfile:
    urls = [row[0] for row in csv.reader(csvfile)]

for url in urls:
    print url

    doc = html.parse(url)
    linkziel = 'http://dandydiary.de/de'
    if doc.xpath('//a[@href=$url]', url=linkziel):
        for anchor_node in doc.xpath('//a[@href=$url]', url=linkziel):
            if anchor_node.xpath('./ancestor::div[contains(@class, "sidebar")]'):
                print 'Sidebar'
            elif anchor_node.xpath('./parent::div[contains(@class, "widget")]'):
                print 'Sidebar'
            elif anchor_node.xpath('./ancestor::div[contains(@class, "comment")]'):
                print 'Kommentar'
            elif anchor_node.xpath('./ancestor::div[contains(@id, "comment")]'):
                print 'Kommentar'
            elif anchor_node.xpath('./ancestor::div[contains(@class, "foot")]'):
                print "Footer"
            elif anchor_node.xpath('./ancestor::div[contains(@id, "foot")]'):
                print "Footer"
            elif anchor_node.xpath('./ancestor::div[contains(@class, "post")]'):
                print "Contextual"
            else:
                print 'Unidentified Link'
    else:
        print 'Link is Dead'

This produces the following output:

http://de.wikipedia.org
Link is Dead
http://spiegel.de
Link is Dead
http://www.vickysmodeblog.com/
Contextual
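
With a longer list, a single unreachable URL would stop the whole loop. One possible way around that, sketched here under assumptions, is to wrap each check in try/except and record the error instead. The names check_all and fake_check are hypothetical, and the sketch assumes a failed fetch surfaces as IOError; fake_check only stands in for the lxml-based check so the example runs offline.

```python
def check_all(urls, check_url):
    """Apply check_url to every URL; record errors instead of aborting the run."""
    results = []
    for url in urls:
        try:
            results.append((url, check_url(url)))
        except IOError as exc:
            # Assumption: an unreachable URL surfaces as IOError.
            results.append((url, 'Error: %s' % exc))
    return results


def fake_check(url):
    # Hypothetical stand-in for the lxml-based check, so the sketch runs offline.
    if 'broken' in url:
        raise IOError('could not read ' + url)
    return 'Contextual'


for url, outcome in check_all(['http://example.com/', 'http://broken.example/'], fake_check):
    print(url + ' -> ' + outcome)
```

In the real script, check_url would be a function that parses the page and returns the label instead of printing it.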
