python - Scrapy project, scraping a schedule

Translated from: https://stackoverflow.com/questions/18794323 2013-09-13T20:01:24.403

Viewed 496 times

I'm trying to scrape the schedule from this page: http://stats.swehockey.se/ScheduleAndResults/Schedule/3940

I'm using this code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class SchemaSpider(BaseSpider):
    name = "schema"
    allowed_domains = ["http://stats.swehockey.se/"]
    start_urls = [
        "http://stats.swehockey.se/ScheduleAndResults/Schedule/3940"
    ]

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    rows = hxs.select('//table[@class="tblContent"]/tbody/tr')

    for row in rows:
        date = row.select('/td[1]/div/span/text()').extract()
        teams = row.select('/td[2]/text()').extract()

        print date, teams

However, I can't get it to work. What am I doing wrong? I've spent a couple of hours trying to figure it out myself, but I can't see why my XPath doesn't work properly.
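
A few things in the spider above commonly cause this kind of failure, and they are worth checking. In Scrapy, an XPath that starts with `/` is absolute to the whole document even when it is called on a row selector, so `/td[1]/...` inside the loop will not match cells of the current row; `./td[1]/...` or `td[1]/...` is relative to the row. Browsers usually insert `<tbody>` into tables when rendering, so a path copied from the browser inspector may not match the raw HTML that Scrapy receives. `allowed_domains` is expected to hold bare domain names such as `stats.swehockey.se`, not URLs. Finally, `parse` needs to be indented so that it is a method of `SchemaSpider`. The sketch below illustrates only the row-relative path idea, using the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors so that it runs without Scrapy installed. The `SAMPLE` markup, the team names, and the `extract_schedule` helper are hypothetical; the real page's structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the schedule table.
# The real page's markup is an assumption and may differ.
SAMPLE = """
<html><body>
<table class="tblContent">
  <tr><td><div><span>2013-09-14</span></div></td><td>Team A - Team B</td></tr>
  <tr><td><div><span>2013-09-15</span></div></td><td>Team C - Team D</td></tr>
</table>
</body></html>
""".strip()


def extract_schedule(markup):
    """Return a list of (date, teams) tuples, one per table row."""
    root = ET.fromstring(markup)
    # No tbody step here: this sample markup does not contain one.
    rows = root.findall(".//table[@class='tblContent']/tr")
    schedule = []
    for row in rows:
        # Paths are relative to the current row (no leading slash).
        date = row.findtext("td[1]/div/span")
        teams = row.findtext("td[2]")
        schedule.append((date, teams))
    return schedule


if __name__ == "__main__":
    for date, teams in extract_schedule(SAMPLE):
        print(date, teams)
```

In the spider itself, the analogous change would be to write `row.select('./td[1]/div/span/text()')` and `row.select('./td[2]/text()')`, and to try the row query without the `/tbody` step if it returns nothing. Whether the cell contents sit in exactly these positions on the real page is something to confirm in `scrapy shell`.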
