2

After banging my head several time, I am finally coming here.

Problem : I am trying to download the content of each of the craiglist posting. By content I mean the "posting body" like description of the cell phone. Looking for a new old phone since iPhone is done with all excitement.

The code is an awesome work by Michael Herman.

My Spider Class

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import *
from craig.items import CraiglistSampleItem

class MySpider(CrawlSpider):
    name = "craigs"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://minneapolis.craigslist.org/moa/"]

    rules = (Rule (SgmlLinkExtractor(allow=("index\d00\.html", ),restrict_xpaths=('//p[@class="nextpage"]',))
    , callback="parse_items", follow= True),
    )

    def parse_items(self,response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        items = []
        for titles in titles:
            item = CraiglistSampleItem()
            item ["title"] = titles.select("a/text()").extract()
            item ["link"] = titles.select("a/@href").extract()
            items.append(item)
        return items

そして Item クラス

from scrapy.item import Item, Field

class CraiglistSampleItem(Item):
    title = Field()
    link = Field()

コードは多くのリンクを横断するため、各携帯電話の説明を個別のcsvに保存したかったのですが、csvのもう1列でも問題ありません。

どんなリードでも!!!

4

1 に答える 1