python - Unable to scrape product URLs with Scrapy in Python

Translated from: https://stackoverflow.com/questions/19607747 2013-10-26T14:15:28.180

I want to use Scrapy in Python to extract all the product URLs from the page http://www.shopclues.com/diwali-mega-mall/hot-electronics-sale-fs/audio-systems-fs.html. Below is the function I'm using to do this:

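from scrapy.selector import HtmlXPathSelector  # older Scrapy selector API, deprecated in later releases
# DmozItem is the Item class defined in the project's items.py

def parse(self, response):
    print("hello")

    hxs = HtmlXPathSelector(response)
    # Select the container div that holds the product listing
    sites = hxs.select('//div[@id="pagination_contents"]')
    items = []
    i = 3  # start at the third child div of div[2]
    for site in sites:
        item = DmozItem()
        # Read the href of the i-th product div, relative to the container
        item['link'] = site.select('div[2]/div[' + str(i) + ']/a/@href').extract()
        i += 1
        print(i)
        items.append(item)
    return items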

The XPath for each product div is: //div[@id="pagination_contents"]/div[2]/div['+str(i)+']/a/@href
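The effect of the `[i]` position index in that path can be checked outside Scrapy. The sketch below is a hedged illustration, not the page's real markup: `SAMPLE_HTML` is a hypothetical, well-formed fragment shaped the way the question's XPath implies (the two leading non-product divs are a guess based on the counter starting at 3), and it uses the standard library's `xml.etree.ElementTree` in place of Scrapy's selectors. Pinning the product step to `div[3]` matches exactly one product; leaving the position off (`div/a`) matches every product link in a single query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real shopclues.com page.
# Assumed shape: pagination_contents > second div > one div per product > a
SAMPLE_HTML = """
<html><body>
  <div id="pagination_contents">
    <div class="header">Audio systems</div>
    <div class="grid">
      <div class="spacer"/>
      <div class="spacer"/>
      <div class="product"><a href="/speaker-a.html">Speaker A</a></div>
      <div class="product"><a href="/speaker-b.html">Speaker B</a></div>
      <div class="product"><a href="/speaker-c.html">Speaker C</a></div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE_HTML.strip())

# Indexed form: div[3] pins the expression to a single product slot.
one = [a.get("href") for a in root.findall(
    ".//div[@id='pagination_contents']/div[2]/div[3]/a")]

# Unindexed form: plain div matches every child div of the grid at once;
# the spacer divs contribute nothing because they contain no <a>.
all_links = [a.get("href") for a in root.findall(
    ".//div[@id='pagination_contents']/div[2]/div/a")]

print(one)
print(all_links)
```

In the question's Scrapy code, the corresponding expression would be `//div[@id="pagination_contents"]/div[2]/div/a/@href` passed to `hxs.select(...)`; calling `.extract()` on the result returns a list with every matching href.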

However, I only get a single link instead of all the product URLs.
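One likely reason for the single result: `//div[@id="pagination_contents"]` matches only the one container element (an `id` is expected to be unique on a page), so `sites` holds a single selector, the `for` loop body runs once, and only the href at `div[3]` is read. A minimal sketch of the alternative iterates over the product divs themselves and reads `a` relative to each one. It again uses `xml.etree.ElementTree` and a hypothetical `SAMPLE_HTML` fragment instead of Scrapy and the live page; `parse_products` is an illustrative name, not a Scrapy API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real shopclues.com page.
SAMPLE_HTML = """
<html><body>
  <div id="pagination_contents">
    <div class="header">Audio systems</div>
    <div class="grid">
      <div class="spacer"/>
      <div class="spacer"/>
      <div class="product"><a href="/speaker-a.html">Speaker A</a></div>
      <div class="product"><a href="/speaker-b.html">Speaker B</a></div>
      <div class="product"><a href="/speaker-c.html">Speaker C</a></div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE_HTML.strip())

# What the question's loop iterates over: the single container div.
sites = root.findall(".//div[@id='pagination_contents']")
print(len(sites))  # 1, so the loop body runs only once


def parse_products(root):
    """Collect one {'link': href} dict per product div (hypothetical helper)."""
    items = []
    # Iterate over every child div of the product grid instead of the
    # container, so no manual counter is needed.
    for product in root.findall(".//div[@id='pagination_contents']/div[2]/div"):
        link = product.find("a")  # path relative to this product div
        if link is not None:  # skip divs that carry no link (e.g. spacers)
            items.append({"link": link.get("href")})
    return items


for item in parse_products(root):
    print(item["link"])
```

In Scrapy terms this roughly corresponds to `for site in hxs.select('//div[@id="pagination_contents"]/div[2]/div'):` with `site.select('a/@href').extract()` inside the loop. That mapping assumes the page structure implied by the question's XPath, which may have changed since 2013.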
