python - Scrapy - Extracting images

Question

I want to extract images from the site http://www.jabong.com/Puma-Wirko-Ind-Black-Sneakers-187839.html using XPath:

item['pimg'] = hxs.select('//*[@id="wrapper"]/div[2]/div[1]/div[3]/div[1]/ul/li[1]/img').extract()

This returns a text value rather than the image. How can I save the image itself? Please help.
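Calling extract() on an img element returns that element's serialized markup (for example `<img src="..." alt="...">`), which is why the result is text. Selecting the attribute instead, by appending `/@src` to the XPath (e.g. `.../li[1]/img/@src`), returns only the image URL. The sketch below illustrates the same "take the src attribute, not the element" idea with Python's standard-library html.parser rather than Scrapy selectors, so it runs without Scrapy installed; the class name, helper name, and HTML snippet are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (a stdlib stand-in for an XPath ending in /@src)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> elements that have no src attribute
                self.srcs.append(src)


def extract_img_srcs(html):
    """Return the src values of all <img> tags in the given HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


# Hypothetical markup: one image with a src, one without
snippet = '<ul><li><img src="/images/shoe-1.jpg" alt="front"></li><li><img alt="no src"></li></ul>'
print(extract_img_srcs(snippet))  # ['/images/shoe-1.jpg']
```

The result is a list of URLs (possibly relative), which is the kind of value the image download step needs, not the element's markup.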

score 0 · Accepted Answer

Short answer: use the Images Pipeline: http://doc.scrapy.org/en/latest/topics/images.html
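Enabling the pipeline is mostly configuration. Below is a minimal settings.py sketch, assuming a recent Scrapy where the class lives at scrapy.pipelines.images.ImagesPipeline (very old releases used scrapy.contrib.pipeline.images.ImagesPipeline; check your version). The IMAGES_STORE path is a hypothetical placeholder. The pipeline also needs Pillow installed, and your item class needs an image_urls field (which you fill in) and an images field (which the pipeline fills in).

```python
# settings.py (sketch): enable Scrapy's built-in Images Pipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are written (hypothetical path; choose your own).
IMAGES_STORE = "/path/to/images"
```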

Note, however, that the image_urls field expects a list of fully qualified (absolute) URLs, so you need something like the following:

from urllib.parse import urljoin  # Python 3; on Python 2 this was: from urlparse import urljoin

    # ... this in your callback method

    item['image_urls'] = []

    for img in response.xpath('//img'):  # change the XPath to suit your needs (older Scrapy: hxs.select)
        # img is a Selector; .get() returns the first match or None,
        # so <img> elements without a src attribute are skipped
        # instead of raising IndexError as extract()[0] would.
        path = img.xpath('@src').get()
        if path:
            item['image_urls'].append(urljoin(response.url, path))
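The reason for urljoin is that src values are often relative. This sketch isolates the URL-building step so you can see how different src forms resolve against the page URL; the helper name build_image_urls and the sample src values are hypothetical, while urljoin is the standard-library urllib.parse function.

```python
from urllib.parse import urljoin


def build_image_urls(page_url, srcs):
    """Resolve each src against the page URL, skipping missing or empty values."""
    return [urljoin(page_url, src) for src in srcs if src]


page = "http://www.jabong.com/Puma-Wirko-Ind-Black-Sneakers-187839.html"
srcs = [
    "/images/a.jpg",                 # root-relative
    "img/b.jpg",                     # relative to the page's directory
    "//cdn.example.com/c.jpg",       # protocol-relative
    "https://cdn.example.com/d.jpg", # already absolute, returned unchanged
    None,                            # <img> without src
]
for url in build_image_urls(page, srcs):
    print(url)
# http://www.jabong.com/images/a.jpg
# http://www.jabong.com/img/b.jpg
# http://cdn.example.com/c.jpg
# https://cdn.example.com/d.jpg
```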

If you follow the example in the documentation, the images field will contain the metadata (checksum, path, URL) of each downloaded image.
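To locate the saved files on disk, join IMAGES_STORE with each entry's path. The Scrapy docs describe the default file name as a SHA-1 hash of the image URL stored under a full/ subdirectory; the first helper below mimics that scheme as an approximation (custom pipelines can override it, so check your version). Both helper names and the sample item are hypothetical.

```python
import hashlib
import os
from typing import List


def default_image_path(url: str) -> str:
    """Approximate ImagesPipeline's default naming: full/<SHA-1 hex of URL>.jpg."""
    guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{guid}.jpg"


def stored_files(item: dict, images_store: str) -> List[str]:
    """Build on-disk paths from item['images'] entries (dicts with 'url', 'path', 'checksum')."""
    return [os.path.join(images_store, img["path"]) for img in item.get("images", [])]


# Hypothetical item as it might look after the pipeline has run
url = "http://www.jabong.com/images/a.jpg"
item = {
    "image_urls": [url],
    "images": [{"url": url, "path": default_image_path(url), "checksum": "placeholder"}],
}
print(stored_files(item, "/path/to/images"))
```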
