python - How to use the scrapy.log module with a custom log handler?

Question

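I am working on a Scrapy project, and so far everything works well. However, I am not satisfied with the options Scrapy offers for configuring logging. At the moment, I have set LOG_FILE = 'my_spider.log' in my project's settings.py. When I run scrapy crawl my_spider on the command line, it creates one big log file for the entire crawl. This does not suit my purposes.

How can I use Python's custom log handlers in combination with the scrapy.log module? In particular, I would like to use Python's logging.handlers.RotatingFileHandler so that the log data is split into several small files instead of one huge file. Unfortunately, the documentation for Scrapy's logging facility is not very extensive. Thanks in advance!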

score 2 · Accepted Answer

You can integrate a custom log file as follows (I am not sure how to integrate the rotator).

In your spider class file:

from datetime import datetime
from scrapy import log
from scrapy.spider import BaseSpider

class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def __init__(self, name=None, **kwargs):
        LOG_FILE = "scrapy_%s_%s.log" % (self.name, datetime.now())
        # remove the current log
        # log.log.removeObserver(log.log.theLogPublisher.observers[0])
        # re-create the default Twisted observer which Scrapy checks
        log.log.defaultObserver = log.log.DefaultObserver()
        # start the default observer so it can be stopped
        log.log.defaultObserver.start()
        # trick Scrapy into thinking logging has not started
        log.started = False
        # start the new log file observer
        log.start(LOG_FILE)
        # continue with the normal spider init
        super(ExampleSpider, self).__init__(name, **kwargs)

    def parse(self, response):
        ...

The output file will be named like this:

scrapy_example_2012-08-25 12:34:48.823896.log
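
For the rotation part that the answer leaves open, here is a minimal sketch using only the standard library's logging.handlers.RotatingFileHandler. It shows size-based rotation on a plain Python logger and does not touch scrapy.log itself. The helper names attach_rotating_handler and demo_rotation, the logger name "rotation_demo", and the size limits are hypothetical choices made for illustration.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def attach_rotating_handler(logger, path, max_bytes=1024, backup_count=3):
    """Attach a size-based rotating file handler to `logger` and return it."""
    # Once the file would exceed max_bytes it is renamed to path.1, path.2, ...
    # and at most backup_count old files are kept.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return handler


def demo_rotation():
    """Write enough records to force rotation; return the sorted log file names."""
    log_dir = tempfile.mkdtemp()
    logger = logging.getLogger("rotation_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = attach_rotating_handler(logger, os.path.join(log_dir, "my_spider.log"))
    for i in range(200):
        logger.info("crawled page %d", i)
    # Detach and close so the files are flushed before listing them.
    logger.removeHandler(handler)
    handler.close()
    return sorted(os.listdir(log_dir))


if __name__ == "__main__":
    print(demo_rotation())
```

In Scrapy releases that log through the standard logging module (1.0 and later), passing logging.getLogger() to attach_rotating_handler should be enough for the handler to receive Scrapy's records. The older scrapy.log module used in the answer is built on Twisted's logging, so this handler would not see those messages without a bridge between the two systems.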
