
Environment:

Scrapy 0.16.2 Twisted-12.2.0 Python 2.7 macosx-10.6

Here is my problem:

When I try to run

scrapy shell http://aaa.17domn.com/bt9/file.php/MERH77V.html

I get this error:

[ScrapyHTTPPageGetter,client] Unhandled Error
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/selectreactor.py", line 150, in _doReadOrWrite
        why = getattr(selectable, method)()

      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/tcp.py", line 202, in doRead
        return self._dataReceived(data)

      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/tcp.py", line 208, in _dataReceived
        rval = self.protocol.dataReceived(data)

      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/protocols/basic.py", line 564, in dataReceived
        why = self.lineReceived(line)

      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.2-py2.7.egg/scrapy/core/downloader/webclient.py", line 50, in lineReceived
        return HTTPClient.lineReceived(self, line.rstrip())

      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/web/http.py", line 450, in lineReceived
        self.extractHeader(self._header)

      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/web/http.py", line 406, in extractHeader
        key, val = header.split(':',1)
    exceptions.ValueError: need more than 1 value to unpack

I found a solution at https://groups.google.com/forum/#!msg/scrapy-users/xFKo8ggzPxs/VXDl3CZ4V4cJ, which explains that the problem is caused by Twisted. I then patched the function extractHeader in /twisted/web/http.py following http://twistedmatrix.com/trac/ticket/2842, and that worked.
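The failing statement in the first traceback is `key, val = header.split(':',1)`. When a header line contains no colon, split() returns a one-element list and the two-name unpacking raises ValueError ("need more than 1 value to unpack" on Python 2.7, "not enough values to unpack" on Python 3). The standalone sketch below only illustrates the idea of skipping such lines instead of crashing; it is not the patch attached to Twisted ticket 2842, and `split_header` / `parse_headers` are hypothetical helper names, not Twisted or Scrapy API.

```python
def split_header(line):
    """Split one raw header line into (name, value), or return None if malformed.

    Hypothetical helper: a line with no colon is what makes
    `key, val = header.split(':', 1)` raise ValueError.
    """
    if ":" not in line:
        return None  # malformed line: report it instead of crashing
    key, val = line.split(":", 1)  # split on the first colon only, so URLs in values survive
    return key.strip(), val.strip()


def parse_headers(lines):
    """Collect well-formed headers into {lowercased name: [values]}, skipping malformed lines."""
    headers = {}
    for line in lines:
        pair = split_header(line)
        if pair is None:
            continue  # skip the bad line, as a tolerant parser would
        name, value = pair
        headers.setdefault(name.lower(), []).append(value)
    return headers
```

Lowercasing the names and keeping a list per name is a design choice of this sketch, so that repeated headers such as Set-Cookie are not lost.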

But wait!!!

Then I run it against another site:

scrapy shell http://www1.wkdown.info/fs3/file.php/M994ATR.html

and get this error:

Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)

  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.2-py2.7.egg/scrapy/core/downloader/webclient.py", line 122, in _build_response
    status = int(self.status)

ValueError: invalid literal for int() with base 10: 'html'
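The value 'html' may be a hint. Assuming the server omits the HTTP status line and its first line is an HTML doctype such as `<!DOCTYPE html PUBLIC ...>`, a whitespace split of that line into version, status and message yields 'html' as the status, which int() then rejects. The traceback does not prove this, so treat it as a hypothesis to check. The sketch below shows a stricter status-line check; `parse_status_line` is a hypothetical helper, not Scrapy or Twisted API.

```python
def parse_status_line(line):
    """Split a response's first line into (version, status_code, message).

    Raises ValueError with a readable message when the line is not an
    HTTP status line, instead of failing later inside int().
    """
    # e.g. "HTTP/1.1 200 OK" -> ["HTTP/1.1", "200", "OK"]
    parts = line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError("not an HTTP status line: %r" % (line,))
    version, status = parts[0], parts[1]
    if not status.isdigit():
        raise ValueError("non-numeric status code %r in %r" % (status, line))
    message = parts[2] if len(parts) > 2 else ""  # reason phrase is optional
    return version, int(status), message
```

With a doctype as the first line, the naive split puts 'html' in the status position, which matches the error above; the check here rejects the line before int() is reached.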

I think something is wrong in the response headers that Scrapy cannot handle. Any ideas? Thanks!
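One way to test the hunch about the response headers is to look at the raw bytes the server sends, bypassing any HTTP library. The sketch below is a minimal approach under these assumptions: plain HTTP on port 80, a bare HTTP/1.0 GET request is acceptable to the server, and the first couple of kilobytes are enough to see the status and header lines. `fetch_raw_head` and `head_lines` are hypothetical helper names.

```python
import socket


def fetch_raw_head(host, path, port=80, limit=2048, timeout=10):
    """Send a bare HTTP/1.0 GET and return up to `limit` bytes of the raw reply."""
    request = ("GET %s HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n"
               % (path, host)).encode("ascii")
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(request)
        chunks = []
        received = 0
        while received < limit:
            data = sock.recv(limit - received)
            if not data:  # server closed the connection
                break
            chunks.append(data)
            received += len(data)
    finally:
        sock.close()
    return b"".join(chunks)


def head_lines(raw, n=10):
    """Return the first n lines of a raw response, decoded leniently."""
    text = raw.decode("latin-1")  # latin-1 maps every byte, so decoding never fails
    return text.splitlines()[:n]
```

For example, `for line in head_lines(fetch_raw_head("www1.wkdown.info", "/fs3/file.php/M994ATR.html")): print(repr(line))` prints each line with repr(), so missing colons, a missing status line, or stray characters become visible.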
