I'm trying to answer the following question out of personal interest: What is the fastest way to send 100,000 HTTP requests in Python?

This is what I have come up with so far, but I'm experiencing something very strange.

When installSignalHandlers is False, it just hangs. I can see that the DelayedCall instances are in reactor._newTimedCalls, but processResponse never gets called.

When installSignalHandlers is True, it throws an error and works.

from twisted.internet import reactor
from twisted.web.client import Agent
from threading import Semaphore, Thread
import time

concurrent = 100
s = Semaphore(concurrent)
reactor.suggestThreadPoolSize(concurrent)
t=Thread(
    target=reactor.run,
    kwargs={'installSignalHandlers':True})
t.daemon=True
t.start()


agent = Agent(reactor)


def processResponse(response,url):
    print response.code, url
    s.release()

def processError(response,url):
    print "error", url
    s.release()

def addTask(url):
    req = agent.request('HEAD', url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)


for url in open('urllist.txt'):
    addTask(url.strip())    
    s.acquire()
while s._Semaphore__value!=concurrent:
    time.sleep(0.1)     

reactor.stop()

This is the error that is thrown when installSignalHandlers is True:

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 396, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 224, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 213, in addCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 409, in _continueFiring
    callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1165, in _reallyStartRunning
    self._handleSignals()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1105, in _handleSignals
    signal.signal(signal.SIGINT, self.sigInt)
exceptions.ValueError: signal only works in main thread

What am I doing wrong, and what is the right way? I'm new to Twisted.

@moshez: Thanks. It works now:

from twisted.internet import reactor, threads
from urlparse import urlparse
import httplib
import itertools


concurrent = 100
finished=itertools.count(1)
reactor.suggestThreadPoolSize(concurrent)

def getStatus(ourl):
    url = urlparse(ourl)
    conn = httplib.HTTPConnection(url.netloc)   
    conn.request("HEAD", url.path)
    res = conn.getresponse()
    return res.status

def processResponse(response,url):
    print response, url
    processedOne()

def processError(error,url):
    print "error", url#, error
    processedOne()

def processedOne():
    if finished.next()==added:
        reactor.stop()

def addTask(url):
    req = threads.deferToThread(getStatus, url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)   

added=0
for url in open('urllist.txt'):
    added+=1
    addTask(url.strip())

try:
    reactor.run()
except KeyboardInterrupt:
    reactor.stop()

1 Answer

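You are making waaaaay too many reactor calls from the main thread (for example, agent.request most likely calls into the reactor). I don't know whether that is your problem, but it is unsupported either way: the only reactor call you may make from a non-reactor thread is reactor.callFromThread.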
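
As a rough sketch of that rule (not tested against the setup in the question): the reactor runs in the main thread, and a helper thread touches it only through reactor.callFromThread. This assumes Python 3 and a recent Twisted, where Agent.request takes bytes; add_task, feed_urls and main are names made up for the illustration, and urllist.txt is the file from the question.

```python
import threading


def add_task(agent, url):
    """Runs in the reactor thread: issue a HEAD request and report the outcome."""
    d = agent.request(b"HEAD", url.encode("ascii"))
    d.addCallback(lambda response: print(response.code, url))
    d.addErrback(lambda failure: print("error", url))
    return d


def feed_urls(reactor, agent, urls):
    """Runs in a non-reactor thread: hand each URL over to the reactor thread."""
    for url in urls:
        # callFromThread is the one reactor call that is safe from another thread.
        reactor.callFromThread(add_task, agent, url)


def main(path="urllist.txt"):
    # Imported here so the helpers above do not need Twisted to be importable.
    from twisted.internet import reactor
    from twisted.web.client import Agent

    agent = Agent(reactor)
    with open(path) as f:
        urls = [line.strip() for line in f if line.strip()]

    feeder = threading.Thread(target=feed_urls, args=(reactor, agent, urls))
    feeder.daemon = True
    reactor.callWhenRunning(feeder.start)
    # The reactor owns the main thread, so signal handlers install normally.
    reactor.run()
```

Calling main() starts the reactor and keeps it running until you press Ctrl-C; the sketch does not count responses or stop on its own.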

Also, the whole architecture looks strange. Why aren't you running the reactor in the main thread? Reading a whole file of 10,000 requests and dispatching them is no problem to do from the reactor, even if you do it all at once.

You can probably do a pure Twisted solution without using threads at all.
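
One way such a thread-free version might look, as a sketch rather than a tuned implementation: Agent issues the requests and a DeferredSemaphore caps how many are in flight. It assumes Python 3 and a recent Twisted; load_urls, check_url and main are illustrative names, and the limit of 100 is taken from the question.

```python
def load_urls(lines):
    """Strip whitespace and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def check_url(agent, url):
    """Issue one HEAD request; always resolve to a (url, status-or-None) pair."""
    d = agent.request(b"HEAD", url.encode("ascii"))
    d.addCallbacks(lambda response: (url, response.code),
                   lambda failure: (url, None))
    return d


def main(reactor, path="urllist.txt", concurrent=100):
    # Imported here so the helpers above do not need Twisted to be importable.
    from twisted.internet import defer
    from twisted.web.client import Agent

    agent = Agent(reactor)
    with open(path) as f:
        urls = load_urls(f)

    # At most `concurrent` requests are outstanding; the rest wait their turn.
    semaphore = defer.DeferredSemaphore(concurrent)
    pending = []
    for url in urls:
        d = semaphore.run(check_url, agent, url)
        d.addCallback(print)  # prints the (url, status) pair
        pending.append(d)

    # Fires once every request has either answered or failed.
    return defer.gatherResults(pending)
```

To run it, pass main to twisted.internet.task.react, which starts the reactor in the main thread and stops it when the returned Deferred fires: `from twisted.internet import task; task.react(main)`.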

answered 2010-04-14T01:27:21.443