python - multiprocess: identify process in queue

Question

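I have a multiprocess program that fetches web pages whose response times all differ. The results are stored in a process queue, following the FIFO rule. I want to identify each result from the queue by its process number. Below is my test rig and what I've achieved so far using two queues. Is there another way to do it? I tried storing the results in a global list, but it seems the two processes don't share the same memory space.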

#!/usr/bin/python3.2

import time
from multiprocessing import Process, Queue

def myWait(processNb, wait, resultQueues):
    startedAt = time.strftime("%H:%M:%S", time.localtime())
    time.sleep(wait)
    endedAt = time.strftime("%H:%M:%S", time.localtime())
    resultQueues[processNb].put('Process %s started at %s wait %s ended at %s' % (processNb, startedAt, wait, endedAt))

# queue initialisation
resultQueues = [Queue(), Queue()]

# process creation arg: (process number, sleep time, queue)
proc =  [
    Process(target=myWait, args=(0, 2, resultQueues,)),
    Process(target=myWait, args=(1, 1, resultQueues,))
    ]

# starting processes
for p in proc:
    p.start()

for p in proc:
    p.join()

# print results
print(resultQueues[0].get())
print(resultQueues[1].get())

score 2 · Accepted Answer

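No, when you use multiprocessing the processes don't share an address space at all, unlike threading, where all the threads share memory. This means that anything you want to share between processes has to go through an explicit inter-process connection, such as a Queue.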
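A minimal sketch of why the global-list attempt from the question fails. The names `shared`, `append_item` and `run_demo` are illustrative and not from the original code. The child appends to its own copy of the list, so the parent's copy stays empty; only what is sent through the queue comes back.

```python
import multiprocessing as mp

shared = []  # module-level list; each process works on its own copy


def append_item(item, results):
    shared.append(item)        # modifies the child's copy only
    results.put(list(shared))  # send the child's view back explicitly


def run_demo():
    results = mp.Queue()
    p = mp.Process(target=append_item, args=("from child", results))
    p.start()
    child_view = results.get()  # read first, then join
    p.join()
    return child_view, list(shared)


if __name__ == "__main__":
    child_view, parent_view = run_demo()
    print("child saw:", child_view)
    print("parent sees:", parent_view)
```

The parent's list is unchanged after the child exits, which matches the behaviour described in the question.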

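If you want to combine the results from all the processes, you can actually use a single results queue: it is safe for multiple processes (and multiple threads) to access it at once. All your workers can then put their results on that queue, and the main process can read them as they come in.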

Here's your code above modified to use a single queue:

#!/usr/bin/python3.2

import time
from multiprocessing import Process, Queue

def myWait(processNb, wait, results):
    startedAt = time.strftime("%H:%M:%S", time.localtime())
    time.sleep(wait)
    endedAt = time.strftime("%H:%M:%S", time.localtime())
    results.put('Process %s started at %s wait %s ended at %s' % (processNb, startedAt, wait, endedAt))

# queue initialisation
results = Queue()

# process creation arg: (process number, sleep time, queue)
proc =  [
    Process(target=myWait, args=(0, 2, results,)),
    Process(target=myWait, args=(1, 1, results,))
    ]

# starting processes
for p in proc:
    p.start()

for p in proc:
    p.join()

# print results
print(results.get())
print(results.get())
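The example above joins the workers before reading from the queue, which is fine for short messages like these. The multiprocessing documentation warns that a process which has put items on a queue may not terminate until those items have been flushed, so for larger results it is safer to read first and join afterwards. A sketch under that assumption, with one blocking `get()` per worker (`worker` and `run_workers` are illustrative names):

```python
import time
from multiprocessing import Process, Queue


def worker(process_nb, wait, results):
    time.sleep(wait)
    results.put('Process %s waited %s' % (process_nb, wait))


def run_workers(waits):
    results = Queue()
    procs = [Process(target=worker, args=(nb, wait, results))
             for nb, wait in enumerate(waits)]
    for p in procs:
        p.start()
    # One blocking get() per worker; messages come back in the
    # order they were put on the queue, not in start order.
    collected = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    for message in run_workers([0.2, 0.1]):
        print(message)
```

This relies on each worker putting exactly one item on the queue; a worker that crashes before calling `put()` would leave the parent blocked in `get()`.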

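If you want to identify the process behind each result without having to parse the string, you can simply put it on the queue as the first element of a 2-tuple. This changes the code as follows: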

import time
import multiprocessing
import queue

def myWait(processNb, wait, results):
    startedAt = time.strftime("%H:%M:%S", time.localtime())
    time.sleep(wait)
    endedAt = time.strftime("%H:%M:%S", time.localtime())
    results.put((processNb, 'Process %s started at %s wait %s ended at %s' % (processNb, startedAt, wait, endedAt)))

# queue initialisation
results = multiprocessing.Queue()

# process creation arg: (process number, sleep time, queue)
proc =  [
    multiprocessing.Process(target=myWait, args=(0, 2, results,)),
    multiprocessing.Process(target=myWait, args=(1, 1, results,))
    ]

# starting processes
for p in proc:
    p.start()

for p in proc:
    p.join()

# print results
while True:
    try:
        processNb, message = results.get_nowait()
        print("Process %d sent: %s" % (processNb, message))
    except queue.Empty:
        break

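Does that help?

Edit: As another responder has correctly pointed out, it's probably better to pass structured data than strings, but I was trying to keep my example similar to yours for illustration. In practice I'd use something that can be indexed by name rather than a plain tuple, to make future changes easier (so you aren't constrained to only adding items at the end).

You can use your own class, or simply use collections.namedtuple (the latter is particularly handy if you later extend code that already uses tuples to use names, since it allows a gradual migration).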

Note that (as far as I know) you can pass anything that can be pickled through the queue.
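A sketch of the namedtuple variant, assuming a hypothetical `Result` type with field names chosen here for illustration. The type is defined at module level so that pickle can locate it by name when the item crosses the process boundary.

```python
import collections
import multiprocessing
import time

# Hypothetical result type; the field names are assumptions for this sketch.
Result = collections.namedtuple(
    "Result", ["process_nb", "wait", "started_at", "ended_at"])


def my_wait(process_nb, wait, results):
    started_at = time.strftime("%H:%M:%S", time.localtime())
    time.sleep(wait)
    ended_at = time.strftime("%H:%M:%S", time.localtime())
    results.put(Result(process_nb, wait, started_at, ended_at))


if __name__ == "__main__":
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=my_wait, args=(nb, wait, results))
             for nb, wait in [(0, 2), (1, 1)]]
    for p in procs:
        p.start()
    for _ in procs:
        r = results.get()  # fields are accessible by name or by index
        print("Process %d started at %s wait %s ended at %s"
              % (r.process_nb, r.started_at, r.wait, r.ended_at))
    for p in procs:
        p.join()
```

Code that still unpacks the result as a plain tuple keeps working, since a namedtuple also supports positional access.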

score 1 · Accepted Answer

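An mp.Process can be given a name parameter, which the target function myWait can access with mp.current_process().name. So there is no need to pass processNb.
Minimize inter-process communication. Instead of passing the formatted string through the queue, pass only the parts of the string that change, as a tuple: (name, wait, startedAt, endedAt).

So you could do it with one queue like this: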

import time
import multiprocessing as mp

def myWait(wait, resultQueue):
    startedAt = time.strftime("%H:%M:%S", time.localtime())
    time.sleep(wait)
    endedAt = time.strftime("%H:%M:%S", time.localtime())
    name = mp.current_process().name
    resultQueue.put(
        (name, wait, startedAt, endedAt))


# queue initialisation
resultQueue = mp.Queue()

# process creation: name identifies the process; args: (sleep time, queue)
proc =  [
    mp.Process(target=myWait, name = '0', args=(2, resultQueue,)),
    mp.Process(target=myWait, name = '1', args=(1, resultQueue,))
    ]

# starting processes
for p in proc:
    p.start()

for p in proc:
    p.join()

# print results
for p in proc:
    name, wait, startedAt, endedAt = resultQueue.get()
    print('Process %s started at %s wait %s ended at %s' %
          (name, startedAt, wait, endedAt))
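Results come back in the order they were put on the queue rather than in start order, so one way to look a result up by process is to collect the tuples into a dict keyed by name. A minimal sketch; `collect_by_name` is an illustrative helper, not part of multiprocessing, and it assumes each worker puts exactly one `(name, wait, started_at, ended_at)` tuple on the queue.

```python
import multiprocessing as mp
import time


def my_wait(wait, result_queue):
    started_at = time.strftime("%H:%M:%S", time.localtime())
    time.sleep(wait)
    ended_at = time.strftime("%H:%M:%S", time.localtime())
    name = mp.current_process().name
    result_queue.put((name, wait, started_at, ended_at))


def collect_by_name(result_queue, count):
    """Read `count` results and index them by process name."""
    by_name = {}
    for _ in range(count):
        name, wait, started_at, ended_at = result_queue.get()
        by_name[name] = (wait, started_at, ended_at)
    return by_name


if __name__ == "__main__":
    result_queue = mp.Queue()
    procs = [mp.Process(target=my_wait, name=str(nb), args=(wait, result_queue))
             for nb, wait in enumerate([2, 1])]
    for p in procs:
        p.start()
    by_name = collect_by_name(result_queue, len(procs))
    for p in procs:
        p.join()
    for p in procs:  # print in start order, looked up by name
        wait, started_at, ended_at = by_name[p.name]
        print('Process %s started at %s wait %s ended at %s'
              % (p.name, started_at, wait, ended_at))
```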
