
What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:

  1. Message passing processes, possibly distributed across a cluster, e.g. MPI.
  2. Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et al.
  3. Implicit shared memory parallelism, using OpenMP.

Deciding on an approach to use is an exercise in trade-offs.

In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.

In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

5 Answers

Generally, you're describing a CPU-bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.

Threading in the mainstream Python interpreter has been ruled by the dreaded Global Interpreter Lock. The new multiprocessing API works around that and gives you a worker-pool abstraction built on pipes, queues, and the like.

You can write your performance-critical code in C or Cython and use Python as the glue; a sketch of the Cython route follows.
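As a rough illustration (the fastloop module name and function below are hypothetical, not from the original answer), a typed .pyx file lets Cython compile the inner loop down to plain C:

# fastloop.pyx -- build with, e.g., 'cythonize -i fastloop.pyx'
# The C type declarations let Cython generate a plain C loop
# that runs without interpreter overhead.
def squared_sum(double[:] data):
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(data.shape[0]):
        total += data[i] * data[i]
    return total

After building, 'from fastloop import squared_sum' works like any other import, and the function accepts anything exposing a buffer of doubles (e.g. a NumPy array).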

answered 2010-06-07T08:35:19.420

The new (2.6) multiprocessing module is the way to go. It uses subprocesses, which gets around the GIL problem. It also abstracts away some of the local/remote issues, so the choice of running your code locally or spread out over a cluster can be made later. The multiprocessing documentation is a fair bit to chew on, but should provide a good basis to get started.
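As an illustration, a minimal Pool-based sketch of the raster use case might look like this (process_chunk and the chunking scheme are placeholders for your own code):

import multiprocessing

def process_chunk(chunk):
    # Placeholder for a CPU-bound calculation on one piece of the raster.
    return sum(x * x for x in chunk)

if __name__ == '__main__':  # required so worker processes can safely import this module
    chunks = [range(i * 1000000, (i + 1) * 1000000) for i in range(8)]
    pool = multiprocessing.Pool()               # one worker process per CPU by default
    results = pool.map(process_chunk, chunks)   # blocks until every chunk is processed
    pool.close()
    pool.join()
    print(sum(results))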

answered 2010-06-07T08:26:05.643

Ray is an elegant (and fast) library for doing this.

The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote decorator. It can then be invoked asynchronously by calling f.remote(), which returns an object ID immediately; ray.get waits for and fetches the actual results.

import ray
import time

# Start the Ray processes (e.g., a scheduler and shared-memory object store).
ray.init(num_cpus=8)

@ray.remote
def f():
    time.sleep(1)

# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])

You can also parallelize stateful computation using actors: applying the @ray.remote decorator to a class turns it into an actor, a stateful worker process whose methods can be invoked remotely.

# This assumes you already ran 'import ray' and 'ray.init()'.

import time

@ray.remote
class Counter(object):
    def __init__(self):
        self.x = 0

    def inc(self):
        self.x += 1

    def get_counter(self):
        return self.x

# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()

@ray.remote
def update_counters(counter1, counter2):
    for _ in range(1000):
        time.sleep(0.25)
        counter1.inc.remote()
        counter2.inc.remote()

# Start three tasks that update the counters in the background, themselves running in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)

# Check the counter values.
for _ in range(5):
    counter1_val = ray.get(counter1.get_counter.remote())
    counter2_val = ray.get(counter2.get_counter.remote())
    print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
    time.sleep(1)

It has a number of advantages over the multiprocessing module; for example, the same code will run on a single machine as well as on a cluster of machines.

Ray is a framework I've been helping develop.

answered 2019-01-19T23:11:13.440

Depending on how much data you need to process and how many CPUs/machines you intend to use, it is in some cases better to write part of it in C (or in Java/C# if you want to use Jython/IronPython).

The speedup you can get from that might do more for your performance than running things in parallel on 8 CPUs.
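As a sketch of that route, assuming the hot loop has already been compiled into a shared library (the libfast.so name and sum_squares signature below are hypothetical), ctypes can call it from Python:

import ctypes

# Hypothetical C function: double sum_squares(const double *data, size_t n);
lib = ctypes.CDLL('./libfast.so')
lib.sum_squares.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.sum_squares.restype = ctypes.c_double

data = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)  # a small C array of doubles
result = lib.sum_squares(data, len(data))         # the loop itself runs at C speed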

answered 2010-06-07T09:35:59.497

There are many packages for doing this, but as others have said, the most appropriate is multiprocessing, especially with the Pool class.

A similar result can be obtained with Parallel Python, which in addition is designed to work with clusters; a sketch follows.
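A minimal sketch of the Parallel Python route, based on the pp module's classic Server/submit interface (treat the details as an assumption rather than a verified recipe):

import pp

def process_chunk(chunk):
    # Placeholder for a CPU-bound calculation.
    return sum(x * x for x in chunk)

job_server = pp.Server()  # autodetects the number of local CPUs
jobs = [job_server.submit(process_chunk, (range(i * 1000, (i + 1) * 1000),))
        for i in range(8)]
results = [job() for job in jobs]  # calling a job object waits for its result
print(sum(results))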

Either way, I would say go with multiprocessing.

answered 2010-06-07T09:11:07.390