
What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:

  1. Message passing processes, possibly distributed across a cluster, e.g. MPI.
  2. Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et al.
  3. Implicit shared memory parallelism, using OpenMP.

Deciding on an approach to use is an exercise in trade-offs.

In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.

In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

5 Answers

Generally, you're describing a CPU-bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.

Threading in the mainstream Python interpreter has been ruled by the dreaded Global Interpreter Lock. The new multiprocessing API works around that and gives you a worker-pool abstraction built on pipes, queues, and the like.

You can write your performance-critical code in C or Cython and use Python as the glue; a sketch of the Cython route follows.
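As a rough illustration (the fastloop module name and function below are hypothetical, not from the original answer), a typed .pyx file lets Cython compile the inner loop down to plain C:

# fastloop.pyx -- build with, e.g., 'cythonize -i fastloop.pyx'
# The C type declarations let Cython generate a plain C loop
# that runs without interpreter overhead.
def squared_sum(double[:] data):
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(data.shape[0]):
        total += data[i] * data[i]
    return total

After building, 'from fastloop import squared_sum' works like any other import, and the function accepts anything exposing a buffer of doubles (e.g. a NumPy array).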

answered 2010-06-07T08:35:19.420

The new (2.6) multiprocessing module is the way to go. It uses subprocesses, which gets around the GIL problem. It also abstracts away some of the local/remote issues, so the choice of running your code locally or spread out over a cluster can be made later. The multiprocessing documentation is a fair bit to chew on, but should provide a good basis to get started.
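As an illustration, a minimal Pool-based sketch of the raster use case might look like this (process_chunk and the chunking scheme are placeholders for your own code):

import multiprocessing

def process_chunk(chunk):
    # Placeholder for a CPU-bound calculation on one piece of the raster.
    return sum(x * x for x in chunk)

if __name__ == '__main__':  # required so worker processes can safely import this module
    chunks = [range(i * 1000000, (i + 1) * 1000000) for i in range(8)]
    pool = multiprocessing.Pool()               # one worker process per CPU by default
    results = pool.map(process_chunk, chunks)   # blocks until every chunk is processed
    pool.close()
    pool.join()
    print(sum(results))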

answered 2010-06-07T08:26:05.643

Ray is an elegant (and fast) library for doing this.

The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote decorator. It can then be invoked asynchronously by calling f.remote(), which returns an object ID immediately; ray.get waits for and fetches the actual results.

import ray
import time

# Start the Ray processes (e.g., a scheduler and shared-memory object store).
ray.init(num_cpus=8)

@ray.remote
def f():
    time.sleep(1)

# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])

You can also parallelize stateful computation using actors: applying the @ray.remote decorator to a class turns it into an actor, a stateful worker process whose methods can be invoked remotely.

# This assumes you already ran 'import ray' and 'ray.init()'.

import time

@ray.remote
class Counter(object):
    def __init__(self):
        self.x = 0

    def inc(self):
        self.x += 1

    def get_counter(self):
        return self.x

# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()

@ray.remote
def update_counters(counter1, counter2):
    for _ in range(1000):
        time.sleep(0.25)
        counter1.inc.remote()
        counter2.inc.remote()

# Start three tasks that update the counters in the background, themselves running in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)

# Check the counter values.
for _ in range(5):
    counter1_val = ray.get(counter1.get_counter.remote())
    counter2_val = ray.get(counter2.get_counter.remote())
    print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
    time.sleep(1)

It has a number of advantages over the multiprocessing module; for example, the same code will run on a single machine as well as on a cluster of machines.

Ray is a framework I've been helping develop.

answered 2019-01-19T23:11:13.440

Depending on how much data you need to process and how many CPUs/machines you intend to use, it is in some cases better to write part of it in C (or in Java/C# if you want to use Jython/IronPython).

The speedup you can get from that might do more for your performance than running things in parallel on 8 CPUs.
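As a sketch of that route, assuming the hot loop has already been compiled into a shared library (the libfast.so name and sum_squares signature below are hypothetical), ctypes can call it from Python:

import ctypes

# Hypothetical C function: double sum_squares(const double *data, size_t n);
lib = ctypes.CDLL('./libfast.so')
lib.sum_squares.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.sum_squares.restype = ctypes.c_double

data = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)  # a small C array of doubles
result = lib.sum_squares(data, len(data))         # the loop itself runs at C speed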

answered 2010-06-07T09:35:59.497

There are many packages for doing this, but as others have said, the most appropriate is multiprocessing, especially with the Pool class.

A similar result can be obtained with Parallel Python, which in addition is designed to work with clusters; a sketch follows.
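A minimal sketch of the Parallel Python route, based on the pp module's classic Server/submit interface (treat the details as an assumption rather than a verified recipe):

import pp

def process_chunk(chunk):
    # Placeholder for a CPU-bound calculation.
    return sum(x * x for x in chunk)

job_server = pp.Server()  # autodetects the number of local CPUs
jobs = [job_server.submit(process_chunk, (range(i * 1000, (i + 1) * 1000),))
        for i in range(8)]
results = [job() for job in jobs]  # calling a job object waits for its result
print(sum(results))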

Either way, I would say go with multiprocessing.

answered 2010-06-07T09:11:07.390