cluster-computing - How are OpenMP threads mapped to the specific cores assigned by a job scheduler (e.g. LSF)?

Question

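When a program is run through a job scheduler, the scheduler assigns the job n processor cores (as requested by the user). When a program that uses OpenMP runs, OpenMP generally uses OMP_NUM_THREADS threads. For simplicity, assume that each thread is mapped to a different processor core.

OpenMP knows nothing (as far as I am aware) about which cores the scheduler assigned to the program/job. Also, it is the OS, not OpenMP, that actually maps OpenMP threads to cores.

My question is: what happens behind the scenes so that OpenMP threads are mapped only to the cores that the job scheduler assigned to the job?

I would like the question to be general, but if the process really differs between job schedulers, an LSF-specific answer would be best.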

score 3 · Accepted Answer

The way it works is very simple: the DRM (distributed resource manager) restricts the CPU affinity mask of the process before it is started. The affinity mask tells the OS scheduler on which logical CPUs the process may be scheduled. The default CPU affinity mask simply contains all available logical CPUs. Unless instructed otherwise, most OpenMP runtimes obtain that mask when the program starts and obey it when spawning new threads. Both the GNU and Intel OpenMP runtimes also examine the affinity mask to determine the default number of threads if OMP_NUM_THREADS is not set. Most OpenMP runtimes additionally support their own binding mechanisms (also known as per-thread affinity), e.g. the KMP_AFFINITY variable of Intel OpenMP or the GOMP_CPU_AFFINITY variable of GNU OpenMP. Some of these can be instructed to respect the original mask, e.g. KMP_AFFINITY="respect,granularity=core" makes Intel OpenMP bind its threads only to the CPUs enabled in the affinity mask with which the process was started.
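To make the inherited mask concrete, here is a minimal Python sketch (Linux only) that reads the affinity mask the process was started with and derives a default thread count from it, in the spirit of what is described above. The helper names `allowed_cpus` and `default_num_threads` are made up for illustration; real OpenMP runtimes do this internally and their exact logic may differ.

```python
import os

def allowed_cpus():
    """Return the set of logical CPUs the calling thread may run on (Linux only)."""
    # pid 0 means "the calling thread". In a freshly started process this is
    # the mask inherited from the parent, e.g. from the scheduler's job starter.
    return os.sched_getaffinity(0)

def default_num_threads(env=os.environ):
    """Illustrative default: OMP_NUM_THREADS if set, else the size of the mask."""
    value = env.get("OMP_NUM_THREADS")
    if value:
        # OMP_NUM_THREADS may be a comma-separated list (one entry per nesting
        # level); this sketch only looks at the first entry.
        return int(value.split(",")[0])
    return len(allowed_cpus())

if __name__ == "__main__":
    print("allowed CPUs:", sorted(allowed_cpus()))
    print("default thread count:", default_num_threads())
```

Starting the script under a restricted mask, for example with `taskset -c 0-3 python3 script.py`, should list only the CPUs in that mask, which is the same effect a DRM achieves before launching a job.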

Under Linux there are two kinds of affinity masks. The first can be considered soft and is set by the sched_setaffinity(2) syscall. It is soft because it can be overridden and expanded at any time. Linux also provides so-called cpusets (part of the cgroups framework), which function more or less like lightweight containers. One can create a cpuset and assign only certain logical CPUs to it; that set is then AND-ed with whatever mask is requested via sched_setaffinity() to obtain the final mask that is actually applied. Cpusets therefore provide a hard mask: it cannot be extended, and a process can only use it or a subset of it (never a superset). sched_setaffinity() on Linux accepts either PIDs or TIDs, so it can be used to set the affinity of individual threads, and that is how OpenMP runtimes implement per-thread affinity. A thread-level alternative is pthread_setaffinity_np(), which, despite its pthread_ prefix, is not part of POSIX: the _np suffix marks it as a non-portable extension.
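The per-thread use of sched_setaffinity() can be sketched in Python as well, since `os.sched_setaffinity` is a thin wrapper around the same syscall. The following is only an illustration under the assumption of a Linux system with Python 3.8 or later; the function name `bind_threads_within_mask` is hypothetical, and an actual OpenMP runtime does the equivalent in C inside its thread pool.

```python
import os
import threading

def bind_threads_within_mask(num_threads):
    """Start num_threads workers and pin each to one CPU of the inherited mask."""
    # The CPUs this process is allowed to use (set by the parent or the DRM).
    allowed = sorted(os.sched_getaffinity(0))
    results = {}

    def worker(index):
        # Pick CPUs round-robin from the allowed set only, never outside it.
        cpu = allowed[index % len(allowed)]
        tid = threading.get_native_id()
        # Passing a thread ID changes the affinity of this thread alone.
        os.sched_setaffinity(tid, {cpu})
        results[index] = os.sched_getaffinity(tid)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return allowed, results

if __name__ == "__main__":
    allowed, results = bind_threads_within_mask(4)
    print("process mask:", allowed)
    for index, mask in sorted(results.items()):
        print(f"thread {index} bound to {sorted(mask)}")
```

Because every worker chooses its CPU from the inherited mask, the binding stays inside whatever the scheduler granted. The main thread's own mask is left untouched, as each call targets a single TID.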

LSF (9.1.1 and later) supports affinity using Linux cpusets. See the LSF documentation on how to set it up if you are an LSF administrator, or on how to request specific affinity settings for your jobs if you are a user.

Sun (errr... I mean Oracle) Grid Engine has some support for process affinity starting with version 6.2u5 if I recall correctly.
