java - False sharing only noticeable on certain machines

Question

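I wrote the following test class in Java to reproduce the performance penalty caused by "false sharing".

You can turn the false sharing effect on or off by changing the array `size` from 4 to a much larger value (e.g. 10000). With size = 4, the four threads update neighboring ints that most likely sit in the same cache line, so cache misses happen far more often. In theory, the test program should therefore run much faster with size = 10000 than with size = 4.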
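The sketch below makes the size = 4 versus size = 10000 reasoning concrete. For each thread, it computes which 64-byte block its element falls into. It uses the question's index formula `5000 + interval * t` with `interval = size / 4`. It rests on three assumptions: ints are 4 bytes, cache lines are 64 bytes, and element 0 of the array starts exactly on a line boundary. The real address depends on the JVM's object layout, so the block numbers are only relative. The class name `CacheLineMath` is illustrative.

```java
// Which 64-byte block does each thread's array element fall into?
// Assumptions: 4-byte ints, 64-byte cache lines, and array[0] starting exactly
// on a cache-line boundary (the real JVM object layout does not guarantee this).
class CacheLineMath {
    static final int INT_BYTES = 4;
    static final int LINE_BYTES = 64;

    // Index of the 64-byte block holding array[index], relative to element 0.
    static int blockOf(int index) {
        return (index * INT_BYTES) / LINE_BYTES;
    }

    public static void main(String[] args) {
        for (int size : new int[] {4, 10000}) {
            int interval = size / 4; // same formula as the question
            StringBuilder sb = new StringBuilder("size=" + size + ":");
            for (int t = 0; t < 4; t++) {
                int index = 5000 + interval * t;
                sb.append(" [").append(index).append("]->block ").append(blockOf(index));
            }
            System.out.println(sb);
        }
        // size=4     -> all four indices land in block 312
        // size=10000 -> blocks 312, 468, 625, 781 (one block per thread)
    }
}
```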

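I ran the same test several times on two different machines.

Machine A: Lenovo X230 laptop with an Intel® Core™ i5-3210M processor (2 cores, 4 threads), Windows 7 64-bit

size = 4 => 5.5 seconds

size = 10000 => 5.4 seconds

Machine B: Dell OptiPlex 780 with an Intel® Core™2 Duo processor E8400 (2 cores), Windows XP 32-bit

size = 4 => 14.5 seconds

size = 10000 => 7.2 seconds

Later I ran the test on several other machines as well. False sharing clearly becomes noticeable only on some of them, and I could not work out the decisive factor behind the difference.

Could someone take a look and explain why the false sharing introduced in this test class is noticeable only on certain machines?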

public class FalseSharing {

    interface Oper {
        int eval(int value);
    }

    // try tweaking the size
    static int size = 4;

    // try tweaking the op
    static Oper op = new Oper() {
        @Override
        public int eval(int value) {
            return value + 2;
        }
    };

    static int[] array = new int[10000 + size];

    static final int interval = (size / 4);

    public static void main(String[] args) throws InterruptedException {

        long start = System.currentTimeMillis();
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + 5000);

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000] = op.eval(array[5000]);
                    }
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + (5000 + interval));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval] = op.eval(array[5000 + interval]);
                    }
                }
            }
        });
        Thread t3 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + (5000 + interval * 2));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 2] = op.eval(array[5000 + interval * 2]);
                    }
                }
            }
        });
        Thread t4 = new Thread(new Runnable() {
            @Override
            public void run() {

                System.out.println("Array index:" + (5000 + interval * 3));

                for (int j = 0; j < 30; j++) {
                    for (int i = 0; i < 1000000000; i++) {
                        array[5000 + interval * 3] = op.eval(array[5000 + interval * 3]);
                    }
                }
            }
        });
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t1.join();
        t2.join();
        t3.join();
        t4.join();
        System.out.println("Finished!" + (System.currentTimeMillis() - start));
    }

}
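For experimenting, the four copy-pasted threads in the test class can be collapsed into one loop. The version below is a hypothetical restructuring, not the asker's code. The class name `FalseSharingLoop` and the `run(size, outer, inner)` parameters are illustrative, and `IntUnaryOperator` stands in for the `Oper` interface. It keeps the plain `int[]` updates of the original, so it measures the same thing and is subject to the same JIT optimizations.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical restructuring of the question's test: one loop instead of four
// copy-pasted threads. Plain int[] writes are kept, exactly as in the original.
class FalseSharingLoop {
    static final int THREADS = 4;
    static final int BASE = 5000;
    // Stands in for the question's Oper interface; tweak it the same way.
    static final IntUnaryOperator OP = value -> value + 2;

    // Thread t repeatedly updates array[BASE + interval * t], with interval = size / 4.
    static int[] run(int size, int outer, int inner) throws InterruptedException {
        final int[] array = new int[10000 + size];
        final int interval = size / 4;
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int index = BASE + interval * t;
            threads[t] = new Thread(() -> {
                for (int j = 0; j < outer; j++) {
                    for (int i = 0; i < inner; i++) {
                        array[index] = OP.applyAsInt(array[index]);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join(); // join() also makes each worker's writes visible to this thread
        }
        return array;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int size : new int[] {4, 10000}) {
            long start = System.currentTimeMillis();
            run(size, 30, 1_000_000_000); // same iteration counts as the question
            System.out.println("size=" + size + ": " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```

Running both sizes in one JVM means the second run benefits from JIT warm-up. Repeating or alternating the runs gives fairer numbers.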

score 0 · Accepted Answer

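False sharing only happens within a 64-byte block (one cache line), so all four threads have to access the same 64-byte block. I suggest creating an object or an array, such as a `long[8]`, and having all four threads update different cells of it. Then compare the timing against four threads that each access an independent array.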
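Below is a minimal sketch of the experiment this answer suggests, assuming Java 8+. Using `AtomicLongArray` instead of a plain `long[8]` is my substitution. `incrementAndGet` forces a real read-modify-write to memory on every iteration, whereas the JIT may keep plain array updates in a register, which could hide the effect.

- **Shared case:** each thread updates its own cell (0 to 3) of one 8-element array. That is 64 bytes of data, so those cells very likely share one or at most two cache lines.
- **Independent case:** each thread allocates its own array inside its run method. HotSpot typically places such allocations in that thread's own allocation buffer, away from the others.

The names `SharedVsIndependent` and `run` are illustrative.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of the suggested comparison (names are illustrative): 4 threads updating
// different cells of ONE 8-element array vs. 4 threads with their OWN arrays.
class SharedVsIndependent {
    static final int THREADS = 4;

    // Runs the experiment and returns the final count in each thread's cell.
    static long[] run(boolean shared, int iterations) throws InterruptedException {
        AtomicLongArray sharedArray = new AtomicLongArray(8); // 8 longs = 64 bytes of data
        AtomicLongArray[] own = new AtomicLongArray[THREADS];
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                // Independent case: allocate inside the worker thread so the arrays
                // are (typically) not adjacent in memory.
                AtomicLongArray target = shared ? sharedArray : new AtomicLongArray(8);
                int cell = shared ? id : 0; // shared case: neighboring cells 0..3
                for (int i = 0; i < iterations; i++) {
                    target.incrementAndGet(cell); // real read-modify-write every time
                }
                if (!shared) {
                    own[id] = target;
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join(); // also publishes own[] and the counters to this thread
        }
        long[] result = new long[THREADS];
        for (int t = 0; t < THREADS; t++) {
            result[t] = shared ? sharedArray.get(t) : own[t].get(0);
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int iterations = 50_000_000;
        for (int round = 0; round < 3; round++) { // repeat to let the JIT warm up
            for (boolean shared : new boolean[] {true, false}) {
                long start = System.nanoTime();
                run(shared, iterations);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println((shared ? "shared 8-element array: " : "independent arrays:     ") + ms + " ms");
            }
        }
    }
}
```

Absolute times will vary a lot between machines. The ratio between the two cases is the number to compare.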

score 0 · Accepted Answer

Your code is probably fine. Here is a simpler version, along with its results:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;


public class TestFalseSharing {
    static long T0 = System.currentTimeMillis();

    static void p(Object msg) {
        System.out.format("%09.3f %-10s %s%n", 0.001*(System.currentTimeMillis()-T0), Thread.currentThread().getName(), " : "+msg);
    }

    public static void main(String args[]) throws InterruptedException {
        int NT = Runtime.getRuntime().availableProcessors();
        p("Available processors: "+NT);

        int MAXSPAN = 0x1000; //4kB
        final byte[] array = new byte[NT*MAXSPAN];

        for(int i=1; i<=MAXSPAN; i<<=1) {
            testFalseSharing(NT, i, array);
        }
    }

    static void testFalseSharing(final int NT, final int span, final byte[] array) throws InterruptedException {
        final int L1 = 10;
        final int L2 = 10_000_000;

        final CountDownLatch cl = new CountDownLatch(NT*L1);

        long t0 = System.nanoTime();

        for(int i=0 ; i<NT; i++) { // one thread per processor, matching the latch count NT*L1
            final int startOffset = i*span;

            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    //p("Offset:" + startOffset);
                    for (int j = 0; j < L1; j++) {
                        for (int k = 0; k < L2; k++) {
                            array[startOffset] += 1;
                        }
                        cl.countDown();
                    }
                }
            });
            t.start();

        }

        while(!cl.await(10, TimeUnit.SECONDS)) {
            p(""+cl.getCount()+" left");
        }

        long d = System.nanoTime() - t0;
        p("Duration: " + 1e-9*d + " seconds, Span="+span+" bytes");
    }
}

Results:

00000.000 main        : Available processors: 4
00002.843 main        : Duration: 2.837645384 seconds, Span=1 bytes
00005.689 main        : Duration: 2.8454065760000002 seconds, Span=2 bytes
00008.659 main        : Duration: 2.9697156340000004 seconds, Span=4 bytes
00011.640 main        : Duration: 2.979306959 seconds, Span=8 bytes
00013.780 main        : Duration: 2.140246744 seconds, Span=16 bytes
00015.387 main        : Duration: 1.6061148440000002 seconds, Span=32 bytes
00016.729 main        : Duration: 1.34128957 seconds, Span=64 bytes
00017.944 main        : Duration: 1.215005455 seconds, Span=128 bytes
00019.208 main        : Duration: 1.263007368 seconds, Span=256 bytes
00020.477 main        : Duration: 1.269272208 seconds, Span=512 bytes
00021.719 main        : Duration: 1.241061631 seconds, Span=1024 bytes
00022.975 main        : Duration: 1.256024242 seconds, Span=2048 bytes
00024.171 main        : Duration: 1.195086858 seconds, Span=4096 bytes

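So, as an answer: at least on my Core i5 laptop, this confirms the 64-byte cache line theory. The duration drops from about 2.8 to 3.0 seconds for spans of 1 to 8 bytes to about 1.2 to 1.34 seconds once the span reaches 64 bytes or more.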
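To relate the spans in these results to cache lines, the sketch below counts how many distinct 64-byte lines the four offsets `i * span` (i = 0..3) touch. It assumes `array[0]` starts on a line boundary. The JVM does not guarantee this, because the `byte[]` header shifts the data, so in real runs the offsets can straddle a line boundary. That may be why the 16- and 32-byte spans land between the two plateaus. The class name `LinesPerSpan` is illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Counts the distinct cache lines touched by the offsets i * span (i = 0..threads-1),
// assuming array[0] is line-aligned (the JVM does not guarantee this).
class LinesPerSpan {
    static int distinctLines(int threads, int span, int lineBytes) {
        Set<Integer> lines = new HashSet<>();
        for (int i = 0; i < threads; i++) {
            lines.add((i * span) / lineBytes); // line index of this thread's byte
        }
        return lines.size();
    }

    public static void main(String[] args) {
        // Same spans as the test above: 1, 2, 4, ..., 4096 bytes.
        for (int span = 1; span <= 4096; span <<= 1) {
            System.out.println("span=" + span + " -> " + distinctLines(4, span, 64) + " line(s)");
        }
        // spans 1..16 -> 1 line, span 32 -> 2 lines, spans 64..4096 -> 4 lines
    }
}
```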
