java - Loop performance in Java (with and without bit shifting, for vs. while)

Question

I ran a small test with loops in Java. I assumed that bit shifting in Java would generally be faster than a plain integer increment. Here is my sample code:

final int n = 16;
long n1 = System.nanoTime();
for (int i = 1; i < 1 << n; i <<= 1) {
    // nothing
}
long n2 = System.nanoTime();
for (int i = 0; i < n; i++) {
    // nothing
}
long n3 = System.nanoTime();
System.out.println("with shift = " + (n2 - n1) + " ns");
System.out.println("without shift = " + (n3 - n2) + " ns");

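So my expectation was that the time between n1 and n2 would be shorter than the time between n2 and n3. But every time I run this snippet, the integer increment appears to be faster. Here is the output of the code above: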

with shift = 2445 ns
without shift = 1885 ns

with shift = 2374 ns
without shift = 1886 ns

with shift = 2374 ns
without shift = 1607 ns

Can anyone explain this behaviour? Does the answer lie in how the JVM compiles this code, or does it depend on the underlying architecture?

Ubuntu Linux 3.5.0-17-generic i686 GNU/Linux
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Pentium(R) Dual-Core CPU       T4300  @ 2.10GHz
stepping    : 10
microcode   : 0xa07
cpu MHz     : 1200.000
cache size  : 1024 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm dtherm
bogomips    : 4189.42
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Pentium(R) Dual-Core CPU       T4300  @ 2.10GHz
stepping    : 10
microcode   : 0xa07
cpu MHz     : 1200.000
cache size  : 1024 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm dtherm
bogomips    : 4189.42
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

==========EDIT===============

OK, so I updated my code to get better measurements.

My JVM:

java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) Server VM (build 20.12-b01, mixed mode)

New code:

// amount of shifts
final int n = 16;
// recorded times
long n1 = 0, n2 = 0, n3 = 0, n4 = 0, n5 = 0;
// measured times
long withShiftFor = Long.MAX_VALUE;
long withoutShiftFor = Long.MAX_VALUE;
long withShiftWhile = Long.MAX_VALUE;
long withoutShiftWhile = Long.MAX_VALUE;
// instance to operate with
boolean b = true;
// do some loops to measure a better result
for (int x = 0; x < 2000000; x++) {
    // for loop with shift
    n1 = System.nanoTime();
    for (int i = 1; i < 1 << n; i <<= 1) {
        b = !b;
    }
    // for loop without shift
    n2 = System.nanoTime();
    for (int i = 0; i < n; i++) {
        b = !b;
    }
    // while loop with shift
    n3 = System.nanoTime();
    int i = 1;
    while (i < 1 << n) {
        b = !b;
        i <<= 1;
    }
    // while loop without shift
    n4 = System.nanoTime();
    int j = 0;
    while (j < n) {
        b = !b;
        j++;
    }
    n5 = System.nanoTime();
    // take minimal time to save best result
    withShiftFor = Math.min(withShiftFor, n2 - n1);
    withoutShiftFor = Math.min(withoutShiftFor, n3 - n2);
    withShiftWhile = Math.min(withShiftWhile, n4 - n3);
    withoutShiftWhile = Math.min(withoutShiftWhile, n5 - n4);
}
System.out.println("for with shift = " + withShiftFor + " ns");
System.out.println("for without shift = " + withoutShiftFor + " ns");
System.out.println("while with shift = " + withShiftWhile + " ns");
System.out.println("while without shift = " + withoutShiftWhile + " ns");

New output after three runs (each run took more than 5 seconds):

for with shift = 907 ns
for without shift = 838 ns
while with shift = 907 ns
while without shift = 907 ns

for with shift = 907 ns
for without shift = 907 ns
while with shift = 907 ns
while without shift = 907 ns

for with shift = 907 ns
for without shift = 838 ns
while with shift = 907 ns
while without shift = 907 ns

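So you were right. After a few seconds and many iterations, I get almost the same results. But why is the for loop without a shift sometimes faster than the other variants? Is the JVM optimising something here, beyond the difference you mentioned of four bytecodes for the shift versus one for the increment? And why is the while loop with an increment only as fast as the other loops?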

score 2 · Accepted Answer

"Can anyone explain this behaviour? Does the answer lie in how the JVM compiles this code, or does it depend on the underlying architecture?"

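When you run a short loop, the code is interpreted. So if your real code will not run often, or cannot be warmed up, this is how you should benchmark it, and you should expect odd results like the ones you got.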

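If you want to compare compiled/optimised code, you need to ignore the first 10K to 20K iterations. By default, a loop has to iterate about 10K times before it is compiled, and the compilation then happens in the background, so it takes a little longer still before the compiled code is in use.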

In any case, I suggest running the test for at least 2 seconds to reduce variation.
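As a rough illustration of the warm-up advice above, here is a minimal sketch. It is not the asker's code: the class name WarmupSketch, its helper methods, the 200 ms warm-up and the batch size of 10,000 calls are all assumptions made for this example. Each loop returns a value that is accumulated and printed, which gives the JIT less reason to discard it, although a loop this small may still be simplified, so treat the numbers as indicative only.

```java
final class WarmupSketch {
    // Hypothetical helper: the shift loop from the question, returning a value
    // derived from the loop variable so the body is not trivially dead code.
    static int shiftLoop(int n) {
        int acc = 0;
        for (int i = 1; i < 1 << n; i <<= 1) {
            acc += i;
        }
        return acc;
    }

    // Hypothetical helper: the increment loop from the question, same idea.
    static int incrementLoop(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
        }
        return acc;
    }

    // Runs batches of 10,000 calls until at least minNanos have elapsed and
    // returns the average time per call. The clock is read once per batch,
    // so the cost of System.nanoTime() is spread over many calls.
    static double averageNanosPerCall(boolean shift, int n, long minNanos, int[] sink) {
        int acc = 0;
        long calls = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            for (int c = 0; c < 10000; c++) {
                acc += shift ? shiftLoop(n) : incrementLoop(n);
            }
            calls += 10000;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        sink[0] += acc; // keep the result reachable by the caller
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        final int n = 16;
        int[] sink = new int[1];
        // Warm-up phase (assumed 200 ms each): results are thrown away.
        averageNanosPerCall(true, n, 200000000L, sink);
        averageNanosPerCall(false, n, 200000000L, sink);
        // Measured phase: at least 2 seconds per variant.
        double withShift = averageNanosPerCall(true, n, 2000000000L, sink);
        double withoutShift = averageNanosPerCall(false, n, 2000000000L, sink);
        System.out.println("with shift = " + withShift + " ns per call");
        System.out.println("without shift = " + withoutShift + " ns per call");
        System.out.println("(checksum " + sink[0] + ")");
    }
}
```

Timing a whole batch rather than a single 16-iteration loop is a deliberate choice here: it keeps the measured interval far larger than the cost of reading the clock.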

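Your loops do nothing, so I would expect the JIT to eliminate them. That leaves you timing only how long the System.nanoTime() calls take, which can add 40 to 1000 ns each depending on your system.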
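One way to check the point about System.nanoTime() on a given machine is to time an empty interval. The following is only a sketch, with an assumed class name (NanoTimeOverhead) and an assumed sample count; it reports the smallest gap seen between two consecutive clock reads, which reflects both the call cost and the timer's granularity.

```java
final class NanoTimeOverhead {
    // Smallest observed difference between two back-to-back nanoTime() calls.
    // On a system with a coarse timer this can be 0.
    static long minBackToBackNanos(int samples) {
        long best = Long.MAX_VALUE;
        for (int s = 0; s < samples; s++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            best = Math.min(best, t1 - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        // 2,000,000 samples is an arbitrary choice, matching the outer loop
        // count used in the question's updated code.
        System.out.println("empty interval = " + minBackToBackNanos(2000000) + " ns");
    }
}
```

If the value printed here is close to the figures reported in the question, the loops themselves are contributing very little to the measurement.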

score 1 · Accepted Answer

Shifting a number takes four bytecodes, whereas incrementing takes only one. As Peter Lawrey said, the JIT compiler will probably change that later.
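To see the bytecode difference for yourself, you could compile a small class and inspect it with javap -c. The class and method names below are made up for this illustration, and the instruction listings in the comments are what javac typically emits for a local int in slot 0; the exact output can vary with the compiler version.

```java
final class ShiftVsIncrement {
    // Typical bytecode for the statement "i <<= 1":
    //   iload_0   (load i)
    //   iconst_1  (push the constant 1)
    //   ishl      (shift left)
    //   istore_0  (store back into i)
    static int shiftOnce(int i) {
        i <<= 1;
        return i;
    }

    // Typical bytecode for the statement "i++":
    //   iinc 0, 1 (increment local variable 0 by 1 in place)
    static int incrementOnce(int i) {
        i++;
        return i;
    }

    public static void main(String[] args) {
        System.out.println(shiftOnce(1));
        System.out.println(incrementOnce(1));
    }
}
```

The count only describes what the interpreter executes. Once the JIT has compiled the method, both statements usually become a single machine instruction each, which fits the near-identical timings seen after warm-up.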
