list - Java 8 Stream vs. Iterator performance

Question

I am comparing two ways of filtering a list, one that uses streams and one that does not. For a list of 10,000 items, the version without streams turns out to be faster. I would like to understand why. Can anyone explain the results?

public static int countLongWordsWithoutUsingStreams(
        final List<String> words, final int longWordMinLength) {
    words.removeIf(word -> word.length() <= longWordMinLength);

    return words.size();
}

public static int countLongWordsUsingStreams(final List<String> words, final int longWordMinLength) {
    return (int) words.stream().filter(w -> w.length() > longWordMinLength).count();
}

Microbenchmark using JMH:

@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsWithoutUsingStreams() {
    countLongWordsWithoutUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}

@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsUsingStreams() {
    countLongWordsUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}

public static void main(String[] args) throws RunnerException {
    final Options opts = new OptionsBuilder()
        .include(PracticeQuestionsCh8Benchmark.class.getSimpleName())
        .warmupIterations(5).measurementIterations(5).forks(1).build();

    new Runner(opts).run();
}

java -jar target/benchmarks.jar -wi 5 -i 5 -f 1

Benchmark                                                                 Mode  Cnt    Score    Error   Units
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsUsingStreams         thrpt    5   10.219 ±  0.408  ops/ms
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsWithoutUsingStreams  thrpt    5  910.785 ± 21.215  ops/ms

Edit (because someone deleted the update I had posted as an answer):

public class PracticeQuestionsCh8Benchmark {
    private static final int NUM_WORDS = 10000;
    private static final int LONG_WORD_MIN_LEN = 10;

    private final List<String> words = makeUpWords();

    public List<String> makeUpWords() {
        List<String> words = new ArrayList<>();
        final Random random = new Random();

        for (int i = 0; i < NUM_WORDS; i++) {
            if (random.nextBoolean()) {
                /*
                 * Do this to avoid string interning. c.f.
                 * http://en.wikipedia.org/wiki/String_interning
                 */
                words.add(String.format("%" + LONG_WORD_MIN_LEN + "s", i));
            } else {
                words.add(String.valueOf(i));
            }
        }

        return words;
    }

    @Benchmark
    @BenchmarkMode(AverageTime)
    @OutputTimeUnit(MILLISECONDS)
    public int benchmarkCountLongWordsWithoutUsingStreams() {
        return countLongWordsWithoutUsingStreams(words, LONG_WORD_MIN_LEN);
    }

    @Benchmark
    @BenchmarkMode(AverageTime)
    @OutputTimeUnit(MILLISECONDS)
    public int benchmarkCountLongWordsUsingStreams() {
        return countLongWordsUsingStreams(words, LONG_WORD_MIN_LEN);
    }
}

public static int countLongWordsWithoutUsingStreams(
    final List<String> words, final int longWordMinLength) {
    final Predicate<String> p = s -> s.length() >= longWordMinLength;

    int count = 0;

    for (String aWord : words) {
        if (p.test(aWord)) {
            ++count;
        }
    }

    return count;
}

public static int countLongWordsUsingStreams(final List<String> words,
    final int longWordMinLength) {
    return (int) words.stream()
    .filter(w -> w.length() >= longWordMinLength).count();
}

score 5 · Accepted Answer

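Whenever a benchmark tells you that an operation over 10000 elements takes 1ns (edit: 1µs), you have probably found a case where the clever JVM has figured out that your code does not actually do anything.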

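Collections.nCopies does not actually create a list of 10000 elements. It creates a sort of fake list that holds one element plus a count of how many times that element is supposedly there. That list is also immutable, so countLongWordsWithoutUsingStreams would throw an exception if removeIf ever had something to remove.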
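To make that concrete, here is a minimal sketch of the behaviour described above. The class name NCopiesDemo and the helper removeIfThrows are made up for illustration, and the sketch assumes the JDK's default Collection.removeIf implementation is the one in play for the list returned by nCopies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NCopiesDemo {

    /** Hypothetical helper: reports whether removeIf fails on the given list. */
    public static boolean removeIfThrows(List<String> words, int minLength) {
        try {
            words.removeIf(w -> w.length() <= minLength);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Same input as the question's benchmark: "IAmALongWord" has 12
        // characters, so with a threshold of 3 nothing matches and
        // removeIf never attempts a removal.
        List<String> fake = Collections.nCopies(10000, "IAmALongWord");
        System.out.println("threshold 3 throws:  " + removeIfThrows(fake, 3));

        // With a threshold of 20 every element matches, so a removal is
        // attempted on the immutable list.
        System.out.println("threshold 20 throws: " + removeIfThrows(fake, 20));

        // Copying into an ArrayList gives a real, mutable list of 10000
        // references, which is what the benchmark meant to exercise.
        List<String> real = new ArrayList<>(fake);
        real.removeIf(w -> w.length() <= 20);
        System.out.println("size after removeIf on a real list: " + real.size());
    }
}
```

With the question's threshold of 3 nothing is ever removed, which is why the benchmark ran without an exception and still reported an implausibly high throughput.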

score 2 · Accepted Answer

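Because your benchmark methods do not return any value, JMH has no chance to let the computed values escape, and the benchmark suffers from dead code elimination. You are measuring how long it takes to do nothing. See the JMH page for more detailed guidance.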

That said, there are cases where streams can be slower: Java 8: performance of Streams vs Collections
