java - What is the best way to perform DFS on a very large tree?

Question

Here is the situation:

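The application's world consists of hundreds of thousands of states.
Given a state, I can compute the set of 3 or 4 other states reachable from it. A simple recursion can build a tree of states that becomes very large very quickly.
I need to run a DFS from the root state down to a given depth of this tree to find the subtree containing the "minimal" state (how the node values are computed is irrelevant to the question).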
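For reference, the single-threaded baseline described above might look roughly like the sketch below. It assumes the state expansion and the node value can be passed in as functions; `DepthLimitedDfs`, `successors` and `score` are hypothetical names, not the asker's code, and the sketch simply takes the minimum score over the states at the depth limit.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.ToIntFunction;

final class DepthLimitedDfs {
    // Single-threaded, depth-limited recursive DFS.
    // Returns the smallest score among the states exactly maxDepth steps below
    // 'state'. 'successors' and 'score' are hypothetical stand-ins for the
    // application's own state expansion and node valuation.
    static <S> int minAtDepth(S state, int maxDepth,
                              Function<S, List<S>> successors,
                              ToIntFunction<S> score) {
        if (maxDepth == 0) {
            return score.applyAsInt(state);   // reached the depth limit
        }
        int min = Integer.MAX_VALUE;          // stays MAX_VALUE if there are no successors
        for (S next : successors.apply(state)) {
            min = Math.min(min, minAtDepth(next, maxDepth - 1, successors, score));
        }
        return min;
    }
}
```

Every call does constant work plus the cost of `successors` and `score`, so with 3 or 4 children per level the run time is dominated by how many nodes sit above the depth limit and by what the two functions cost.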

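Running the DFS on a single thread works, but it is very slow: covering 15 levels down can take several minutes, and I need to improve this atrocious performance. Trying to assign a thread to each subtree created too many threads and caused an OutOfMemoryError. Using a ThreadPoolExecutor wasn't much better.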

My question: what is the most efficient way to traverse this large tree?

score 3 · Accepted Answer

Your tree has about 36 million nodes, so I don't think navigating the tree is the problem. What you do at each node is more likely to be the expensive part.
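The 36 million figure can be checked directly. Assuming the synthetic tree used in the benchmark (3 children on even levels, 4 on odd levels, and leaves at depth 14 because `method2` stops at `maxLevels - 1`), this hypothetical `TreeSize` helper counts nodes per depth:

```java
final class TreeSize {
    // Branching factor at a given level, mirroring the benchmark:
    // 3 choices on even levels, 4 on odd levels.
    static int choices(int level) {
        return level % 2 == 0 ? 3 : 4;
    }

    // Number of nodes at depth 'depth' (the root is depth 0).
    static long nodesAtDepth(int depth) {
        long n = 1;
        for (int level = 0; level < depth; level++) {
            n *= choices(level);
        }
        return n;
    }

    // Total number of nodes from the root down to and including 'maxDepth'.
    static long totalNodes(int maxDepth) {
        long total = 0;
        for (int d = 0; d <= maxDepth; d++) {
            total += nodesAtDepth(d);
        }
        return total;
    }
}
```

`TreeSize.nodesAtDepth(14)` gives 35,831,808 (that is, 12^7), which matches the benchmark's call count, and `TreeSize.totalNodes(14)` gives 48,861,556 once the interior nodes are included.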

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class Main {
    public static final int TOP_LEVELS = 2;

    private static final AtomicLong called = new AtomicLong();

    public static void main(String... args) throws InterruptedException {
        int maxLevels = 15;
        long start = System.nanoTime();
        method(maxLevels);
        long time = System.nanoTime() - start;
        System.out.printf("Took %.3f second to navigate %,d levels called %,d times%n", time / 1e9, maxLevels, called.longValue());
    }

    public static void method(int maxLevels) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            int result = method(service, 0, maxLevels - 1, new int[maxLevels]).call();
        } catch (Exception e) {
            e.printStackTrace();
        }
        service.shutdown();
        service.awaitTermination(10, TimeUnit.MINUTES);
    }

    // Highest levels of the tree: results are combined single-threaded in the calling thread.
    private static Callable<Integer> method(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Callable<Integer>> callables = new ArrayList<Callable<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            Callable<Integer> callable = level < TOP_LEVELS ?
                    method(service, level + 1, maxLevel, options) :
                    method1(service, level + 1, maxLevel, options);
            callables.add(callable);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Callable<Integer> result : callables) {
                    Integer num = result.call();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // at this level, process the branches in separate threads.
    private static Callable<Integer> method1(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Future<Integer>> futures = new ArrayList<Future<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            final int[] optionsCopy = options.clone();
            Future<Integer> future = service.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return method2(level + 1, maxLevel, optionsCopy);
                }
            });
            futures.add(future);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Future<Integer> result : futures) {
                    Integer num = result.get();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // at these levels each task processes in its own thread.
    private static int method2(int level, int maxLevel, int[] options) {
        if (level == maxLevel) {
            return process(options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            int n = method2(level + 1, maxLevel, options);
            if (min > n)
                min = n;
        }

        return min;
    }

    private static int process(final int[] options) {
        int min = options[0];
        for (int i : options)
            if (min > i)
                min = i;
        called.incrementAndGet();
        return min;
    }
}

Prints

Took 1.273 second to navigate 15 levels called 35,831,808 times

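I suggest limiting the number of threads and using separate threads only for the top levels of the tree. How many cores do you have? Once you have enough threads to keep all the cores busy, there is no need to add more, because extra threads only add overhead.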
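The same idea (a bounded number of threads, with parallelism only near the root) can also be expressed with the JDK's fork/join framework, which the answer itself does not use. This is a sketch under assumptions: it reuses the synthetic 3/4 tree from the benchmark, `SPLIT_DEPTH = 4` is an arbitrary hypothetical cutoff, and the leaf score (the negated sum of the choices on the path) is a made-up stand-in for the real evaluation. `ForkJoinPool.commonPool()` sizes itself from the number of available processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ForkJoinMin extends RecursiveTask<Integer> {
    // Hypothetical cutoff: only levels above this depth are split into tasks.
    static final int SPLIT_DEPTH = 4;

    private final int level;
    private final int maxLevel;
    private final int[] options;

    private ForkJoinMin(int level, int maxLevel, int[] options) {
        this.level = level;
        this.maxLevel = maxLevel;
        this.options = options;
    }

    // Runs the search on the shared common pool.
    static int search(int maxLevels) {
        return ForkJoinPool.commonPool()
                .invoke(new ForkJoinMin(0, maxLevels - 1, new int[maxLevels]));
    }

    @Override
    protected Integer compute() {
        if (level == maxLevel || level >= SPLIT_DEPTH) {
            // Deep levels: plain sequential recursion inside this worker thread.
            return sequential(level, maxLevel, options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        List<ForkJoinMin> subtasks = new ArrayList<>(choices);
        for (int i = 0; i < choices; i++) {
            int[] copy = options.clone();   // each subtask owns its path array
            copy[level] = i;
            ForkJoinMin task = new ForkJoinMin(level + 1, maxLevel, copy);
            task.fork();                    // queue the branch for any idle worker
            subtasks.add(task);
        }
        int min = Integer.MAX_VALUE;
        for (ForkJoinMin task : subtasks) {
            min = Math.min(min, task.join());
        }
        return min;
    }

    private static int sequential(int level, int maxLevel, int[] options) {
        if (level == maxLevel) {
            return leafScore(options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            min = Math.min(min, sequential(level + 1, maxLevel, options));
        }
        return min;
    }

    // Hypothetical leaf score (negated sum of the choices on the path),
    // a stand-in for the application's real evaluation.
    private static int leafScore(int[] options) {
        int sum = 0;
        for (int o : options) {
            sum += o;
        }
        return -sum;
    }
}
```

`ForkJoinMin.search(15)` walks the same 15 levels as the benchmark. Because no tasks are created below `SPLIT_DEPTH`, the whole run uses fewer than 200 tasks instead of one per subtree, which avoids the thread explosion described in the question.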

Java has a built-in Stack class, which is thread-safe, but I would just use ArrayList, which is more efficient.
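A minimal sketch of the ArrayList-as-stack idea, assuming the same synthetic tree as the benchmark: push with `add`, pop with `remove(size() - 1)`. Both operations are O(1) at the end of the list and avoid the synchronization of `java.util.Stack`, which extends `Vector`. `ArrayListStackDfs` is a hypothetical name, and it only tracks each pending node's depth, so it counts leaves rather than scoring them.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayListStackDfs {
    // Counts the nodes at depth maxLevel in the synthetic 3/4-branching tree,
    // using an ArrayList as an unsynchronized LIFO stack of node depths.
    static long countLeaves(int maxLevel) {
        List<Integer> stack = new ArrayList<>();
        stack.add(0);                                    // push the root (depth 0)
        long leaves = 0;
        while (!stack.isEmpty()) {
            int level = stack.remove(stack.size() - 1);  // pop from the end
            if (level == maxLevel) {
                leaves++;
                continue;
            }
            int choices = level % 2 == 0 ? 3 : 4;
            for (int i = 0; i < choices; i++) {
                stack.add(level + 1);                    // push each child
            }
        }
        return leaves;
    }
}
```

The stack never holds more than a few entries per level, so memory stays small even though tens of millions of nodes pass through it.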

score 0 · Accepted Answer

You should definitely use an iterative approach. The simplest is a stack-based DFS, along the lines of this pseudocode:

STACK.push(root)
while (STACK.nonempty) 
   current = STACK.pop
   if (current.done) continue
   // ... do something with node ...
   current.done = true
   FOREACH (neighbor n of current) 
       if (! n.done )
           STACK.push(n)
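A direct Java translation of the pseudocode, as a sketch: the graph is assumed to be a hypothetical adjacency map from node id to neighbour ids, the `done` flags become a HashSet, and ArrayDeque serves as the stack (the `java.util.Stack` Javadoc recommends Deque implementations instead). For a pure tree the `done` check can be dropped, since every node is reached exactly once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IterativeDfs {
    // Stack-based DFS following the pseudocode; returns the order in which
    // nodes were processed. 'adjacency' is a hypothetical graph representation.
    static List<Integer> visitOrder(Map<Integer, List<Integer>> adjacency, int root) {
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> done = new HashSet<>();
        List<Integer> order = new ArrayList<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int current = stack.pop();
            if (!done.add(current)) {
                continue;                 // already processed via another path
            }
            order.add(current);           // "do something with node"
            for (int n : adjacency.getOrDefault(current, List.of())) {
                if (!done.contains(n)) {
                    stack.push(n);
                }
            }
        }
        return order;
    }
}
```

With `{0: [1, 2], 1: [3], 2: [3]}` and root 0, the last-pushed neighbour is popped first, so the nodes are processed in the order 0, 2, 3, 1.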

The time complexity of this is O(n + m), where n (m) is the number of nodes (edges) in the graph. Since you have a tree, this is O(n), and it should easily handle n > 1,000,000...
