function - この短い階乗関数をscalaで最適化する方法は? (50000 個の BigInt の作成)

Question

scalaのバージョンを比較してみた

(BigInt(1) to BigInt(50000)).reduce(_ * _)

パイソン版へ

reduce(lambda x,y: x*y, range(1,50000))

そして、scala バージョンは Python バージョンよりも約 10 倍の時間がかかったことがわかりました。

大きな違いは、python が数値ごとに新しい BigInt オブジェクトを作成する代わりに、ネイティブの long 型を使用できることだと思います。しかし、scala に回避策はありますか?

score 16 · Accepted Answer

Scalaコードが50,000個のBigIntオブジェクトを作成するという事実は、ここで大きな違いを生む可能性は低いです。より大きな問題は乗算アルゴリズムです— <a href="http://svn.python.org/view/python/trunk/Objects/longobject.c?view=markup" rel="nofollownoreferrer">Pythonlongはカラツバ乗算を使用しますJava BigInteger（BigIntラップするだけ）はそうではありません。

最も簡単な回避策は、おそらくJScienceのようなより適切な任意精度の数学ライブラリに切り替えることです。

import org.jscience.mathematics.number.LargeInteger

(1 to 50000).foldLeft(LargeInteger.ONE)(_ times _)

これは私のマシンのPythonソリューションよりも高速です。

更新：Luigi Plingiの回答に応えて、Caliperを使用していくつかの簡単なベンチマークコードを作成しました。これにより、私の（クアッドコア）マシンで次の結果が得られます。

              benchmark   ms linear runtime
         BigIntFoldLeft 4774 ==============================
             BigIntFold 4739 =============================
           BigIntReduce 4769 =============================
      BigIntFoldLeftPar 4642 =============================
          BigIntFoldPar  500 ===
        BigIntReducePar  499 ===
   LargeIntegerFoldLeft 3042 ===================
       LargeIntegerFold 3003 ==================
     LargeIntegerReduce 3018 ==================
LargeIntegerFoldLeftPar 3038 ===================
    LargeIntegerFoldPar  246 =
  LargeIntegerReducePar  260 =

reduceと彼の違いはわかりませんfoldが、道徳は明らかです。Scala2.9の並列コレクションを使用できれば、大幅な改善が得られますが、に切り替えると同様にLargeInteger役立ちます。

score 9 · Accepted Answer

私のマシン上のPython：

def func():
  start= time.clock()
  reduce(lambda x,y: x*y, range(1,50000))
  end= time.clock()
  t = (end-start) * 1000
  print t

与える1219 ms

スカラ:

def timed[T](f: => T) = {
  val t0 = System.currentTimeMillis
  val r = f
  val t1 = System.currentTimeMillis
  println("Took: "+(t1 - t0)+" ms")
  r
}

timed { (BigInt(1) to BigInt(50000)).reduce(_ * _) }
4251 ms

timed { (BigInt(1) to BigInt(50000)).fold(BigInt(1))(_ * _) }
4224 ms

timed { (BigInt(1) to BigInt(50000)).par.reduce(_ * _) }
2083 ms

timed { (BigInt(1) to BigInt(50000)).par.fold(BigInt(1))(_ * _) }
689 ms

// using org.jscience.mathematics.number.LargeInteger from Travis's answer
timed { val a = (1 to 50000).foldLeft(LargeInteger.ONE)(_ times _) }
3327 ms

timed { val a = (1 to 50000).map(LargeInteger.valueOf(_)).par.fold(
                                          LargeInteger.ONE)(_ times _) }
361 ms

この 689 ミリ秒と 361 ミリ秒は、数回のウォームアップ実行後のものです。どちらも約 1000 ミリ秒で開始しましたが、ウォームアップの量が異なるようです。並列コレクションは、非並列よりも大幅にウォームアップしているようです。非並列操作は、最初の実行から大幅に減少しませんでした。

(.parつまり、並列コレクションを使用する) はfoldよりも高速化されたようですreduce。コアは 2 つしかありませんが、コアの数が多いほどパフォーマンスが向上するはずです。

したがって、実験的に、この関数を最適化する方法は次のとおりです。

a)foldではなく使用reduce

b) 並列コレクションを使用する

更新: 計算を小さなチャンクに分割すると処理速度が向上するという観察結果に触発されて、私は彼を215 ms自分のマシンで実行させることができました。これは、標準の並列化されたアルゴリズムで 40% の改善です。(BigInt を使用すると、615 ミリ秒かかります。) また、並列コレクションを使用しませんが、何とか 90% の CPU を使用します (BigInt とは異なります)。

  import org.jscience.mathematics.number.LargeInteger

  def fact(n: Int) = {
    def loop(seq: Seq[LargeInteger]): LargeInteger = seq.length match {
      case 0 => throw new IllegalArgumentException
      case 1 => seq.head
      case _ => loop {
        val (a, b) = seq.splitAt(seq.length / 2)
        a.zipAll(b, LargeInteger.ONE, LargeInteger.ONE).map(i => i._1 times i._2)
      } 
    }
    loop((1 to n).map(LargeInteger.valueOf(_)).toIndexedSeq)
  }

score 1 · Accepted Answer

ここでのもう 1 つの秘訣は、両方を試して、どちらreduceLeftがreduceRight最速かを確認することです。あなたの例では、次の実行がはるかに高速になりますreduceRight。

scala> timed { (BigInt(1) to BigInt(50000)).reduceLeft(_ * _) }
Took: 4605 ms

scala> timed { (BigInt(1) to BigInt(50000)).reduceRight(_ * _) }
Took: 2004 ms

foldLeftとの同じ違いfoldRight。ツリーのどちら側から縮小を開始するかが重要だと思います:)

score 0 · Accepted Answer

Scala で階乗を計算する最も効率的な方法は、分割統治戦略を使用することです。

def fact(n: Int): BigInt = rangeProduct(1, n)

private def rangeProduct(n1: Long, n2: Long): BigInt = n2 - n1 match {
  case 0 => BigInt(n1)
  case 1 => BigInt(n1 * n2)
  case 2 => BigInt(n1 * (n1 + 1)) * n2
  case 3 => BigInt(n1 * (n1 + 1)) * ((n2 - 1) * n2)
  case _ => 
    val nm = (n1 + n2) >> 1
    rangeProduct(n1, nm) * rangeProduct(nm + 1, n2)
}

また、速度を上げるには、最新バージョンの JDK と次の JVM オプションを使用します。

-server -XX:+TieredCompilation

以下は、Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz (最大 3.50GHz)、RAM 12Gb DDR3-1333、Windows 7 sp1、Oracle JDK 1.8.0_25-b18 64 ビットの結果です。

(BigInt(1) to BigInt(100000)).product took: 3,806 ms with 26.4 % of CPU usage
(BigInt(1) to BigInt(100000)).reduce(_ * _) took: 3,728 ms with 25.4 % of CPU usage
(BigInt(1) to BigInt(100000)).reduceLeft(_ * _) took: 3,510 ms with 25.1 % of CPU usage
(BigInt(1) to BigInt(100000)).reduceRight(_ * _) took: 4,056 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).fold(BigInt(1))(_ * _) took: 3,697 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).par.product took: 406 ms with 66.3 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduce(_ * _) took: 296 ms with 71.1 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduceLeft(_ * _) took: 3,495 ms with 25.3 % of CPU usage
(BigInt(1) to BigInt(100000)).par.reduceRight(_ * _) took: 3,900 ms with 25.5 % of CPU usage
(BigInt(1) to BigInt(100000)).par.fold(BigInt(1))(_ * _) took: 327 ms with 56.1 % of CPU usage
fact(100000) took: 203 ms with 28.3 % of CPU usage

ところで、20000 を超える数値の階乗計算の効率を改善するために、Schönhage-Strassen アルゴリズムの実装に続いて使用するか、JDK 9 にマージされて Scala が使用できるようになるまで待ちます。

function - この短い階乗関数をscalaで最適化する方法は? (50000 個の BigInt の作成)

4 に答える 4

Related

Reference