java - Loading data bigger than the memory size in h2o

Question

I am experimenting with loading data bigger than the memory size in h2o.

The H2O blog mentions: A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, i.e., you’re using more Big Data than physical DRAM. We won’t die with a GC death-spiral, but we will degrade to out-of-core speeds. We’ll go as fast as the disk will allow. I’ve personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to run a Logistic Regression.

Here is the R code to connect to h2o 3.6.0.8:

h2o.init(max_mem_size = '60m') # allotting 60 MB to h2o; R is running on a machine with 8 GB RAM

which gives

java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

.Successfully connected to http://127.0.0.1:54321/ 

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 561 milliseconds 
    H2O cluster version:        3.6.0.8 
    H2O cluster name:           H2O_started_from_R_RILITS-HWLTP_tkn816 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   0.06 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 

Note:  As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
           > h2o.shutdown()
           > h2o.init(nthreads = -1)

IP Address: 127.0.0.1 
Port      : 54321 
Session ID: _sid_b2e0af0f0c62cd64a8fcdee65b244d75 
Key Count : 3

I tried to load a 169 MB csv into h2o:

dat.hex <- h2o.importFile('dat.csv')

which threw an error:

Error in .h2o.__checkConnectionHealth() : 
  H2O connection has been severed. Cannot connect to instance at http://127.0.0.1:54321/
Failed to connect to 127.0.0.1 port 54321: Connection refused

which indicates an out-of-memory error.

Question: If H2O promises to load a dataset larger than its memory capacity (via the swap-to-disk mechanism described in the blog quote above), is this the correct way to load the data?

score 5 · Accepted Answer

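Swap-to-disk was disabled by default a while ago, because performance was so bad. The bleeding edge (not the latest stable) has a flag to enable it: "--cleaner" (for "memory cleaner").
Note that your cluster has an extremely small amount of memory: "H2O cluster total memory: 0.06 GB". That is 60 MB, barely enough to start a JVM, much less run H2O. I would be surprised if H2O could come up properly there at all, never mind the swap-to-disk. Swapping is limited to swapping the data alone. If you are trying to do a swap test, bump your JVM up to 1 or 2 GB of RAM and then load datasets that sum to more than that.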
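As a rough sketch of the advice above, the session from the question could be restarted in a fresh R session with a larger heap. The value "2g" is an assumption taken from the 1 to 2 GB suggestion, nthreads = -1 comes from the startup note in the question's output, and 'dat.csv' is the file from the question. With a 2 GB heap the 169 MB file should simply fit in memory, so this does not exercise swap-to-disk by itself; a swap test would also need swapping enabled and datasets that sum to more than the heap.

```r
library(h2o)

# Start H2O with a heap big enough for H2O itself plus the data.
# "2g" is an assumed value following the answer's 1-2 GB suggestion.
h2o.init(max_mem_size = "2g", nthreads = -1)

# Import the 169 MB csv from the question into the cluster.
dat.hex <- h2o.importFile("dat.csv")

# Confirm the frame was parsed by checking its dimensions.
dim(dat.hex)
```

The "H2O cluster total memory" line printed by h2o.init() is a quick way to check that the larger heap actually took effect.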

Cliff
