Cassandra をデータストアとして使用する実験を行っていますが、ヒープ スペースが不足しているためにノードが失敗するという問題が発生しました。ノードあたり 16 GB の RAM を備えた Ubuntu サーバー 13.04 を実行している 9 ノード クラスターで、Cassandra 2.0.1 を使用して Datastax Community Edition を実行しています。データの移行中に、ヒープ領域が不足したため、2 つのノードが予期せずダウンしました。ログ内のスタック トレースは、かなり目立たず、さまざまでした。そのうちの 1 つの例を次に示します。
ERROR [MutationStage:21] 2013-11-01 07:08:39,656 CassandraDaemon.java (line 185) Exception in thread Thread[MutationStage:21,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.apache.cassandra.utils.SlabAllocator$Region.init(SlabAllocator.java:178)
at org.apache.cassandra.utils.SlabAllocator.getRegion(SlabAllocator.java:101)
at org.apache.cassandra.utils.SlabAllocator.allocate(SlabAllocator.java:70)
at org.apache.cassandra.utils.Allocator.clone(Allocator.java:30)
at org.apache.cassandra.db.ColumnFamilyStore.internOrCopy(ColumnFamilyStore.java:2220)
at org.apache.cassandra.db.Column.localCopy(Column.java:277)
at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:107)
at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:104)
at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
これより前に、次のような AssertionErrors があります。
ERROR [FlushWriter:6176] 2013-11-01 06:55:48,825 CassandraDaemon.java (line 185) Exception in thread Thread[FlushWriter:6176,5,main]
java.lang.AssertionError
at org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:186)
at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:358)
at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:317)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
次のような多数のガベージ コレクション ステータス メッセージも同様です。
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,923 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 5935 ms for 1 collections, 2963961136 used; max is 3902799872
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,924 StatusLogger.java (line 55) Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadStage 0 3 58646672 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) RequestResponseStage 0 1 22614351 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadRepairStage 0 0 76371 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) MutationStage 7 260 709366463 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) ReplicateOnWriteStage 0 0 104455 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) GossipStage 0 1 3695467 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,953 StatusLogger.java (line 70) AntiEntropyStage 0 0 404 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MigrationStage 0 0 1178 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MemtablePostFlusher 1 39 43229 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MemoryMeter 0 0 668 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) FlushWriter 0 0 23228 0 82
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MiscStage 0 0 196 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) commitlog_archiver 0 0 0 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) InternalResponseStage 0 0 276 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) HintedHandoff 0 0 13 0 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 79) CompactionManager 3 11
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 81) Commitlog n/a 261
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 93) MessagingService n/a 1,0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 103) Cache Type Size Capacity KeysToSave
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 105) KeyCache 41783700 104857600 all
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,975 StatusLogger.java (line 111) RowCache 0 0 all
...
これがデータの取り込みからわずか 4 時間後に発生したことを考えると、なぜこれが発生したのか、また発生を防ぐために何ができるのかを考えています。前もって感謝します。