0

Cassandra をデータストアとして使用する実験を行っていますが、ヒープ スペースが不足しているためにノードが失敗するという問題が発生しました。ノードあたり 16 GB の RAM を備えた Ubuntu サーバー 13.04 を実行している 9 ノード クラスターで、Cassandra 2.0.1 を使用して Datastax Community Edition を実行しています。データの移行中に、ヒープ領域が不足したため、2 つのノードが予期せずダウンしました。ログ内のスタック トレースは、かなり目立たず、さまざまでした。そのうちの 1 つの例を次に示します。

ERROR [MutationStage:21] 2013-11-01 07:08:39,656 CassandraDaemon.java (line 185) Exception in thread Thread[MutationStage:21,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at org.apache.cassandra.utils.SlabAllocator$Region.init(SlabAllocator.java:178)
    at org.apache.cassandra.utils.SlabAllocator.getRegion(SlabAllocator.java:101)
    at org.apache.cassandra.utils.SlabAllocator.allocate(SlabAllocator.java:70)
    at org.apache.cassandra.utils.Allocator.clone(Allocator.java:30)
    at org.apache.cassandra.db.ColumnFamilyStore.internOrCopy(ColumnFamilyStore.java:2220)
    at org.apache.cassandra.db.Column.localCopy(Column.java:277)
    at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:107)
    at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:104)
    at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
    at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
    at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
    at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

これより前に、次のような AssertionErrors があります。

ERROR [FlushWriter:6176] 2013-11-01 06:55:48,825 CassandraDaemon.java (line 185) Exception in thread Thread[FlushWriter:6176,5,main]
java.lang.AssertionError
    at org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:186)
    at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:358)
    at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:317)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

次のような多数のガベージ コレクション ステータス メッセージも同様です。

INFO [ScheduledTasks:1] 2013-11-01 06:59:14,923 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 5935 ms for 1 collections, 2963961136 used; max is 3902799872
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,924 StatusLogger.java (line 55) Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadStage                         0         3       58646672         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) RequestResponseStage              0         1       22614351         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadRepairStage                   0         0          76371         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) MutationStage                     7       260      709366463         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) ReplicateOnWriteStage             0         0         104455         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) GossipStage                       0         1        3695467         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,953 StatusLogger.java (line 70) AntiEntropyStage                  0         0            404         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MigrationStage                    0         0           1178         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MemtablePostFlusher               1        39          43229         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MemoryMeter                       0         0            668         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) FlushWriter                       0         0          23228         0                82
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MiscStage                         0         0            196         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) commitlog_archiver                0         0              0         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) InternalResponseStage             0         0            276         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) HintedHandoff                     0         0             13         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 79) CompactionManager                 3        11
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 81) Commitlog                       n/a       261
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 93) MessagingService                n/a       1,0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 103) Cache Type                     Size                 Capacity               KeysToSave
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 105) KeyCache                   41783700                104857600                      all
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,975 StatusLogger.java (line 111) RowCache                          0                        0                      all
...

これがデータの取り込みからわずか 4 時間後に発生したことを考えると、なぜこれが発生したのか、また発生を防ぐために何ができるのかを考えています。前もって感謝します。

4

1 に答える 1