cassandra - Cassandra 1.2 の膨大な読み取りレイテンシ

Question

私は、単一のキースペース、レプリケーション係数 2 (元々は 3 でしたが、2 に落とした) および 10 程度の列ファミリを持つ 4 ノードの cassandra 1.2.6 クラスターで作業しています。Oracle 1.7 jvm を実行しています。読み取りと書き込みが混在しており、おそらく読み取りの 2 ～ 3 倍の書き込みが行われます。

少量の負荷でも、非常に大きな読み取りレイテンシーが発生し、かなりの数の読み取りタイムアウトが発生します (datastax Java ドライバーを使用)。以下は、列ファミリーの 1 つに対する nodetool cfstats の出力例です。

    Column Family: user_scores
    SSTable count: 1
    SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
    Space used (live): 7539098
    Space used (total): 7549091
    Number of Keys (estimate): 42112
    Memtable Columns Count: 2267
    Memtable Data Size: 1048576
    Memtable Switch Count: 2
    Read Count: 2101
    **Read Latency: 272334.202 ms.**
    Write Count: 24947
    Write Latency: NaN ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 0
    Bloom Filter False Ratio: 0.00000
    Bloom Filter Space Used: 55376
    Compacted row minimum size: 447
    Compacted row maximum size: 219342
    Compacted row mean size: 1051

ご覧のとおり、レベルベースの圧縮戦略を使用して読み取りレイテンシを改善しようとしましたが、レイテンシが非常に大きいことがわかります。私は少し困惑しています。Cassandra 1.1.6 クラスターは問題なく動作していましたが、1.2 ではこれまでうまくいきませんでした。

クラスターは、4 つの CPU と 7 Gb の RAM を備えた VM で実行されています。データドライブは、4 つのディスクにまたがるストライプ RAID としてセットアップされます。マシンは IO バウンドではないようです。

すべてのデフォルトで、かなりバニラの構成を実行しています。

負荷が小さい場合でも CPU が急上昇するという奇妙な CPU の動作が見られます。圧縮が実行されているのを時々見ますが、それらはうまく機能しているので、それが原因ではないと思います。

次はどこに行こうか思案中です。どんな助けでも大歓迎です！

[rpc_timeout トレースで更新] まだこれで遊んでいます。トレースの例を次に示します。マージ手順に時間がかかりすぎているようです。

 activity                                                                | timestamp    | source        | source_elapsed
-------------------------------------------------------------------------+--------------+---------------+----------------
                                                      execute_cql3_query | 04:57:18,882 | 100.69.176.51 |              0
 Parsing select * from user_scores where user_id='26257166' LIMIT 10000; | 04:57:18,884 | 100.69.176.51 |           1981
                                                      Peparing statement | 04:57:18,885 | 100.69.176.51 |           2997
                         Executing single-partition query on user_scores | 04:57:18,885 | 100.69.176.51 |           3657
                                            Acquiring sstable references | 04:57:18,885 | 100.69.176.51 |           3724
                                             Merging memtable tombstones | 04:57:18,885 | 100.69.176.51 |           3779
                                            Key cache hit for sstable 32 | 04:57:18,886 | 100.69.176.51 |           3910
                             Seeking to partition beginning in data file | 04:57:18,886 | 100.69.176.51 |           3930
                              Merging data from memtables and 1 sstables | 04:57:18,886 | 100.69.176.51 |           4211
                                                        Request complete | 04:57:38,891 | 100.69.176.51 |       20009870

以下の古いトレース:

[新しいトレース] クラスターデータリポジトリを完全に再構築してログに記録された問題に対処した後も、かなり時間がかかりましたが、まだ問題に遭遇しました。状態が悪いときに取得したトレースを次に示します。

トレースセッション: a6dbefc0-ea49-11e2-84bb-ef447a7d9a48

 activity                                                                                        | timestamp    | source         | source_elapsed
-------------------------------------------------------------------------------------------------+--------------+----------------+----------------
                                                                              execute_cql3_query | 16:48:02,755 | 100.69.196.124 |              0
                                                      Parsing select * from user_scores limit 1; | 16:48:02,756 | 100.69.196.124 |           1774
                                                                              Peparing statement | 16:48:02,759 | 100.69.196.124 |           4006
                                                                   Determining replicas to query | 16:48:02,759 | 100.69.196.124 |           4286
                                                             Enqueuing request to /100.69.176.51 | 16:48:02,763 | 100.69.196.124 |           8849
                                                         Sending message to cdb002/100.69.176.51 | 16:48:02,764 | 100.69.196.124 |           9456
                                                           Message received from /100.69.196.124 | 16:48:03,449 |  100.69.176.51 |            160
                                                            Message received from /100.69.176.51 | 16:48:09,646 | 100.69.196.124 |        6891860
                                                         Processing response from /100.69.176.51 | 16:48:09,647 | 100.69.196.124 |        6892426
 Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 16:48:10,288 |  100.69.176.51 |        6838754
                                                     Seeking to partition beginning in data file | 16:48:10,289 |  100.69.176.51 |        6839689
                                                              Read 1 live and 0 tombstoned cells | 16:48:10,289 |  100.69.176.51 |        6839927
                                                     Seeking to partition beginning in data file | 16:48:10,289 |  100.69.176.51 |        6839998
                                                              Read 1 live and 0 tombstoned cells | 16:48:10,289 |  100.69.176.51 |        6840082
                                                                    Scanned 1 rows and matched 1 | 16:48:10,289 |  100.69.176.51 |        6840162
                                                           Enqueuing response to /100.69.196.124 | 16:48:10,289 |  100.69.176.51 |        6840229
                                                              Sending message to /100.69.196.124 | 16:48:10,299 |  100.69.176.51 |        6850072
                                                                                Request complete | 16:48:09,648 | 100.69.196.124 |        6893029

[更新] 私の macbook pro で単独の cassandra インスタンスを使用すると、問題なく動作することを追加する必要があります。AKAは私のマシンで動作します... :)

[トレースデータで更新]

ここにいくつかのトレースデータがあります。これは Java ドライバーからのものです。欠点は、成功したクエリしか追跡できないことです。すべてのクエリがタイムアウトし始める前に、合計 67 クエリにします。奇妙なのは、見た目がそれほど悪くないということです。クエリ 68 で応答がなくなり、2 つのサーバーが過熱しています。

2013-07-11 02:15:45 STDIO [INFO] ***************************************
66:Host (queried): cdb003/100.69.198.47
66:Host (tried): cdb003/100.69.198.47
66:Trace id: c95e51c0-e9cf-11e2-b9a9-5b3c0946787b

66:-----------------------------------------------------+--------------+-----------------+--------------
66:            Enqueuing data request to /100.69.176.51 | 02:15:42.045 |  /100.69.198.47 |          200
66:          Enqueuing digest request to /100.69.176.51 | 02:15:42.045 |  /100.69.198.47 |          265
66:                  Sending message to /100.69.196.124 | 02:15:42.045 |  /100.69.198.47 |          570
66:                   Sending message to /100.69.176.51 | 02:15:42.045 |  /100.69.198.47 |          574
66:                Message received from /100.69.176.51 | 02:15:42.107 |  /100.69.198.47 |        62492
66:             Processing response from /100.69.176.51 | 02:15:42.107 |  /100.69.198.47 |        62780
66:                Message received from /100.69.198.47 | 02:15:42.508 | /100.69.196.124 |           31
66:     Executing single-partition query on user_scores | 02:15:42.508 | /100.69.196.124 |          406
66:                        Acquiring sstable references | 02:15:42.508 | /100.69.196.124 |          473
66:                         Merging memtable tombstones | 02:15:42.508 | /100.69.196.124 |          577
66:                        Key cache hit for sstable 11 | 02:15:42.508 | /100.69.196.124 |          807
66:         Seeking to partition beginning in data file | 02:15:42.508 | /100.69.196.124 |          849
66:          Merging data from memtables and 1 sstables | 02:15:42.509 | /100.69.196.124 |         1500
66:                Message received from /100.69.198.47 | 02:15:43.379 |  /100.69.176.51 |           60
66:     Executing single-partition query on user_scores | 02:15:43.379 |  /100.69.176.51 |          399
66:                        Acquiring sstable references | 02:15:43.379 |  /100.69.176.51 |          490
66:                         Merging memtable tombstones | 02:15:43.379 |  /100.69.176.51 |          593
66:                         Key cache hit for sstable 7 | 02:15:43.380 |  /100.69.176.51 |         1098
66:         Seeking to partition beginning in data file | 02:15:43.380 |  /100.69.176.51 |         1141
66:          Merging data from memtables and 1 sstables | 02:15:43.380 |  /100.69.176.51 |         1912
66:                  Read 1 live and 0 tombstoned cells | 02:15:43.438 |  /100.69.176.51 |        59094
66:                Enqueuing response to /100.69.198.47 | 02:15:43.438 |  /100.69.176.51 |        59225
66:                   Sending message to /100.69.198.47 | 02:15:43.438 |  /100.69.176.51 |        59373
66:Started at: 02:15:42.04466:Elapsed time in micros: 63105
2013-07-11 02:15:45 STDIO [INFO] ***************************************
67:Host (queried): cdb004/100.69.184.134
67:Host (tried): cdb004/100.69.184.134
67:Trace id: c9f365d0-e9cf-11e2-a4e5-7f3170333ff5

67:-----------------------------------------------------+--------------+-----------------+--------------
67:               Message received from /100.69.184.134 | 02:15:42.536 |  /100.69.198.47 |           36
67:     Executing single-partition query on user_scores | 02:15:42.536 |  /100.69.198.47 |          273
67:                        Acquiring sstable references | 02:15:42.536 |  /100.69.198.47 |          311
67:                         Merging memtable tombstones | 02:15:42.536 |  /100.69.198.47 |          353
67:                         Key cache hit for sstable 8 | 02:15:42.536 |  /100.69.198.47 |          436
67:         Seeking to partition beginning in data file | 02:15:42.536 |  /100.69.198.47 |          455
67:          Merging data from memtables and 1 sstables | 02:15:42.537 |  /100.69.198.47 |          811
67:                  Read 1 live and 0 tombstoned cells | 02:15:42.550 |  /100.69.198.47 |        14242
67:               Enqueuing response to /100.69.184.134 | 02:15:42.550 |  /100.69.198.47 |        14456
67:                  Sending message to /100.69.184.134 | 02:15:42.551 |  /100.69.198.47 |        14694
67:            Enqueuing data request to /100.69.198.47 | 02:15:43.021 | /100.69.184.134 |          323
67:                   Sending message to /100.69.198.47 | 02:15:43.021 | /100.69.184.134 |          565
67:                Message received from /100.69.198.47 | 02:15:43.038 | /100.69.184.134 |        17029
67:             Processing response from /100.69.198.47 | 02:15:43.038 | /100.69.184.134 |        17230
67:Started at: 02:15:43.021
67:Elapsed time in micros: 17622

cqlsh を使用したトレースは次のとおりです。

Tracing session: d0f845d0-e9cf-11e2-8882-ef447a7d9a48

 activity                                                                | timestamp    | source         | source_elapsed
-------------------------------------------------------------------------+--------------+----------------+----------------
                                                      execute_cql3_query | 19:15:54,833 | 100.69.196.124 |              0
 Parsing select * from user_scores where user_id='39333433' LIMIT 10000; | 19:15:54,833 | 100.69.196.124 |            103
                                                      Peparing statement | 19:15:54,833 | 100.69.196.124 |            455
                         Executing single-partition query on user_scores | 19:15:54,834 | 100.69.196.124 |           1400
                                            Acquiring sstable references | 19:15:54,834 | 100.69.196.124 |           1468
                                             Merging memtable tombstones | 19:15:54,835 | 100.69.196.124 |           1575
                                            Key cache hit for sstable 11 | 19:15:54,835 | 100.69.196.124 |           1780
                             Seeking to partition beginning in data file | 19:15:54,835 | 100.69.196.124 |           1822
                              Merging data from memtables and 1 sstables | 19:15:54,836 | 100.69.196.124 |           2562
                                      Read 1 live and 0 tombstoned cells | 19:15:54,838 | 100.69.196.124 |           4808
                                                        Request complete | 19:15:54,838 | 100.69.196.124 |           5810

score 1 · Accepted Answer

トレースは、多くの時間がネットワーク操作を行っているか、待機していることを示しているようです。ネットワークに問題があるのではないでしょうか?

一部の操作のみが失敗する場合は、おそらくノードの 1 つだけに問題があります。そのノードが必要ない場合は機能しますが、必要な場合はうまくいきません。他のノードのログファイルを調べる価値があるかもしれません。

score 1 · Accepted Answer

1.2 でパフォーマンスの問題が発生していたようです。幸いなことに、1.2 ブランチにパッチが適用されたばかりだったので、ソースからビルドすると問題は解決しました。

詳細な説明については、 https://issues.apache.org/jira/browse/CASSANDRA-5677を参照してください。

皆さんありがとう！

cassandra - Cassandra 1.2 の膨大な読み取りレイテンシ

2 に答える 2

Related

Reference