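For my master's thesis, I have to show that combining TimescaleDB with PostGIS improves PostgreSQL performance on OSM data. I prepared a dataset of European OSM data (a CSV file with 100 million rows). When I COPY that data into a plain PostgreSQL database, the ingest rate is about 200k rows/sec. When I COPY it into a TimescaleDB hypertable, the ingest rate drops below 100k rows/sec. That is not the result I expected.

My question is: why does this happen? Do I need to configure something? My guess is that the problem is the uneven distribution of the OSM timestamps, which range from 2006 to 2019.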

Rows as stored in the plain PostgreSQL table:

   osm_timestamp    |                        way                         
---------------------+----------------------------------------------------
 2019-08-20 02:22:35 | 0101000020110F0000F0076BFEFFB162C14485197AF1B65341
 2019-08-05 15:46:38 | 0101000020110F00002BFC9A016E864AC17DB392F223375241
 2019-08-05 15:46:38 | 0101000020110F0000142668FD5A804AC14841650D62375241
 2014-04-22 19:36:43 | 0101000020110F0000A265A7382E7F4AC113BDE36F99375241
 2014-04-22 19:36:43 | 0101000020110F0000C91A02369D7E4AC1D7D24B7197375241
 2018-04-21 21:08:35 | 0101000020110F00003FCDEEF0747E4AC151E880038E375241
 2014-04-22 19:36:43 | 0101000020110F0000C3186511957E4AC19620025B92375241
 2017-12-10 17:43:50 | 0101000020110F0000B24BD8C58E7E4AC153B6CA5192375241
 2014-04-22 19:36:43 | 0101000020110F000014D08064937E4AC1C131DECE95375241
 2017-08-25 12:30:33 | 0101000020110F0000249BF33F977E4AC14AA0211597375241
 2014-04-22 19:36:43 | 0101000020110F0000EC629803907E4AC1DAC3FF3098375241
 2018-04-21 21:08:36 | 0101000020110F000043C2E8A5787E4AC18A7F52A18F375241

Rows as stored in the TimescaleDB table:

   osm_timestamp    |                        way                         
---------------------+----------------------------------------------------
 2019-08-20 02:22:35 | 0101000020110F0000F0076BFEFFB162C14485197AF1B65341
 2019-08-19 19:25:36 | 0101000020110F0000BA461AE38D7548C159769C60C3C75141
 2019-08-19 19:25:36 | 0101000020110F0000D8062171F57148C1081AC67C7BC65141
 2019-08-19 19:25:36 | 0101000020110F00000A3CD250F37148C13CB433AB7AC65141
 2019-08-19 19:25:36 | 0101000020110F0000E6C794D0F27148C1E4B157257CC65141
 2019-08-19 19:25:36 | 0101000020110F0000EB32A406717048C1D6F39FB772C65141
 2019-08-19 16:32:34 | 0101000020110F000066CAFEFFEE6048C18DD0C86240C15141
 2019-08-19 16:32:34 | 0101000020110F000058C74E3ADA6048C1244D22AC63C15141
 2019-08-19 16:32:34 | 0101000020110F00004ABED3D8C36048C14FEF45345FC15141
 2019-08-19 10:45:35 | 0101000020110F00005FBA75B7DE5E48C1FB21EF296DC15141
 2019-08-19 19:25:36 | 0101000020110F00000DF0FD868B7948C1EEA03CEE28C95141
 2019-08-19 19:25:36 | 0101000020110F000092EF4F0EE87548C1F7598342B4CB5141
 2019-08-19 19:25:36 | 0101000020110F0000B75DC2F2E67548C1C06DA855B4CB5141
 2019-08-20 18:41:46 | 0101000020110F0000E674D391CC5148C168E4DE3147C25141
 2019-08-20 18:02:29 | 0101000020110F0000FCE227F30C5148C1164B566039C25141
 2019-08-20 18:41:46 | 0101000020110F00007FA03258515148C1C88FDDB08AC25141
 2019-08-20 18:41:46 | 0101000020110F000094A2CFC1165148C15EA45CCAAAC25141
 2019-08-20 18:41:46 | 0101000020110F00004720D019315148C17DEEAD09B3C25141

Performance when loading into plain PostgreSQL:

Stipe@Mile:~/go/bin$ ./timescaledb-parallel-copy --connection "host=localhost user=postgres sslmode=disable password=postgresifra54" --db-name timescale2 --table timescale2 --batch-size 10000 --truncate --log-batches --file /home/Stipe/DISKC/europe-point.csv | tee /home/Stipe/DISKC/postgis.txt
[BATCH] took 43.292909ms, batch size 10000, row rate 230984.709297/sec
[BATCH] took 35.496966ms, batch size 10000, row rate 281714.217491/sec
[BATCH] took 37.104837ms, batch size 10000, row rate 269506.641412/sec
[BATCH] took 36.998932ms, batch size 10000, row rate 270278.071810/sec
[BATCH] took 39.105424ms, batch size 10000, row rate 255719.002049/sec
[BATCH] took 38.659405ms, batch size 10000, row rate 258669.268190/sec
[BATCH] took 35.184652ms, batch size 10000, row rate 284214.833218/sec
[BATCH] took 40.266376ms, batch size 10000, row rate 248346.163558/sec
[BATCH] took 36.179696ms, batch size 10000, row rate 276398.121200/sec

Performance when loading into the TimescaleDB hypertable:

Stipe@Mile:~/go/bin$ ./timescaledb-parallel-copy --connection "host=localhost user=postgres sslmode=disable password=postgresifra54" --db-name timescale --table timescale2 --batch-size 10000 --truncate --log-batches --file /home/Stipe/DISKC/europe-point.csv | tee /home/Stipe/DISKC/postgis.txt
[BATCH] took 6.979696947s, batch size 10000, row rate 1432.726962/sec
[BATCH] took 1.439723348s, batch size 10000, row rate 6945.778864/sec
[BATCH] took 1.27673852s, batch size 10000, row rate 7832.457346/sec
[BATCH] took 619.745584ms, batch size 10000, row rate 16135.653497/sec
[BATCH] took 378.107768ms, batch size 10000, row rate 26447.486263/sec
[BATCH] took 350.852359ms, batch size 10000, row rate 28502.017283/sec
[BATCH] took 194.37932ms, batch size 10000, row rate 51445.801951/sec
[BATCH] took 269.47735ms, batch size 10000, row rate 37108.870189/sec
[BATCH] took 206.672165ms, batch size 10000, row rate 48385.809478/sec
[BATCH] took 232.124194ms, batch size 10000, row rate 43080.386528/sec
[BATCH] took 169.58852ms, batch size 10000, row rate 58966.255499/sec
[BATCH] took 350.809657ms, batch size 10000, row rate 28505.486666/sec
[BATCH] took 117.911529ms, batch size 10000, row rate 84809.348881/sec
[BATCH] took 172.228338ms, batch size 10000, row rate 58062.454275/sec
[BATCH] took 121.701297ms, batch size 10000, row rate 82168.392996/sec
[BATCH] took 173.654201ms, batch size 10000, row rate 57585.707356/sec
[BATCH] took 154.958872ms, batch size 10000, row rate 64533.252410/sec
[BATCH] took 111.999767ms, batch size 10000, row rate 89285.900032/sec
[BATCH] took 176.024805ms, batch size 10000, row rate 56810.175134/sec
[BATCH] took 143.048944ms, batch size 10000, row rate 69906.143453/sec
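
To compare the two runs on more than a handful of batches, the `[BATCH]` lines saved by `tee` can be reduced to a single overall rate. The following is only a minimal sketch: the function name `overall_row_rate` is hypothetical, it assumes every line has the `took <duration>, batch size <n>` shape shown in the logs above with durations in `ms` or `s` only, and it sums per-batch durations, so the figure is meaningful for a single worker rather than for several workers running in parallel.

```python
import re

# Matches lines such as:
# [BATCH] took 43.292909ms, batch size 10000, row rate 230984.709297/sec
# Assumption: durations are printed only in "ms" or "s", as in the logs above.
BATCH_RE = re.compile(r"\[BATCH\] took ([0-9.]+)(ms|s), batch size (\d+)")


def overall_row_rate(log_text):
    """Return total rows divided by the summed batch time, in rows per second."""
    total_rows = 0
    total_seconds = 0.0
    for match in BATCH_RE.finditer(log_text):
        value, unit, rows = match.groups()
        # Normalize every duration to seconds before summing.
        total_seconds += float(value) / 1000.0 if unit == "ms" else float(value)
        total_rows += int(rows)
    if total_seconds == 0:
        raise ValueError("no [BATCH] lines found")
    return total_rows / total_seconds
```

Calling it on the contents of the saved log file, for example `overall_row_rate(open("/home/Stipe/DISKC/postgis.txt").read())`, gives one number per run. Unlike an average of the per-batch rates, the slow first batches of the hypertable run are weighted by the time they actually took.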