1

現在、10 ノードのクラスターで問題が発生しており、約 1 日実行した後、3 つのノードがドロップアウトします (常にランダムな 3 つ)。

リアクのバージョン: 2.1.4

10 GB の RAM でそれぞれ 10 個の VM を実行し、Oracle Linux バージョン 7.3 を実行

Java バージョン :

[riak@pp2xria01trd001 riak$] java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

私たちのいつものRiakの男は現在休暇中なので、調べるリソースがあまりありません. どこから探し始めるかについてのヘルプやガイダンスをいただければ幸いです。

クラッシュ ダンプの詳細:

Slogan: Kernel pid terminated (application_controller) ({application_terminated,yokozuna,shutdown})System version: Erlang 
R16B02_basho10 (erts-5.10.3) [source] [64-bit] [smp:2:2] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]

理由を詳しく説明するsolr.logにはあまりありません:

2017-04-06 21:04:13,958 [INFO] <qtp1924582348-828>@LogUpdateProcessorFactory.java:198 [marketblueprints_index] webapp=/internal_solr path=/update params={} {} 0 0
2017-04-06 21:04:18,567 [INFO] <qtp1924582348-855>@SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:23,573 [INFO] <qtp1924582348-1161>@SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:28,578 [INFO] <qtp1924582348-865>@SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:33,584 [INFO] <qtp1924582348-848>@SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:38,589 [INFO] <qtp1924582348-641>@SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:54,242 [INFO] <Thread-1>@Monitor.java:41 Yokozuna has exited - shutting down Solr
2017-04-06 21:04:55,219 [INFO] <Thread-2>@Server.java:320 Graceful shutdown SocketConnector@0.0.0.0:8093
2017-04-06 21:04:56,027 [INFO] <Thread-2>@Server.java:329 Graceful shutdown o.e.j.w.WebAppContext{/internal_solr,file:/var/lib/riak/yz_temp/solr-webapp/webapp/},/usr/lib64/
riak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/webapps/solr.war
2017-04-06 21:04:59,288 [INFO] <Thread-2>@CoreContainer.java:314 Shutting down CoreContainer instance=1916575798
2017-04-06 21:04:59,710 [INFO] <Thread-2>@SolrCore.java:1040 [feed_mapping_index]  CLOSING SolrCore org.apache.solr.core.SolrCore@78acc5b

しかし、solr.log でいくつかのマージ プロセスを実行した後、次のメッセージが表示されます (これは、スーパーバイザーが 2 回目の再起動を妨げているため、Riak が停止していると思われます)。

2017-04-06 21:05:13,546 [INFO] <Thread-2>@CachingDirectoryFactory.java:305 Closing directory: /var/lib/riak/yz/endpoint_mappings_index/data
2017-04-06 21:05:13,547 [INFO] <Thread-2>@CachingDirectoryFactory.java:236 looking to close /var/lib/riak/yz/endpoint_mappings_index/data/index [CachedDir<<refCount=0;path=
/var/lib/riak/yz/endpoint_mappings_index/data/index;done=false>>]
2017-04-06 21:05:13,547 [INFO] <Thread-2>@CachingDirectoryFactory.java:305 Closing directory: /var/lib/riak/yz/endpoint_mappings_index/data/index
2017-04-06 21:05:14,657 [INFO] <Thread-2>@ContextHandler.java:832 stopped o.e.j.w.WebAppContext{/internal_solr,file:/var/lib/riak/yz_temp/solr-webapp/webapp/},/usr/lib64/ri
ak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/webapps/solr.war
2017-04-06 21:05:15,298 [WARN] <Thread-2>@QueuedThreadPool.java:145 79 threads could not be stopped

Erlang.log には以下が含まれます:

2017-04-06 21:04:54.193 [error] <0.5934.108> gen_server yz_solr_proc terminated with reason: {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection,{url,"http://localhost:
8093/internal_solr/admin/cores?action=STATUS&wt=json","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=STATUS&wt=json",http,hostname},100,1,{[],false
},[]}]}}
2017-04-06 21:04:54.198 [error] <0.5934.108> CRASH REPORT Process yz_solr_proc with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection
,{url,"http://localhost:8093/internal_solr/admin/cores?action=STATUS&wt=json","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=STATUS&wt=json",http,h
ostname},100,1,{[],false},[]}]}} in gen_server:terminate/6 line 744
2017-04-06 21:04:54.201 [error] <0.1150.0> Supervisor yz_solr_sup had child yz_solr_proc started with yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8
093, 8985) at <0.5934.108> exit with reason {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection,{url,"http://localhost:8093/internal_solr/admin/cores?action=STATUS&wt=j
son","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=STATUS&wt=json",http,hostname},100,1,{[],false},[]}]}} in context child_terminated
2017-04-06 21:04:57.422 [info] <0.1102.0>@riak_ensemble_peer:leading:631 {{kv,1141798154164767904846628775559596109106197299200,3,114179815416476790484662877555959610910619
7299200},'riak@pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:04:57.422 [info] <0.1090.0>@riak_ensemble_peer:leading:631 {{kv,685078892498860742907977265335757665463718379520,3,6850788924988607429079772653357576654637183
79520},'riak@pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:04:57.780 [info] <0.1072.0>@riak_ensemble_peer:leading:631 {{kv,0,3,0},'riak@pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:05:01.432 [info] <0.8030.232>@yz_solr_proc:init:119 Starting solr: "/usr/bin/riak/java" ["-Djava.awt.headless=true","-Djetty.home=/usr/lib64/riak/lib/yokozuna
-2.1.7-0-g6cf80ad/priv/solr","-Djetty.temp=/var/lib/riak/yz_temp","-Djetty.port=8093","-Dsolr.solr.home=/var/lib/riak/yz","-DhostContext=/internal_solr","-cp","/usr/lib64/r
iak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/start.jar","-Dlog4j.configuration=file:///etc/riak/solr-log4j.properties","-Dyz.lib.dir=/usr/lib64/riak/lib/yokozuna-2.1.7-0-g6c
f80ad/priv/java_lib","-d64","-Xms4g","-Xmx4g","-XX:+UseStringCache","-XX:+UseCompressedOops","-Dcom.sun.management.jmxremote.port=8985","-Dcom.sun.management.jmxremote.auth
enticate=false","-Dcom.sun.management.jmxremote.ssl=false","org.eclipse.jetty.start.Main"]
2017-04-06 21:05:01.483 [info] <0.1108.0>@riak_ensemble_peer:leading:631 {{kv,1370157784997721485815954530671515330927436759040,3,137015778499772148581595453067151533092743
6759040},'riak@pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:05:02.032 [info] <0.8030.232>@yz_solr_proc:handle_info:184 solr stdout/err: OpenJDK 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was re
moved in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option UseStringCache; support was removed in 8.0

2017-04-06 21:05:04.212 [info] <0.1110.0>@riak_ensemble_peer:leading:631 {{kv,1415829711164312202009819681693899175291684651008,3,0},'riak@pp2xria01trd001.pp2.williamhill.p
lc'}: Leading
2017-04-06 21:05:10.798 [info] <0.1096.0>@riak_ensemble_peer:leading:631 {{kv,913438523331814323877303020447676887284957839360,3,9134385233318143238773030204476768872849578
39360},'riak@pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:05:17.001 [info] <0.8030.232>@yz_solr_proc:handle_info:184 solr stdout/err: Error: Exception thrown by the agent : java.rmi.server.ExportException: Port alrea
dy in use: 8985; nested exception is:
        java.net.BindException: Address already in use (Bind failed)

2017-04-06 21:05:17.964 [error] <0.8030.232> gen_server yz_solr_proc terminated with reason: {"solr OS process exited",1}
2017-04-06 21:05:17.964 [error] <0.8030.232> CRASH REPORT Process yz_solr_proc with 0 neighbours exited with reason: {"solr OS process exited",1} in gen_server:terminate/6 
line 744
2017-04-06 21:05:17.964 [error] <0.1150.0> Supervisor yz_solr_sup had child yz_solr_proc started with yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8
093, 8985) at <0.8030.232> exit with reason {"solr OS process exited",1} in context child_terminated
2017-04-06 21:05:17.964 [error] <0.1150.0> Supervisor yz_solr_sup had child yz_solr_proc started with yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8
093, 8985) at <0.8030.232> exit with reason reached_max_restart_intensity in context shutdown
2017-04-06 21:05:17.964 [error] <0.1119.0> Supervisor yz_sup had child yz_solr_sup started with yz_solr_sup:start_link() at <0.1150.0> exit with reason shutdown in context 
child_terminated
2017-04-06 21:05:17.964 [error] <0.1119.0> Supervisor yz_sup had child yz_solr_sup started with yz_solr_sup:start_link() at <0.1150.0> exit with reason reached_max_restart_
intensity in context shutdown
2017-04-06 21:05:23.072 [error] <0.1551.0> Supervisor yz_index_hashtree_sup had child ignored started with yz_index_hashtree:start_link() at undefined exit with reason kill
ed in context shutdown_error
2017-04-06 21:05:24.353 [info] <0.745.0>@yz_app:prep_stop:74 Stopping application yokozuna.
2017-04-06 21:05:27.582 [error] <0.745.0>@yz_app:prep_stop:82 Stopping application yokozuna - exit:{noproc,{gen_server,call,[yz_solrq_drain_mgr,{drain,[]},infinity]}}.
2017-04-06 21:05:27.582 [info] <0.745.0>@yz_app:stop:88 Stopped application yokozuna.
2017-04-06 21:05:27.940 [info] <0.7.0> Application yokozuna exited with reason: shutdown
2017-04-06 21:05:28.165 [info] <0.431.0>@riak_kv_app:prep_stop:228 Stopping application riak_kv - marked service down.
2017-04-06 21:05:28.252 [info] <0.431.0>@riak_kv_app:prep_stop:232 Unregistered pb services
2017-04-06 21:05:28.408 [info] <0.431.0>@riak_kv_app:prep_stop:237 unregistered webmachine routes
2017-04-06 21:05:28.459 [info] <0.431.0>@riak_kv_app:prep_stop:239 all active put FSMs completed
2017-04-06 21:05:29.665 [info] <0.540.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.540.0>)
2017-04-06 21:05:29.665 [info] <0.539.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.539.0>)
2017-04-06 21:05:30.379 [info] <0.532.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.532.0>)
2017-04-06 21:05:31.116 [info] <0.534.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.534.0>)
2017-04-06 21:05:31.362 [info] <0.533.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.533.0>)
2017-04-06 21:05:32.153 [info] <0.536.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.536.0>)
2017-04-06 21:05:32.245 [info] <0.537.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.537.0>)
2017-04-06 21:05:32.676 [info] <0.535.0>@riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.535.0>)
2017-04-06 21:05:33.450 [info] <0.431.0>@riak_kv_app:stop:250 Stopped  application riak_kv.
2017-04-06 21:05:41.701 [info] <0.195.0>@riak_core_app:stop:116 Stopped  application riak_core.
2017-04-06 21:05:43.061 [info] <0.93.0> alarm_handler: {clear,system_memory_high_watermark}

追加のオプションが riak.conf に追加されました

search = on
search.solr.jmx_port = 8985
search.solr.jvm_options = -d64 -Xms4g -Xmx4g -XX:+UseStringCache -XX:+UseCompressedOops
search.solr.port = 8093
search.solr.start_timeout = 180s

OOM エラーの兆候も、プロセスが oom_killer によって強制終了されたこともありません

4

0 に答える 0