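Task:
I am trying to deploy dgraph (one Zero and one Alpha) to Kubernetes (Google Cloud) via a Helm chart.
Problem: It used to work, but now it doesn't, and I can't tell what changed. The specific errors are best described by the logs below; basically, it looks like a gRPC/connection problem. It first showed up after I set the gcloud cluster size (number of nodes) to 0 and then, a few days later, back to 4, but I find it hard to believe that this is the cause. I'm not very familiar with this kind of problem, and the person who set everything up is no longer around.
I posted on the dgraph forum earlier, but since I'm not convinced this is a dgraph issue, I'm posting here to reach a wider audience.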
What I have tried to fix the problem:
Deleting the release via Helm
helm delete --purge dgraph
and recreating it
helm install --wait --name dgraph ./charts/dgraph/
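The delete/reinstall steps above can be written as a small Python sketch so they are repeatable. It assumes Helm 2 is on PATH, because `--purge` and `--name` no longer exist in Helm 3. The names `reinstall_commands` and `reinstall` are hypothetical. As far as I understand the Kubernetes StatefulSet docs, deleting a StatefulSet does not delete the PersistentVolumeClaims created from its `volumeClaimTemplates`. If that holds, a reinstall like this reuses the existing `/dgraph` data rather than starting fresh.

```python
import subprocess
from typing import List

# Release name and chart path as used in the commands above.
RELEASE = "dgraph"
CHART = "./charts/dgraph/"


def reinstall_commands(release: str = RELEASE, chart: str = CHART) -> List[List[str]]:
    """Helm 2 commands that delete the release and install it again."""
    return [
        ["helm", "delete", "--purge", release],
        ["helm", "install", "--wait", "--name", release, chart],
    ]


def reinstall(dry_run: bool = True) -> List[List[str]]:
    """Print each command; actually run it only when dry_run is False.

    Note: PVCs created from the StatefulSet's volumeClaimTemplates are not
    removed by either command, so the old /dgraph volume is mounted again.
    """
    cmds = reinstall_commands()
    for cmd in cmds:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    reinstall(dry_run=True)
```

Running it with the default `dry_run=True` only prints the two commands.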
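I also tried setting the gcloud cluster size to 0 and then back to 4. No difference. I've looked through the configuration and it seems fine; I put it together by comparing it against files I found in various places, including the dgraph repository.
I have a separate docker compose file for testing locally, but it is unrelated to the cloud deployment and works fine (it is not included in this post).
Below are the logs and the chart specs.
Any help is really appreciated!
Thanks!
Aurel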
Zero logs:
I1204 21:27:51.539624 1 run.go:90] Setting up grpc listener at: 0.0.0.0:5080
I1204 21:27:51.539833 1 run.go:90] Setting up http listener at: 0.0.0.0:6080
badger2018/12/04 21:27:51 INFO: Replaying file id: 0 at offset: 1544608
badger2018/12/04 21:27:51 INFO: Replay took: 15.256µs
I1204 21:27:51.888823 1 node.go:152] Setting raft.Config to: &{ID:1 peers:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00015de10 Applied:0 MaxSizePerMsg:1048576 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1d112c0}
I1204 21:27:51.892352 1 node.go:282] Found hardstate: {Term:27 Vote:1 Commit:6525 XXX_unrecognized:[]}
I1204 21:27:51.897997 1 node.go:291] Group 0 found 6526 entries
I1204 21:27:51.898218 1 raft.go:371] Restarting node for dgraphzero
I1204 21:27:51.898497 1 node.go:84] 1 became follower at term 27
I1204 21:27:51.898744 1 node.go:84] newRaft 1 [peers: [], term: 27, commit: 6525, applied: 0, lastindex: 6525, lastterm: 27]
I1204 21:27:51.902606 1 run.go:229] Running Dgraph Zero...
I1204 21:27:51.919236 1 node.go:174] Setting conf state to nodes:1
I1204 21:27:51.919599 1 raft.go:547] Done applying conf change at 1
E1204 21:27:51.921113 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:7080: connect: connection refused"
I1204 21:27:51.921902 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:7080
E1204 21:27:51.921301 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:7080: connect: connection refused"
I1204 21:27:51.923212 1 raft.go:272] Removing tablet for attr: [value_date], gid: [1]
E1204 21:27:51.923984 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924075 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924149 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924210 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924265 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924308 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924366 1 raft.go:552] While applying proposal: Invalid address
...
E1204 21:27:52.207869 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:52.207873 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:52.205514 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:9080: connect: connection refused"
I1204 21:27:52.207897 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:9080
E1204 21:27:52.205566 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:9080: connect: connection refused"
I1204 21:27:52.380095 1 zero.go:375] Got connection request: id:6062 addr:"dgraph-0.dgraph.default.svc.cluster.local:7080"
I1204 21:27:52.380886 1 zero.go:484] Connected: id:6062 addr:"dgraph-0.dgraph.default.svc.cluster.local:7080"
I1204 21:27:52.392898 1 node.go:84] 1 no leader at term 27; dropping index reading msg
I1204 21:27:54.480961 1 node.go:84] 1 is starting a new election at term 27
I1204 21:27:54.481005 1 node.go:84] 1 became pre-candidate at term 27
I1204 21:27:54.481017 1 node.go:84] 1 received MsgPreVoteResp from 1 at term 27
I1204 21:27:54.481102 1 node.go:84] 1 became candidate at term 28
I1204 21:27:54.481112 1 node.go:84] 1 received MsgVoteResp from 1 at term 28
I1204 21:27:54.481218 1 node.go:84] 1 became leader at term 28
I1204 21:27:54.481232 1 node.go:84] raft.node: 1 elected leader 1 at term 28
E1204 21:27:54.483865 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:54.483928 1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:27:54.716975 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:54.717231 1 zero.go:549] Error while applying proposal in update stream Invalid address
W1204 21:27:55.393083 1 node.go:551] [1] Read index context timed out
E1204 21:28:02.208789 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
E1204 21:28:02.209086 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
E1204 21:28:21.892166 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:28:51.893023 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:29:21.892887 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:29:51.892775 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:30:21.892814 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:30:51.892810 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:31:21.892858 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:31:51.892803 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:21.892885 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:51.892669 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:52.417618 1 raft.go:552] While applying proposal: Invalid address
E1204 21:32:52.417962 1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:33:21.892766 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:33:51.892865 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:34:21.892804 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:34:51.892788 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:35:21.892866 1 oracle.go:425] No healthy connection found to leader of group 2
I1204 21:35:51.892321 1 tablet.go:189]
Groups sorted by size: [{gid:2 size:0} {gid:1 size:80673}]
I1204 21:35:51.892359 1 tablet.go:194] size_diff 80673
I1204 21:35:51.892391 1 tablet.go:83] Going to move predicate: [_predicate_], size: [32 kB] from group 1 to 2
E1204 21:35:51.893181 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:35:51.917329 1 tablet.go:231] Got error during move: While calling MovePredicate: rpc error: code = Unknown desc = Group id doesn't match, received request for 1, my gid: 2
E1204 21:35:51.919971 1 tablet.go:70] Error while trying to move predicate _predicate_ from 1 to 2: While calling MovePredicate: rpc error: code = Unknown desc = Group id doesn't match, received request for 1, my gid: 2
E1204 21:36:21.892883 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:36:51.892766 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:21.892853 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:51.892927 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:52.420512 1 raft.go:552] While applying proposal: Invalid address
E1204 21:37:52.420817 1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:38:21.892801 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:38:51.892913 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:39:21.892727 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:39:51.892272 1 oracle.go:425] No healthy connection found to leader of group 2
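Most of the Zero log consists of the same few errors repeating. To see which ones dominate, the glog-style lines (e.g. `E1204 21:27:51.921113 1 pool.go:178] ...`) can be tallied per severity and source location. Below is a minimal Python sketch. `summarize` is a hypothetical helper, and the regex assumes the header format seen in the log above. Badger lines and multi-line continuations such as `Groups sorted by size` are skipped.

```python
import re
from collections import Counter

# glog header: severity letter, MMDD, time, thread id, "file.go:line]", message.
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] ?(?P<msg>.*)$"
)


def summarize(lines):
    """Count glog lines per (severity, source location) and keep the first message."""
    counts = Counter()
    first_msg = {}
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if not m:
            continue  # badger output, continuation lines, shell traces
        key = (m["sev"], m["src"])
        counts[key] += 1
        first_msg.setdefault(key, m["msg"])
    return counts, first_msg


if __name__ == "__main__":
    import sys

    counts, first = summarize(sys.stdin)
    for (sev, src), n in counts.most_common():
        if sev in ("E", "W"):
            print(f"{n:5d} {sev} {src}: {first[(sev, src)]}")
```

For example: `kubectl logs dgraph-0 -c zero | python summarize.py`. The file name is illustrative.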
Alpha logs:
++ hostname -f
+ dgraph alpha --my=dgraph-0.dgraph.default.svc.cluster.local:7080 --lru_mb 2048 --zero dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.274206 1 init.go:80]
Dgraph version : v1.0.10
Commit SHA-1 : 8b801bd7
Commit timestamp : 2018-11-05 17:52:33 -0800
Branch : HEAD
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit https://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.
Licensed under Apache 2.0. Copyright 2015-2018 Dgraph Labs, Inc.
I1204 21:27:52.295997 1 server.go:115] Setting Badger table load option: mmap
I1204 21:27:52.296163 1 server.go:127] Setting Badger value log load option: mmap
I1204 21:27:52.296229 1 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/12/04 21:27:52 INFO: Replaying file id: 0 at offset: 12977
badger2018/12/04 21:27:52 INFO: Replay took: 10.567µs
I1204 21:27:52.322077 1 server.go:115] Setting Badger table load option: mmap
I1204 21:27:52.322103 1 server.go:127] Setting Badger value log load option: mmap
I1204 21:27:52.322108 1 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/12/04 21:27:52 INFO: Replaying file id: 0 at offset: 0
badger2018/12/04 21:27:52 INFO: Replay took: 18.232µs
I1204 21:27:52.376726 1 run.go:338] gRPC server started. Listening on port 9080
I1204 21:27:52.376848 1 run.go:339] HTTP server started. Listening on port 8080
I1204 21:27:52.377184 1 groups.go:92] Current Raft Id: 6062
I1204 21:27:52.377898 1 worker.go:80] Worker listening at address: [::]:7080
I1204 21:27:52.379669 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.381207 1 groups.go:119] Connected to group zero. Assigned group: 0
E1204 21:27:52.382305 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
I1204 21:27:52.382655 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:9080
I1204 21:27:52.390886 1 draft.go:74] Node ID: 6062 with GroupID: 2
I1204 21:27:52.391199 1 node.go:152] Setting raft.Config to: &{ID:6062 peers:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00008fe10 Applied:22 MaxSizePerMsg:1048576 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1d112c0}
I1204 21:27:52.391360 1 node.go:271] Found Snapshot.Metadata: {ConfState:{Nodes:[6062] XXX_unrecognized:[]} Index:22 Term:11 XXX_unrecognized:[]}
I1204 21:27:52.391445 1 node.go:282] Found hardstate: {Term:12 Vote:6062 Commit:25 XXX_unrecognized:[]}
I1204 21:27:52.391534 1 node.go:291] Group 2 found 4 entries
I1204 21:27:52.391574 1 draft.go:1047] Restarting node for group: 2
I1204 21:27:52.391638 1 node.go:174] Setting conf state to nodes:6062
I1204 21:27:52.391909 1 node.go:84] 17ae became follower at term 12
I1204 21:27:52.392015 1 node.go:84] newRaft 17ae [peers: [17ae], term: 12, commit: 25, applied: 22, lastindex: 25, lastterm: 12]
I1204 21:27:52.392285 1 groups.go:519] Got address of a Zero server: dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.394939 1 draft.go:313] Skipping snapshot at 22, because found one at 22
I1204 21:27:54.712797 1 node.go:84] 17ae is starting a new election at term 12
I1204 21:27:54.713220 1 node.go:84] 17ae became pre-candidate at term 12
I1204 21:27:54.713303 1 node.go:84] 17ae received MsgPreVoteResp from 17ae at term 12
I1204 21:27:54.713474 1 node.go:84] 17ae became candidate at term 13
I1204 21:27:54.713564 1 node.go:84] 17ae received MsgVoteResp from 17ae at term 13
I1204 21:27:54.713821 1 node.go:84] 17ae became leader at term 13
I1204 21:27:54.713954 1 node.go:84] raft.node: 17ae elected leader 17ae at term 13
I1204 21:27:55.392399 1 groups.go:718] Leader idx=6062 of group=2 is connecting to Zero for txn updates
W1204 21:27:55.392803 1 groups.go:723] WARNING: We don't have address of any dgraphzero leader.
I1204 21:27:56.393134 1 groups.go:718] Leader idx=6062 of group=2 is connecting to Zero for txn updates
E1204 21:27:56.397090 1 draft.go:467] Lastcommit 10337 > current 10002. This would cause some commits to be lost.
E1204 21:28:02.383404 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
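The two logs point at the same two addresses. Both Zero and Alpha echo `dgraph-0.dgraph.default.svc.cluster.local:9080`, where Alpha reports its client gRPC server, and get `unknown service pb.Raft`. Meanwhile, Alpha advertises `--my=...:7080` and reports its worker listening on 7080. The Python sketch below pulls those addresses out mechanically. It only parses log text. `advertised_address`, `echo_errors`, and `mismatched_ports` are hypothetical helper names.

```python
import re

# "Echo error from HOST:PORT. Err: rpc error: code = CODE desc = ..."
ECHO_RE = re.compile(r"Echo error from (?P<addr>\S+?)\. Err: .*?code = (?P<code>\w+)")
MY_RE = re.compile(r"--my=(?P<addr>\S+)")


def advertised_address(alpha_lines):
    """Return the --my=host:port value from the traced `dgraph alpha` command."""
    for line in alpha_lines:
        m = MY_RE.search(line)
        if m:
            return m["addr"]
    return None


def echo_errors(lines):
    """Map every address that failed an Echo to the set of gRPC codes seen."""
    seen = {}
    for line in lines:
        m = ECHO_RE.search(line)
        if m:
            seen.setdefault(m["addr"], set()).add(m["code"])
    return seen


def mismatched_ports(errors, advertised):
    """Addresses whose port differs from the port Alpha advertises via --my."""
    want = advertised.rsplit(":", 1)[1]
    return sorted(a for a in errors if a.rsplit(":", 1)[1] != want)
```

On the logs above this should report the `:9080` address. That is the port the chart maps to `service.ports.serverGrpc`, not to `server.port`.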
The chart is specified as follows:
statefulset.yml:
# This StatefulSet runs 1 pod with one Zero, one Server
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph
spec:
  serviceName: "dgraph"
  replicas: 1
  selector:
    matchLabels:
      app: dgraph
  template:
    metadata:
      labels:
        app: dgraph
    spec:
      {{- if .Values.server.initData.image }}
      initContainers:
      - name: init-schema
        image: {{ .Values.server.initData.image }}
        command: ['curl', '-X', 'POST', '-H', 'X-Dgraph-CommitNow:true', '--data-binary', '@graph/schema.txt', '{{ .Values.service.name }}.default.svc.cluster.local/alter']
      - name: init-data
        image: {{ .Values.server.initData.image }}
        command: ['curl', '-X', 'POST', '-H', 'X-Dgraph-CommitNow:true', '--data-binary', '@graph/data.txt', '{{ .Values.service.name }}.default.svc.cluster.local/mutate']
      {{- end }}
      containers:
      - name: zero
        image: {{ template "dgraph.image" . }}
        imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
        ports:
        - containerPort: {{ .Values.service.ports.zeroGrpc }}
          name: zero-grpc
        - containerPort: {{ .Values.service.ports.zeroHttp }}
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph zero --my=$(hostname -f):{{ .Values.service.ports.zeroGrpc }}
      - name: server
        image: {{ template "dgraph.image" . }}
        imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
        ports:
        - containerPort: {{ .Values.service.ports.serverHttp }}
          name: server-http
        - containerPort: {{ .Values.service.ports.serverGrpc }}
          name: server-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):{{ .Values.server.port }} --lru_mb {{ .Values.server.lruSizeMB }} --zero {{ .Values.server.zeroDns }}:{{ .Values.service.ports.zeroGrpc }}
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: {{ .Values.storage.size }}
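To double-check what each container actually runs, the two `command` templates can be rendered by hand. The Python sketch below mimics Helm's substitution for just these fields; it is not Helm's template engine. `VALUES` mirrors the relevant keys of values.yml. `FQDN` stands in for `$(hostname -f)` and is taken from the `++ hostname -f` trace in the Alpha log. Both are illustrative names.

```python
# Mirrors the relevant keys of values.yml.
VALUES = {
    "service": {"ports": {"zeroGrpc": 5080, "zeroHttp": 6080,
                          "serverHttp": 8080, "serverGrpc": 9080}},
    "server": {"lruSizeMB": 2048,
               "zeroDns": "dgraph-0.dgraph.default.svc.cluster.local",
               "port": 7080},
}

# Stand-in for $(hostname -f) inside pod dgraph-0.
FQDN = "dgraph-0.dgraph.default.svc.cluster.local"


def zero_command(values: dict, fqdn: str) -> str:
    """Render the zero container's command line."""
    return f"dgraph zero --my={fqdn}:{values['service']['ports']['zeroGrpc']}"


def alpha_command(values: dict, fqdn: str) -> str:
    """Render the server (alpha) container's command line."""
    s = values["server"]
    zero_port = values["service"]["ports"]["zeroGrpc"]
    return (f"dgraph alpha --my={fqdn}:{s['port']} --lru_mb {s['lruSizeMB']} "
            f"--zero {s['zeroDns']}:{zero_port}")


if __name__ == "__main__":
    print(zero_command(VALUES, FQDN))
    print(alpha_command(VALUES, FQDN))
```

With these values, the rendered Alpha line matches the `+ dgraph alpha ...` trace in the Alpha log. So `--my` advertises `server.port` (7080), not the client gRPC port `service.ports.serverGrpc` (9080).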
values.yml:
image:
  registry: docker.io
  repository: dgraph/dgraph
  tag: latest
  pullPolicy: Always
service:
  name: dgraph-service
  ports:
    zeroGrpc: 5080
    zeroHttp: 6080
    serverHttp: 8080
    serverGrpc: 9080
server:
  # Estimate of the LRU cache size in MB. It’s recommended to set lru_mb to one-third the available RAM.
  lruSizeMB: 2048
  zeroDns: dgraph-0.dgraph.default.svc.cluster.local
  port: 7080
  initData:
    image: ""
    #image: "registry.gitlab.com/organisation/project/backend:latest"
storage:
  size: 5Gi
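One setting in values.yml can change between pod restarts without any edit to the chart. With `tag: latest` and `pullPolicy: Always`, the kubelet re-resolves the tag every time a container starts. Pods recreated after the node pool went to 0 and back could therefore be running a different Dgraph build than before; the Alpha banner above reports v1.0.10. The Python sketch below flags such settings. It assumes the `image` block has already been loaded into a dict (e.g. with PyYAML, not shown). It also assumes the unshown `dgraph.image` helper joins `registry/repository:tag`. Both function names are hypothetical.

```python
def image_ref(image: dict) -> str:
    """Assumed output of the (unshown) "dgraph.image" helper."""
    return f"{image['registry']}/{image['repository']}:{image['tag']}"


def floating_image_warnings(image: dict) -> list:
    """Warn about image settings that can change the version between restarts."""
    warnings = []
    tag = str(image.get("tag", ""))
    if tag in ("", "latest"):
        shown = tag or "<empty>"
        warnings.append(f"tag {shown!r} is mutable; consider pinning, e.g. v1.0.10")
        if image.get("pullPolicy") == "Always":
            warnings.append("pullPolicy Always re-resolves the tag on every container start")
    return warnings


if __name__ == "__main__":
    img = {"registry": "docker.io", "repository": "dgraph/dgraph",
           "tag": "latest", "pullPolicy": "Always"}
    print(image_ref(img))
    for w in floating_image_warnings(img):
        print("warning:", w)
```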