I just upgraded to v5.1.19. One of the QDB nodes keeps crashing shortly after it starts; maybe it was not shut down cleanly.
```
Reading configuration file from /etc/xrd.cf.quarkdb
INFO: Openning state machine '/var/quarkdb/node-2/current/state-machine'.
INFO: Opening raft journal '/var/quarkdb/node-2/current/raft-journal'
------ quarkdb protocol plugin initialization completed.
------ xrootd email@example.com:7777 initialization completed.
EVENT: eos-qdb-2.eos-qdb.eos.svc.kermes-dev.local:7777: TIMEOUT after 1446ms, I am not receiving heartbeats. Attempting to start election.
INFO: Starting pre-vote round for term 37
INFO: Pre-vote requests have been sent off, will allow a window of 1000ms to receive replies.
INFO: Pre-vote round unsuccessful for term 37. Contacted 2 nodes, received 2 replies with a tally of 0 positive votes, 0 refused votes, and 2 vetoes.
INFO: Pre-vote round for term 37 resulted in a veto. This means, the next leader of this cluster cannot be me. Stopping election attempts until I receive a heartbeat.
230615 21:54:50 010 XrdProtocol: anon.0:firstname.lastname@example.org terminated handshake not received
INFO: New link from localhost [607404b8-3eaa-483f-9716-8779f2e1bef8]
INFO: Shutting down link from localhost [607404b8-3eaa-483f-9716-8779f2e1bef8]
INFO: New link from eos-qdb-1.eos-qdb.eos.svc.kermes-dev.local [127883fc-c1a7-4b04-98ad-36b21d0190e8]
INFO: Connection with UUID 127883fc-c1a7-4b04-98ad-36b21d0190e8 identifying as 'internal-heartbeat-sender'
INFO: New link from eos-qdb-1.eos-qdb.eos.svc.kermes-dev.local [3d0c2641-4440-422a-97d3-a5cf4c505d11]
INFO: Connection with UUID 3d0c2641-4440-422a-97d3-a5cf4c505d11 identifying as 'internal-replicator'
EVENT: Recognizing leader eos-qdb-1.eos-qdb.eos.svc.kermes-dev.local:7777 for term 36
WARNING: Detected inconsistency for entry #3021686. Contents of my journal: term: 20 -> ['TIMESTAMPED_LEASE_ACQUIRE' 'master_lease' 'eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local:1094' '10000' 'G�C'].
Contents of what the leader sent: term: 21 -> ['JOURNAL_LEADERSHIP_MARKER' '21' 'eos-qdb-1.eos-qdb.eos.svc.kermes-dev.local:7777']
terminate called after throwing an instance of 'quarkdb::FatalException'
  what(): detected inconsistent entries for index 3021686. Leader attempted to overwrite a committed entry with one with different contents.
----- Stack trace (most recent call last) in thread 11:
#13 Object ", at 0xffffffffffffffff, in
#12 Object ", at 0x7f01c5bdb96c, in
#11 Object ", at 0x7f01c5eb2ea4, in
#10 Object ", at 0x7f01c6d4e206, in
#9 Object ", at 0x7f01c6dbe338, in
#8 Object ", at 0x7f01c6dbe216, in
#7 Object ", at 0x7f01c6dbaeac, in
#6 Object ", at 0x7f01c15543a3, in
#5 Object ", at 0x7f01c156bb41, in
#4 Object ", at 0x7f01c158d14f, in
#3 Object ", at 0x7f01c15945df, in
#2 Object ", at 0x7f01c15ff984, in
#1 Object ", at 0x7f01c15fcd6d, in
#0 Object ", at 0x7f01c1558428, in
Stack trace (most recent call last) in thread 11:
#18 Object ", at 0xffffffffffffffff, in
#17 Object ", at 0x7f01c5bdb96c, in
#16 Object ", at 0x7f01c5eb2ea4, in
#15 Object ", at 0x7f01c6d4e206, in
#14 Object ", at 0x7f01c6dbe338, in
#13 Object ", at 0x7f01c6dbe216, in
#12 Object ", at 0x7f01c6dbaeac, in
#11 Object ", at 0x7f01c15543a3, in
#10 Object ", at 0x7f01c156bb41, in
#9 Object ", at 0x7f01c158d14f, in
#8 Object ", at 0x7f01c15945df, in
#7 Object ", at 0x7f01c15ff984, in
#6 Object ", at 0x7f01c15fcdde, in
#5 Object ", at 0x7f01c663dc52, in
#4 Object ", at 0x7f01c663da32, in
#3 Object ", at 0x7f01c663da05, in
#2 Object ", at 0x7f01c663fa94, in
#1 Object ", at 0x7f01c5b14a77, in
#0 Object ", at 0x7f01c5b13387, in
Aborted (Signal sent by tkill() 1 0)
command terminated with exit code 137
```
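For context, this is my understanding of why the node aborts: in Raft, a follower may throw away uncommitted entries that conflict with the leader's log, but an entry it already considers committed must never be replaced, and QuarkDB treats that case as fatal. The sketch below is only a generic, simplified illustration of that rule, not QuarkDB's code; the names `FollowerJournal`, `append`, `Entry`, and `FatalInconsistency` are hypothetical, and the journal is just an in-memory dict.

```python
from dataclasses import dataclass


class FatalInconsistency(Exception):
    """Hypothetical error: the leader tried to replace a committed entry."""


@dataclass(frozen=True)
class Entry:
    term: int
    payload: tuple


class FollowerJournal:
    """Toy in-memory Raft follower journal (hypothetical, not QuarkDB's)."""

    def __init__(self):
        self.entries = {}      # log index -> Entry
        self.commit_index = 0  # highest index known to be committed

    def append(self, index, entry):
        existing = self.entries.get(index)
        if existing is None or existing == entry:
            # New slot, or the leader resent an identical entry: accept it.
            self.entries[index] = entry
            return
        if index <= self.commit_index:
            # A committed entry must never change; refuse and stop.
            raise FatalInconsistency(
                f"inconsistent entries for index {index}: "
                f"mine has term {existing.term}, leader sent term {entry.term}"
            )
        # Uncommitted conflict: drop this entry and everything after it,
        # then take the leader's version.
        for i in [i for i in self.entries if i >= index]:
            del self.entries[i]
        self.entries[index] = entry


if __name__ == "__main__":
    journal = FollowerJournal()
    journal.entries[3021686] = Entry(20, ("TIMESTAMPED_LEASE_ACQUIRE", "master_lease"))
    journal.commit_index = 3021686
    try:
        journal.append(3021686, Entry(21, ("JOURNAL_LEADERSHIP_MARKER", "21")))
    except FatalInconsistency as err:
        print("fatal:", err)
```

If I read the log right, my node is in the `index <= self.commit_index` branch for entry #3021686: it holds a term-20 entry it believes is committed, while the leader sends a term-21 entry for the same index.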
The other two members are up and running, so presumably their copy of the data can be considered correct. How can I repair the inconsistent data?
I tried to find documentation on how to view and delete keys, but all I found was
Every key I try with `get` or `hgetall` seems not to exist, and I can’t find a way to list all keys.
Even if I had such a command, I’m not sure how I would apply it, because the inconsistent member crashes immediately. Can I tell the leader to overwrite the inconsistent data on the corrupted member?