QuarkDB 0.4.3 has been released - notable changes:
Bug fixes
The mechanism meant to provide an early warning for potential MANIFEST corruption was flaky, and would sometimes report a problem where none existed.
New features
Implementation of an optional part of Raft, pre-vote. This should prevent servers that were partitioned, or are otherwise flaky, from triggering unnecessary and disruptive elections when they rejoin. A node will first hold a trial voting round before advancing its term, and will start campaigning in earnest only if it has a good chance of winning.
Ability to demote a full node to observer through command raft-demote-to-observer.
Print warnings in the logs whenever write-stalls are triggered.
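To make the pre-vote description more concrete, here is a minimal Python sketch of the general idea. It is a simplified model under stated assumptions, not QuarkDB's implementation. The names (Node, LogPosition, handle_pre_vote, pre_vote_round, maybe_start_election) are hypothetical. The sketch assumes a peer rejects a pre-vote if it has recently heard from a live leader, if the proposed term is not ahead of its own, or if the candidate's log is less up to date than its own. The key property is that no term is incremented until the trial round wins a majority.

```python
from dataclasses import dataclass


@dataclass
class LogPosition:
    term: int   # term of the last entry in the node's journal
    index: int  # index of the last entry in the node's journal


def log_at_least_as_up_to_date(candidate: LogPosition, voter: LogPosition) -> bool:
    # Raft's election restriction: a later last-entry term wins;
    # on a tie, the longer log wins.
    if candidate.term != voter.term:
        return candidate.term > voter.term
    return candidate.index >= voter.index


@dataclass
class Node:
    node_id: str
    current_term: int
    last_log: LogPosition
    # Assumption: True if a heartbeat from a live leader arrived within
    # the election timeout.
    heard_from_leader_recently: bool = False

    def handle_pre_vote(self, proposed_term: int, candidate_log: LogPosition) -> bool:
        # Answering a pre-vote never changes current_term or records a vote,
        # so a rejected trial round leaves the cluster undisturbed.
        if self.heard_from_leader_recently:
            return False  # the current leader still looks healthy
        if proposed_term <= self.current_term:
            return False  # candidate is not ahead on terms
        return log_at_least_as_up_to_date(candidate_log, self.last_log)


def pre_vote_round(candidate: Node, peers: list) -> bool:
    # Ask about term current_term + 1 WITHOUT incrementing our own term yet.
    proposed_term = candidate.current_term + 1
    votes = 1  # the candidate supports itself
    for peer in peers:
        if peer.handle_pre_vote(proposed_term, candidate.last_log):
            votes += 1
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2  # strict majority required


def maybe_start_election(candidate: Node, peers: list) -> bool:
    if not pre_vote_round(candidate, peers):
        return False  # stay a follower; term unchanged
    candidate.current_term += 1  # only now campaign in earnest
    return True


# A server rejoining after a partition while a healthy leader exists
# (node names are hypothetical):
rejoining = Node("qdb-3", 5, LogPosition(5, 100))
healthy_peers = [
    Node("qdb-1", 5, LogPosition(5, 120), heard_from_leader_recently=True),
    Node("qdb-2", 5, LogPosition(5, 120), heard_from_leader_recently=True),
]
print(maybe_start_election(rejoining, healthy_peers), rejoining.current_term)  # False 5
```

In this example the rejoining node loses the trial round because its peers still hear from a healthy leader. Its term therefore stays at 5, and it never sends a higher-term vote request that would force the existing leader to step down.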
Improvements
Show resilvering progress in raft-info.
Checkpoint creation through quarkdb-checkpoint will now fail if the specified target path is on a different physical filesystem.
RPMs now available for CentOS 8.
Print explicit warnings in the log in case of write stalling.
Reduce default trimming batch size to 200k.
Add in-memory cache for leases to significantly speed up all lease-related operations.
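The release note does not say how the filesystem check for quarkdb-checkpoint is performed. One plausible approach compares the device IDs (st_dev) of the database directory and of the checkpoint target. The sketch below takes that approach as an assumption; it is not QuarkDB's actual code. A likely motivation, also hedged: QuarkDB is built on RocksDB, and RocksDB checkpoints hard-link files when the target is on the same filesystem but copy them otherwise. The helper names nearest_existing_path, on_same_filesystem and check_checkpoint_target are hypothetical. Note that st_dev only approximates "same physical filesystem". For example, a bind mount of the same filesystem reports the same device even though hard links across mount points still fail.

```python
import os


def nearest_existing_path(path: str) -> str:
    # The checkpoint directory normally does not exist yet, so walk up to
    # the closest ancestor that does.
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def on_same_filesystem(db_path: str, checkpoint_path: str) -> bool:
    # st_dev identifies the device (filesystem) on which a path resides.
    db_dev = os.stat(db_path).st_dev
    target_dev = os.stat(nearest_existing_path(checkpoint_path)).st_dev
    return db_dev == target_dev


def check_checkpoint_target(db_path: str, checkpoint_path: str) -> None:
    # Hypothetical guard: refuse up front instead of silently falling back
    # to a slow full copy of the database.
    if not on_same_filesystem(db_path, checkpoint_path):
        raise ValueError(
            f"{checkpoint_path} is not on the same filesystem as {db_path}"
        )
```

For example, check_checkpoint_target("/var/lib/quarkdb/node-1", "/var/lib/quarkdb/checkpoints/new") (hypothetical paths) returns silently when both paths resolve to the same device and raises ValueError otherwise.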
Many thanks to Franck Eyraud (JRC) for the bug report concerning the erroneous MANIFEST-related warning.
Full release notes can be found here, packages here, and documentation on the optimal way to upgrade here.
What are the dependencies for quarkdb-0.4.3? That is, what are the minimum versions of EOS and XRootD needed? I tried to install it on a server with XRootD 4.11.3 and got:
Unfortunately, since XRootD 5 has now been pushed to EPEL, quarkdb was built against that version. We'll take down these RPMs and rebuild them against XRootD 4. We plan to release quarkdb-0.5*, which will come with XRootD 5.
We are running quarkdb-0.4.2 and xrootd 4.12.8 and have recently begun receiving messages:
[1624226401828] ERROR: Potential MANIFEST corruption for DB at /quarkdb/checkpoints/checkpoint_2021-06-21/current/state-machine(1783322963 sec)
Do the 0.4.3 release notes regarding the potential MANIFEST corruption apply to the above, or is there a way to determine whether there is a real issue?
Did I understand correctly that the quarkdb 0.4.3 packages were built against XRootD 5 and removed from the storage-ci.web.cern.ch install repo, but not replaced with ones built against XRootD 4? I do not see quarkdb 0.4.3 packages there, and the linuxsoft.cern.ch repo linked in the release notes returns 403.
Cheers,
Pete
After writing the above, I saw that this issue was reported in the 0.4.2 thread, and that the behavior we are seeing is similar, with the same timestamp of 1783322963 sec:
[root@warp-ornl-cern-05 ~]# quarkdb-validate-checkpoint --path /quarkdb/checkpoints/checkpoint_2021-06-21/
[1624296403520] INFO: Attempting to open ShardDirectory...
[1624296403521] INFO: --- OK!
[1624296403521] INFO: Attempting to open StateMachine...
[1624296403523] INFO: Openning state machine '/quarkdb/checkpoints/checkpoint_2021-06-21/current/state-machine'.
[1624296403599] INFO: --- OK! LAST-APPLIED: 582451415
[1624296403600] INFO: Attempting to open RaftJournal...
[1624296403600] INFO: Opening raft journal '/quarkdb/checkpoints/checkpoint_2021-06-21/current/raft-journal'
[1624296403600] ERROR: Potential MANIFEST corruption for DB at /quarkdb/checkpoints/checkpoint_2021-06-21/current/state-machine(1783322963 sec)
[1624296404064] INFO: --- OK! LOG-SIZE: 582451416, COMMIT-INDEX: 582451415, LOG-START: 532000000
[1624296404065] INFO: Closing state machine '/quarkdb/checkpoints/checkpoint_2021-06-21/current/state-machine'
[1624296404081] INFO: Closing raft journal '/quarkdb/checkpoints/checkpoint_2021-06-21/current/raft-journal'
We will ignore these errors caused by the uninitialized timestamp variable until we move to 0.4.3 and XRootD 5.
Yes, we did remove the 0.4.3 release, since it was built against XRootD 5 and we didn't rebuild it against XRootD 4. We don't plan any new standalone releases, since QuarkDB will be released as part of EOS starting with EOS 5.