QuarkDB recovery from checkpoint

Hello,

I am trying to understand the procedure of restoring QuarkDB from a checkpoint as per
https://quarkdb.web.cern.ch/quarkdb/docs/master/backup/

We create daily qdb checkpoints on all three qdb nodes and we archive them on tape.

Does running quarkdb-recovery on the checkpoint from a particular qdb node mean that we restore that qdb node? Is this what the “How to restore” section in the above link shows? I'm also not sure I understand the command's syntax: what are the --path and --command flags for?

In an extreme disaster scenario that includes a total loss of the EOS namespace, what do we need to do to recreate all three qdb nodes?

Many thanks,

George

Hi George,

I think there might be some misunderstanding about what the QuarkDB cluster does and how it works. When you have a cluster with raft enabled, all three (or however many) nodes hold the same information, so it does not make sense to back up each of them: a backup from the current leader is already good enough.

As highlighted in the documentation link you pointed to, restoring works by creating an entirely new cluster from a checkpoint. The quarkdb-recovery command helps you do this, and also lets you update the cluster members, which will most likely have different hostnames. If you restore a checkpoint on the exact same machines, you just need to make sure you start from a clean setup; you don't need to update the hostnames of the machines or the clusterID. To restore the cluster, recover each of the nodes from the same checkpoint and then start the cluster.

Hope this helps,
Elvin

Hi Elvin,

Many thanks for clarifying, and apologies for the confusion. OK, so we can use the checkpoint created on the leader to re-create the data (SSTs) on all three QuarkDB nodes: we just need to make sure to delete the original contents of /var/lib/quarkdb/ and then copy the data extracted from the checkpoint into that directory. Is this correct?
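A minimal Python sketch of the file-side step described above. It assumes the QuarkDB service on the node has already been stopped, the checkpoint has already been extracted from tape into a local directory, and /var/lib/quarkdb/ is the node's data directory (as in this message). The helper name `restore_node_dir` is hypothetical. The sketch only clears and repopulates the directory. It does not cover the quarkdb-recovery step for updating cluster members. The same call would be repeated on each of the three nodes with the same checkpoint.

```python
import shutil
from pathlib import Path


def restore_node_dir(checkpoint: Path, data_dir: Path) -> None:
    """Replace the contents of data_dir with a copy of the checkpoint.

    Hypothetical helper; assumes the QuarkDB service on this node is stopped.
    """
    checkpoint = Path(checkpoint)
    data_dir = Path(data_dir)
    if not checkpoint.is_dir():
        raise FileNotFoundError(f"checkpoint not found: {checkpoint}")

    # Refuse to run if the checkpoint lives inside the directory we are about to wipe.
    cp, dd = checkpoint.resolve(), data_dir.resolve()
    if cp == dd or dd in cp.parents:
        raise ValueError("checkpoint must not live inside the data directory")

    # Start from a clean setup: delete everything currently in data_dir.
    if data_dir.exists():
        for entry in data_dir.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    data_dir.mkdir(parents=True, exist_ok=True)

    # Copy the extracted checkpoint contents into the now-empty data_dir
    # (dirs_exist_ok requires Python 3.8+).
    shutil.copytree(checkpoint, data_dir, dirs_exist_ok=True)
```

Example call on one node, where /srv/restore/checkpoint is a hypothetical extraction path: `restore_node_dir(Path("/srv/restore/checkpoint"), Path("/var/lib/quarkdb"))`. Wiping the directory first, rather than copying over it, matches the "start from a clean setup" requirement, so no stale files from the old node survive.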

George

Hi George,

Yes, exactly.

Cheers,
Elvin