We should use the command ‘quarkdb-checkpoint’ to create a checkpoint/snapshot that can then be synced. That tool does not seem to be part of the rpm package.
Installed Packages
Name : quarkdb
Arch : x86_64
Version : 0.4.2
Release : 1.el7.cern
However, /bin/quarkdb-validate-checkpoint is there. What am I missing?
Crystal is correct, raft-checkpoint will work. quarkdb-checkpoint is a command aliased to raft-checkpoint, they do exactly the same thing. It’s a redis command, not a tool – I’m adding a clarification in the docs
maybe for useful for others, we’re using this as backup now:
we run this on all quarkdb nodes in the cluster, but the script will only execute the backup on the leader.
roughly it does:
create checkpoint
calculate checkpoint size
validate checkpoint - and bail if validation fails
rotate away previous backup run (the target is a snapshotted NFS mount)
rsync current backup
and write some prometheus metrics
for sure it’s not perfect, but we’ll start with that.
comments welcome.
RAFT_STATUS=$(redis-cli -p 9999 raft-info | grep 'STATUS')
BACKUP_PATH=/mnt/meta_backup/eos_ns_metadata
START_TIME=$(date -Iseconds)
echo "##################################################"
echo "now is ${START_TIME}"
echo "raft status: $RAFT_STATUS"
PROM_METRIC_TIME="# HELP eos_backup_metadata_timestamp time of last successful eos NS metadata backup
# TYPE eos_backup_metadata_timestamp counter
eos_backup_metadata_timestamp"
PROM_METRIC_BYTES="# HELP eos_backup_metadata_bytes size of eos NS metadata backup
# TYPE eos_backup_metadata_bytes gauge
eos_backup_metadata_bytes"
if [[ "$RAFT_STATUS" == "STATUS LEADER" ]] ; then
echo "starting backup"
STAMP=$(date +%s)
CHECKPOINT_PATH=/srv/metadata/backup_${STAMP}
echo "writing snapshot..."
redis-cli -p 9999 raft-checkpoint ${CHECKPOINT_PATH}
echo "showing checkpoint size"
du -sch ${CHECKPOINT_PATH}
BACKUP_BYTES=$(du -s --bytes ${CHECKPOINT_PATH} | awk '{print $1;}')
echo "validating checkpoint"
quarkdb-validate-checkpoint --path ${CHECKPOINT_PATH} --eos 2>&1
VALIDATE_STATUS=$?
if [[ "$VALIDATE_STATUS" -ne "0" ]]; then
echo "=== SNAPSHOT VALIDATION FAILED, BAILING ==="
exit 1
fi
echo "rotate old backup"
rm -rf ${BACKUP_PATH}.old
mv ${BACKUP_PATH} ${BACKUP_PATH}.old
echo "rsync snapshot"
rsync -at ${CHECKPOINT_PATH}/ ${BACKUP_PATH}
echo "cleaning up checkpoint"
rm -rf ${CHECKPOINT_PATH}
echo "update prometheus metric"
echo "${PROM_METRIC_TIME} $(date +%s)" > /opt/prometheus_data/eos_backup.prom
echo "${PROM_METRIC_BYTES} ${BACKUP_BYTES}" >> /opt/prometheus_data/eos_backup.prom
FINISH_TIME=$(date -Iseconds)
echo "completion time: ${FINISH_TIME}"
echo "backup complete, done."
exit 0
else
echo "not leader, done."
exit 0
fi
exit 0