QuarkDB backup: missing quarkdb-checkpoint cmd

Dear all,

We’re running an EOS instance with 3 MGMs and the namespace in QuarkDB.
According to the QuarkDB docs (https://quarkdb.web.cern.ch/quarkdb/docs/master/backup/), we should use the command ‘quarkdb-checkpoint’ to create a checkpoint/snapshot that can then be synced. That tool does not seem to be part of the rpm package:
Installed Packages
Name : quarkdb
Arch : x86_64
Version : 0.4.2
Release : 1.el7.cern

However, /bin/quarkdb-validate-checkpoint is there. What am I missing?

Best,
Erich

Hi, we currently run the checkpoint command from redis-cli, e.g. redis-cli -p ${QDB_PORT} raft-checkpoint ${BACKUP_PATH}

From memory, it also needs to be run on the QDB master (leader) node.
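
A minimal sketch of such a leader check, assuming that raft-info prints a line reading exactly "STATUS LEADER" on the leader (this is what the backup script later in this thread relies on). The function name is_raft_leader is hypothetical, not part of QuarkDB:

```shell
# Reads raft-info output on stdin and returns 0 only if one line
# is exactly "STATUS LEADER" (assumed output format, see above).
is_raft_leader() {
  grep -qx 'STATUS LEADER'
}

# Possible usage (commented out so that nothing runs on paste):
# if redis-cli -p "${QDB_PORT}" raft-info | is_raft_leader; then
#   redis-cli -p "${QDB_PORT}" raft-checkpoint "${BACKUP_PATH}"
# fi
```

Matching the whole line with grep -x avoids accidentally matching other raft-info lines that merely contain the word LEADER.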


Thanks, I’ll give this a try!

Crystal is correct, raft-checkpoint will work. quarkdb-checkpoint is a command aliased to raft-checkpoint; they do exactly the same thing. It’s a redis command, not a tool. I’m adding a clarification in the docs :)

quarkdb-validate-checkpoint is indeed a tool.
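
To illustrate how the command and the tool fit together, here is a rough sketch that creates a checkpoint and then validates it. The --path and --eos options are the ones used in the backup script later in this thread; the function name and the REDIS_CLI / VALIDATE_CMD override variables are hypothetical and exist only so the commands can be swapped out:

```shell
# Hypothetical overrides; the defaults are the real command names.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
VALIDATE_CMD="${VALIDATE_CMD:-quarkdb-validate-checkpoint}"

# checkpoint_and_validate PORT PATH
# Returns 0 on success, 1 if the checkpoint command fails,
# 2 if the checkpoint does not validate.
checkpoint_and_validate() {
  cav_port="$1"
  cav_path="$2"
  "$REDIS_CLI" -p "$cav_port" raft-checkpoint "$cav_path" || return 1
  "$VALIDATE_CMD" --path "$cav_path" --eos || return 2
}

# Possible usage (commented out so that nothing runs on paste):
# checkpoint_and_validate 9999 /srv/metadata/first_backup
```

Depending on the redis-cli version, its exit status may not reflect an error reply from the server, so the validation step is the more reliable check here.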

Cheers,
Georgios

Thanks, @crystal and @gbitzes. I just ran
redis-cli -p 9999 raft-checkpoint /srv/metadata/first_backup
and it works like a charm :)
Best,
Erich

Maybe useful for others: we’re using this as our backup now.
We run it on all quarkdb nodes in the cluster, but the script will only execute the backup on the leader.

roughly it does:

  • create checkpoint
  • calculate checkpoint size
  • validate checkpoint - and bail if validation fails
  • rotate away previous backup run (the target is a snapshotted NFS mount)
  • rsync current backup
  • and write some prometheus metrics

for sure it’s not perfect, but we’ll start with that.
comments welcome.


#!/bin/bash
# Ask the local QuarkDB node for its raft role; only the leader takes the backup.
RAFT_STATUS=$(redis-cli -p 9999 raft-info | grep 'STATUS')
BACKUP_PATH=/mnt/meta_backup/eos_ns_metadata

START_TIME=$(date -Iseconds)
echo "##################################################"
echo "now is ${START_TIME}"
echo "raft status: $RAFT_STATUS"

PROM_METRIC_TIME="# HELP eos_backup_metadata_timestamp time of last successful eos NS metadata backup
# TYPE eos_backup_metadata_timestamp gauge
eos_backup_metadata_timestamp"

PROM_METRIC_BYTES="# HELP eos_backup_metadata_bytes size of eos NS metadata backup
# TYPE eos_backup_metadata_bytes gauge
eos_backup_metadata_bytes"

if [[ "$RAFT_STATUS" == "STATUS LEADER" ]] ; then
  echo "starting backup"
  STAMP=$(date +%s)
  CHECKPOINT_PATH=/srv/metadata/backup_${STAMP}
  echo "writing snapshot..."
  redis-cli -p 9999 raft-checkpoint "${CHECKPOINT_PATH}"
  echo "showing checkpoint size"
  du -sch "${CHECKPOINT_PATH}"
  BACKUP_BYTES=$(du -s --bytes "${CHECKPOINT_PATH}" | awk '{print $1;}')
  echo "validating checkpoint"
  # Abort before touching the previous backup if the checkpoint does not validate.
  if ! quarkdb-validate-checkpoint --path "${CHECKPOINT_PATH}" --eos 2>&1; then
    echo "=== SNAPSHOT VALIDATION FAILED, BAILING ==="
    exit 1
  fi
  echo "rotate old backup"
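  rm -rf "${BACKUP_PATH}.old"
  # On the very first run there is no previous backup to rotate.
  if [[ -d "${BACKUP_PATH}" ]]; then
    mv "${BACKUP_PATH}" "${BACKUP_PATH}.old"
  fi
  echo "rsync snapshot"
  # -a already implies -t; stop here on failure so the "last successful" metric stays truthful.
  if ! rsync -a "${CHECKPOINT_PATH}/" "${BACKUP_PATH}"; then
    echo "=== RSYNC FAILED, BAILING ==="
    exit 1
  fi
  echo "cleaning up checkpoint"
  rm -rf "${CHECKPOINT_PATH}"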
  echo "update prometheus metric"
  echo "${PROM_METRIC_TIME} $(date +%s)" > /opt/prometheus_data/eos_backup.prom
  echo "${PROM_METRIC_BYTES} ${BACKUP_BYTES}" >> /opt/prometheus_data/eos_backup.prom
  FINISH_TIME=$(date -Iseconds)
  echo "completion time: ${FINISH_TIME}"
  echo "backup complete, done."
  exit 0
else
  echo "not leader, done."
  exit 0
fi
