Procedure to convert namespace to QuarkDB

Hello,

I am training myself on the namespace conversion to QuarkDB using a test cluster.
I have read Conversion of in-memory namespace to QuarkDB namespace (http://eos-docs.web.cern.ch/eos-docs/quickstart/ns_quarkdb.html#conversion-of-in-memory-namespace-to-quarkdb-namespace) and Crystal’s post: QDB namespace conversion process.

I would like to make sure I understand the process, so I will state here how I understood it, and people can correct me:

  1. stop EOS and perform an off-line compactification of the namespace
  2. create a temporary QuarkDB database (in /var/lib/quarkdb/convert, for example) on the EOS manager with one node
  3. configure xrootd-quarkdb to use this database
  4. start the quarkdb service (systemctl start xrootd@quarkdb)
  5. run eos-ns-convert with the compacted namespace files (files and directories)
  6. stop the quarkdb service
  7. create the final production clustered QuarkDB database (in /var/lib/quarkdb/production, for example) on all participating nodes.
  8. on each of these nodes, delete the new raft-journal directory (/var/lib/quarkdb/production/current/raft-journal ?) and copy the one from /var/lib/quarkdb/convert on the manager.
  9. on each node in the QuarkDB cluster check that the configuration is redis.mode raft
  10. start the quarkdb service on all nodes and check convergence of the cluster (how?)
  11. start EOS manager and check that the new namespace is functional (to be described)
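As a concrete sketch, the steps above might translate into commands like the following. The paths, the changelog file names, and the eos-ns-convert argument order are illustrative, not verified against a real installation:

```shell
# 1. Stop EOS before the offline compactification
systemctl stop eos@mgm

# 2./3. Create the temporary single-node QuarkDB database and point the
#       xrootd-quarkdb configuration at it (redis.mode bulkload)
quarkdb-create --path /var/lib/quarkdb/convert

# 4. Start the temporary instance
systemctl start xrootd@quarkdb

# 5. Run the converter against the compacted namespace files
#    (arguments illustrative: file changelog, directory changelog, QDB host, port)
eos-ns-convert /var/eos/md/files.mdlog /var/eos/md/directories.mdlog localhost 7777

# 6. Stop the temporary instance before building the production cluster
systemctl stop xrootd@quarkdb
```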

How correct is this?

Thank you

JM

Hello,

The above looks correct, apart from step 8: you need to copy the state-machine from the bulkload instance into every node of the new clustered QuarkDB instance (replacing the existing one), not the raft journal. (In fact, the raft-journal directory should not exist in bulkload instances.)

I understand this step can be confusing and error-prone… I’ll make sure to add some more comprehensive documentation in https://quarkdb.web.cern.ch/quarkdb/docs/master/ soon.

Regarding step 10: To check convergence, view the QuarkDB logs and make sure a leader election has occurred.
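For example, assuming the nodes listen on port 7777 and redis-cli is available, something along these lines could confirm convergence (raft-info is a QuarkDB-specific command; the log path is an assumption, adjust to your setup):

```shell
# Ask any node for its view of the raft cluster; look for the LEADER entry
redis-cli -p 7777 raft-info

# Alternatively, grep the QuarkDB xrootd log for election messages
grep -i "election" /var/log/xrootd/quarkdb/xrootd.log
```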

Note: The above procedure is only necessary if your namespace is very large (hundreds of millions of files), as bulkload significantly speeds things up in that case. If not, you could simply start a QuarkDB cluster in raft mode and run the conversion tool against that, without any need for bulkload.
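In that simpler case, the whole setup could be sketched roughly as follows (hostnames, ports, and the cluster ID are placeholders; the quarkdb-create flags are the ones discussed in this thread):

```shell
# Create the raft database on each of the three nodes, with an identical
# --nodes list and --clusterID everywhere
quarkdb-create --path /var/lib/quarkdb/production \
               --nodes qdb1:7777,qdb2:7777,qdb3:7777 \
               --clusterID my-test-cluster

# Start QuarkDB on all nodes (redis.mode raft in the xrootd config),
# wait for a leader election, then run eos-ns-convert against the cluster
systemctl start xrootd@quarkdb
```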

Thank you Georgios,

We currently have ~16M files:
> [root@naneosmgr01(EOSMASTER) ~]#eos -b ns
> # ------------------------------------------------------------------------------------
> # Namespace Statistics
> # ------------------------------------------------------------------------------------
> ALL Files 16680297 [booted] (534s)
> ALL Directories 54544
> # ------------------------------------------------------------------------------------

JM

Hi,

I’m not sure how much detail you’re writing into your process, but if you haven’t already, it might be worth noting down what changes are needed on the MGM side as well!

In the xrd.cf.mgm file, make sure to change mgmofs.nslib to use quarkdb, and if running a cluster, make sure to specify nodes:

 mgmofs.nslib /usr/lib64/libEosNsQuarkdb.so
 mgmofs.qdbcluster host:port host:port host:port (etc)

Also new since I last looked at QuarkDB: importing the configuration into QuarkDB and using the new master/slave setup (described here: http://eos-docs.web.cern.ch/eos-docs/configuration/master_quarkdb.html)

In case this is of any interest, here are our namespace stats:

# Namespace Statistics
# ------------------------------------------------------------------------------------
ALL      Files                            200701974 [booted] (572s)
ALL      Directories                      40256837
ALL      Total boot time                  784 s

In bulkload mode, it took approximately one hour to complete the namespace conversion process.

Hi, thank you Crystal,

Yes, I was aware of the need to change the MGM config in order to use the new namespace.

Now the question is whether I really need to use the bulkload mode. Our namespace is much smaller than yours:
ALL Files 16683933 [booted] (534s)
ALL Directories 54544

16M instead of 200M.

I suppose the quarkdb-bench benchmark could give a clue, and could be run before deciding which mode to use… I am not sure how to read the results, though. On my test cluster (database on a standard disk), it gives:

quarkdb-bench --gtest_filter="Benchmark/hset.hset/threads2_events3000000_consensus"
Running main() from bench/main.cc
Note: Google Test filter = Benchmark/hset.hset/threads2_events3000000_consensus
[==========] Running 1 test from 1 test case.
[...]
[       OK ] Benchmark/hset.hset/threads2_events3000000_consensus (87673 ms)
[----------] 1 test from Benchmark/hset (87673 ms total)

[----------] Global test environment tear-down
[1551426742924] INFO: Global environment: clearing connection cache.
[==========] 1 test from 1 test case ran. (87694 ms total)
[  PASSED  ] 1 test.

JM

Hi Jean Michel,

The best way to tell is to perform a test migration and measure how long the migration tool takes to complete. With 16M files it should not take very long (less than 20 minutes), so you don’t need to run in bulkload mode first; you could simply set up your raft cluster and point the migration tool towards it.

The time to switch from bulkload to the full cluster would probably add more time than bulkload saves, anyway.

Regarding quarkdb-bench: The results you posted are quite good, but it would still be best to test the “real” thing, i.e. how long the migration tool takes to complete. You could test against both bulkload and raft, and decide depending on whether it makes a large difference for you in terms of the downtime that will be necessary for your EOS instance.

Cheers,
Georgios

By the way, I began writing some code to streamline the bulkload -> raft process, hopefully making it simpler and less error-prone.

Does the following look better than what we have now? (asking especially @crystal, who has gone through the previous procedure already)

  1. Once bulkloading is complete, shut down the QDB process.
  2. Run quarkdb-create --path /some/path --nodes host1:port1,host2:port2,host3:port3 --clusterID cluster-id --steal-state-machine /path/to/bulkloaded/state-machine
  3. Copy /some/path in its entirety to all nodes that are to be part of the new QDB cluster.
  4. Set up the configuration files for all nodes in the new cluster, as usual.
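If it helps, here is how I read those four steps as a command sequence. Paths and hostnames are placeholders, and the exact location of the state-machine inside the bulkload instance may differ on your system:

```shell
# 1. Stop the bulkload QDB process
systemctl stop xrootd@quarkdb

# 2. Create the new clustered database, stealing the bulkloaded state-machine
quarkdb-create --path /var/lib/quarkdb/production \
               --nodes qdb1:7777,qdb2:7777,qdb3:7777 \
               --clusterID my-cluster-id \
               --steal-state-machine /var/lib/quarkdb/convert/current/state-machine

# 3. Copy the resulting directory in its entirety to every node of the cluster
rsync -a /var/lib/quarkdb/production/ qdb2:/var/lib/quarkdb/production/
rsync -a /var/lib/quarkdb/production/ qdb3:/var/lib/quarkdb/production/

# 4. Set up each node's xrootd configuration (redis.mode raft) as usual
```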

Cheers,
Georgios

I like that change!! We’re definitely using bulkload mode because of the size of our namespace, and while I think I’m reasonably familiar with the migration process now, it always helps to simplify it :slight_smile:

Side question while we’re talking about QuarkDB: what environment variables (if any) still need to be set when using the new QuarkDB master/slave setup? I am referring to things like EOS_MGM_MASTER1/MASTER2, EOS_MGM_ALIAS, etc. I’m not sure whether those can be removed or are still necessary, as I imagine they probably won’t be used?

@esindril is the expert on this, but I believe the environment variable to use for a QDB-based master-slave MGM setup is EOS_USE_QDB_MASTER=1.
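If so, on a sysconfig-based setup it would presumably look like this (the file path is an assumption on my side, please verify against the EOS docs):

```shell
# /etc/sysconfig/eos (illustrative location for MGM environment variables)
EOS_USE_QDB_MASTER=1
```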

Note that we still don’t use it in production… All of our production instances on QDB have only a single MGM at the moment.

PS: The extra option has been added to quarkdb-create and will be available from 0.3.6. :slight_smile: Now I need to write some detailed documentation for bulkload, too…

Cheers,
Georgios