I have a few questions about a new EOS 4.4.10 setup that I have (I think) converted over to QuarkDB.
This test setup includes four hosts. Initially, two were set up as MGM (master/slave) and two were FST nodes.
First I upgraded to 4.4.10 from 4.2.28. This apparently went fine and the in-memory NS plugin still seems to work.
Then I set about getting QuarkDB set up, using the two MGM nodes and one of the FST nodes. This mostly went OK too. I then followed (as best I could) some of the documentation in the GitHub repository for the conversion.
The conversion seems to have worked, but I have some questions.
First, how can I tell for sure the setup is now completely using QuarkDB? The raft-info shows the DB cluster is working correctly, but is it propagating data? How can I know all nodes have the latest EOS data?
Second, there is some kind of issue with the config data.
Here are the ‘node ls’ and ‘fs ls’ outputs from the two MGMs:
MGM1 (master):
EOS_SERVER_VERSION=4.4.10 EOS_SERVER_RELEASE=1
EOS_CLIENT_VERSION=4.4.10 EOS_CLIENT_RELEASE=1
EOS Console [root://localhost] |/eos/uscms/store/user/dszkola/> node ls
┌──────────┬────────────────────────────────┬────────────────┬──────────┬────────────┬──────┬──────────┬────────┬────────┬────────────────┬─────┐
│type      │                        hostport│          geotag│    status│      status│  txgw│ gw-queued│  gw-ntx│ gw-rate│  heartbeatdelta│ nofs│
└──────────┴────────────────────────────────┴────────────────┴──────────┴────────────┴──────┴──────────┴────────┴────────┴────────────────┴─────┘
 nodesview     cmseos-itbfst04.fnal.gov:1095    geotagdefault     online           on    off          0       10      120                2     3
 nodesview     cmseos-itbfst05.fnal.gov:1095    geotagdefault     online           on    off          0       10      120                2     3
EOS Console [root://localhost] |/eos/uscms/store/user/dszkola/> fs ls
┌────────────────────────┬────┬──────┬────────────────────────────────┬────────────────┬────────────────┬────────────┬──────────────┬────────────┬────────┬────────────────┐
│host                    │port│    id│                            path│      schedgroup│          geotag│        boot│  configstatus│ drainstatus│  active│          health│
└────────────────────────┴────┴──────┴────────────────────────────────┴────────────────┴────────────────┴────────────┴──────────────┴────────────┴────────┴────────────────┘
 cmseos-itbfst04.fnal.gov 1095   2001                   /storage/data1        default.1    geotagdefault       booted             rw      nodrain   online              N/A
 cmseos-itbfst04.fnal.gov 1095   2002                   /storage/data2        default.2    geotagdefault       booted             rw      nodrain   online              N/A
 cmseos-itbfst04.fnal.gov 1095   2003                   /storage/data3        default.3    geotagdefault       booted             rw      nodrain   online              N/A
 cmseos-itbfst05.fnal.gov 1095   2011                   /storage/data1        default.1    geotagdefault       booted             rw      nodrain   online              N/A
 cmseos-itbfst05.fnal.gov 1095   2012                   /storage/data2        default.2    geotagdefault       booted             rw      nodrain   online              N/A
 cmseos-itbfst05.fnal.gov 1095   2013                   /storage/data3        default.3    geotagdefault       booted             rw      nodrain   online              N/A
MGM2 (slave):
EOS_SERVER_VERSION=4.4.10 EOS_SERVER_RELEASE=1
EOS_CLIENT_VERSION=4.4.10 EOS_CLIENT_RELEASE=1
EOS Console [root://localhost] |/eos/uscms/store/user/dszkola/> node ls
┌──────────┬────────────────────────────────┬────────────────┬──────────┬────────────┬──────┬──────────┬────────┬────────┬────────────────┬─────┐
│type      │                        hostport│          geotag│    status│      status│  txgw│ gw-queued│  gw-ntx│ gw-rate│  heartbeatdelta│ nofs│
└──────────┴────────────────────────────────┴────────────────┴──────────┴────────────┴──────┴──────────┴────────┴────────┴────────────────┴─────┘
 nodesview     cmseos-itbfst04.fnal.gov:1095    geotagdefault     online                 off          0       10      120                2     0
 nodesview     cmseos-itbfst05.fnal.gov:1095    geotagdefault     online                 off          0       10      120                0     0
EOS Console [root://localhost] |/eos/uscms/store/user/dszkola/> fs ls
EOS Console [root://localhost] |/eos/uscms/store/user/dszkola/>
and the ‘ns’ command from each:
MGM1:
# ------------------------------------------------------------------------------------
# Namespace Statistics
# ------------------------------------------------------------------------------------
ALL      Files                         17 [booted] (0s)
ALL      Directories                   30
ALL      Total boot time               1 s
# ------------------------------------------------------------------------------------
ALL      Compactification              status=off waitstart=0 interval=0 ratio-file=0.0:1 ratio-dir=0.0:1
# ------------------------------------------------------------------------------------
ALL      Replication                   mode=master-rw state=master-rw master=cmseos-itbmgm01.fnal.gov configdir=/var/eos/config/cmseos-itbmgm01.fnal.gov/ config=default mgm:cmseos-itbmgm02.fnal.gov=down mq:cmseos-itbmgm02.fnal.gov:1097=ok
# ------------------------------------------------------------------------------------
ALL      files created since boot      1
ALL      container created since boot  0
# ------------------------------------------------------------------------------------
ALL      current file id               112
ALL      current container id          39
# ------------------------------------------------------------------------------------
ALL      eosxd caps                    0
ALL      eosxd clients                 0
# ------------------------------------------------------------------------------------
ALL      File cache max num            30000000
ALL      File cache occupancy          11
ALL      Container cache max num       3000000
ALL      Container cache occupancy     23
# ------------------------------------------------------------------------------------
ALL      memory virtual                2.24 GB
ALL      memory resident               134.73 MB
ALL      memory share                  24.04 MB
ALL      memory growths                269.64 MB
ALL      threads                       241
ALL      fds                           281
ALL      uptime                        61224
# ------------------------------------------------------------------------------------
MGM2:
# ------------------------------------------------------------------------------------
# Namespace Statistics
# ------------------------------------------------------------------------------------
ALL      Files                         16 [failed] (1545343450s)
ALL      Directories                   30
ALL      Total boot time               1545343449 s
# ------------------------------------------------------------------------------------
ALL      Compactification              status=off waitstart=0 interval=0 ratio-file=0.0:1 ratio-dir=0.0:1
# ------------------------------------------------------------------------------------
ALL      Replication                   mode=slave-ro state=slave-ro master=cmseos-itbmgm01.fnal.gov configdir=/var/eos/config/cmseos-itbmgm01.fnal.gov/ config=default mgm:cmseos-itbmgm01.fnal.gov=ok mgm:mode=master-rw mq:cmseos-itbmgm01.fnal.gov:1097=ok
# ------------------------------------------------------------------------------------
ALL      files created since boot      1
ALL      container created since boot  0
# ------------------------------------------------------------------------------------
ALL      current file id               112
ALL      current container id          39
# ------------------------------------------------------------------------------------
ALL      eosxd caps                    0
ALL      eosxd clients                 0
# ------------------------------------------------------------------------------------
ALL      File cache max num            30000000
ALL      File cache occupancy          0
ALL      Container cache max num       3000000
ALL      Container cache occupancy     6
# ------------------------------------------------------------------------------------
ALL      memory virtual                2.22 GB
ALL      memory resident               300.29 MB
ALL      memory share                  22.02 MB
ALL      memory growths                2.22 GB
ALL      threads                       241
ALL      fds                           261
ALL      uptime                        60424
# ------------------------------------------------------------------------------------
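So something is not right there: on MGM2 the namespace reports 16 files in state [failed] (MGM1 has 17, [booted]), ‘fs ls’ comes back empty, and ‘node ls’ shows nofs 0 for both FSTs. MGM1 also reports mgm:cmseos-itbmgm02.fnal.gov=down.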
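A check like this can be automated so a mismatch between the two MGMs does not go unnoticed. Below is a minimal sketch, assuming the ‘ns’ output of each MGM has been captured as text and that the "ALL Files" line keeps the layout shown above. The function names are made up for illustration and are not part of EOS.

```python
import re

# Matches the namespace summary line, e.g. "ALL Files 17 [booted] (0s)".
# Assumption: the layout is the one shown in the 'ns' output above.
FILES_RE = re.compile(r"ALL\s+Files\s+(\d+)\s+\[(\w+)\]")


def files_summary(ns_text):
    """Return (file_count, boot_state) from captured 'ns' output."""
    match = FILES_RE.search(ns_text)
    if match is None:
        raise ValueError("no 'ALL Files' line found in ns output")
    return int(match.group(1)), match.group(2)


def compare(master_text, slave_text):
    """Return a list of differences between master and slave 'ns' output."""
    m_count, _m_state = files_summary(master_text)
    s_count, s_state = files_summary(slave_text)
    problems = []
    if m_count != s_count:
        problems.append(f"file count differs: {m_count} vs {s_count}")
    if s_state != "booted":
        problems.append(f"slave namespace state is {s_state}")
    return problems


if __name__ == "__main__":
    # Values copied from the two 'ns' outputs in this post.
    mgm1 = "ALL Files 17 [booted] (0s)"
    mgm2 = "ALL Files 16 [failed] (1545343450s)"
    for problem in compare(mgm1, mgm2):
        print(problem)
```

With the values from this post it flags both the differing file count and the failed namespace state on MGM2.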
Third, where is the metadata that existed in the files.md and directory.md now stored? Yes, in the QuarkDB, but what files on disk? I’d like to keep track of its size.
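For the size tracking, something as simple as the sketch below would do, assuming the QuarkDB state lives under a single directory tree. The directory is taken from the command line because its actual location is exactly what is being asked here; nothing in the script is specific to QuarkDB or EOS.

```python
import os
import sys


def dir_size_bytes(root):
    """Sum the sizes of all regular files and links found under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: count a symlink itself, not the file it points to.
                total += os.lstat(path).st_size
            except OSError:
                # The file may have been removed while we were walking.
                pass
    return total


if __name__ == "__main__":
    # Hypothetical usage: pass the database directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{root}: {dir_size_bytes(root) / 1e6:.2f} MB")
```

Run from cron once a day and appended to a log, this gives a rough growth curve comparable to watching the size of the old *.md files.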
Fourth, how do we do a proper EOS backup now? Before, I backed up the two *.md files, the config, and the daily report file. I need to know how to do the same thing in the new environment.
Fifth (and last for right now), is compacting the namespace still necessary with the QuarkDB setup?
Thanks,
Dan Szkola
FNAL