I’m working on setting up Citrine EOS (version 4.2.26) on CentOS 7. I managed to successfully start a master-slave combination, and it worked fine until I tried to set a quota. Setting a quota immediately crashes the MGM on the slave server, and it then refuses to start. But when I remove the quota node (‘quota rmnode’ at the master), the slave MGM can start again. All details on this problem are presented in my message in the Master/Slave Configuration topic. Has anybody encountered such a situation, or a similar one?
Probably not. At least I got no reply on this issue. By the way, I later found that the EOS documentation on Master/Slave configuration now refers only to the “BERYLL release” (see http://eos-docs.web.cern.ch/eos-docs/configuration/master.html). So one can guess that this mode is not meant for Citrine. Moreover, it may be that for Citrine only a Master/Slave setup with QuarkDB works (see http://eos-docs.web.cern.ch/eos-docs/configuration/master_quarkdb.html). But I didn’t test it: for the moment we have suspended all work on EOS.
From memory, manually compacting the mdlogs on the slave might fix this issue, but upgrading to 4.3.x should stop it occurring.
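For reference, a minimal sketch of what that looks like in practice. This assumes the `eos ns compact` console interface and the EL7 templated systemd units (`eos@mgm`) that ship with Citrine; check the exact syntax against the CLI help of your 4.2.x build before running it:

```shell
# On the master MGM: inspect the namespace state and changelog sizes
eos ns

# Schedule an online compaction of the namespace changelog (mdlog) files;
# the numeric argument is the delay in seconds before compaction starts
# (syntax per the EOS console help - verify on your version)
eos ns compact on 60

# After the compacted mdlogs have been synced to the slave,
# restart the slave MGM and check that it stays up
systemctl restart eos@mgm
systemctl status eos@mgm
```

Offline compaction (with the MGM stopped) may also be an option; the EOS Beryl master/slave documentation describes the compaction workflow in more detail.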
Master/slave setup definitely works with Citrine; I guess that documentation could use some updating. We don’t run slave MGMs right now, but we were doing so for a while (AARNet Citrine Upgrade Site Report).
QuarkDB isn’t necessary for Citrine either - we’re not using it in production yet, but I do have it running on a test cluster.
Great! Thanks for your clarifications and comments! Hopefully next time my experience with EOS will be more successful.
By the way, you wrote that you are not using Master/Slave mode anymore. Do you use several masters instead? Is there any documentation on configuring such mode?
Sure enough, compacting the namespace seems to have alleviated the issue. After restarting EOS services on the slave MGM, it stayed up and running instead of crashing.
We’re in the middle of moving our EOS nodes to SL7, and then we will move to Citrine. I’m concerned about the 4.2.x releases at this point, as there seem to be quite a few issues, and I don’t think 4.3.x is for production use yet, unless I missed an announcement, so I’m stuck wondering which release we will use.
@yupi We’re currently running just a single master as we’re working on hardware upgrades for the slave mgms - I don’t believe a multi-master mgm setup is possible.
Glad that fixed up the issue @dszkola ! I’m interested in knowing which Citrine version(s) are being run in production at CERN currently, as well - @esindril would you happen to have this info? Thanks!
@crystal Concerning multi-master mode: if it is not possible, why does the sample EOS configuration file eos_env.example contain the following lines?
# The fully qualified hostname of MGM master1
EOS_MGM_MASTER1=eosdevsrv1.cern.ch
# The fully qualified hostname of MGM master2
EOS_MGM_MASTER2=eosdevsrv2.cern.ch
# The alias which selects master 1 or 2
EOS_MGM_ALIAS=eosdev.cern.ch
Such a configuration with two (or more) masters is required to provide HA (high availability). I would guess that in theory it could be possible with QuarkDB, but I didn’t test it. There are some rumors about an HA mode in EOS, but I have seen no documentation.