Managing failover of MGM

Hi Dan,

We have a new EOS setup (4.8.4) here in Vienna.
We’re running a 3-node MGM + QuarkDB cluster. So far the setup is quite stable.
Clients such as xrdcp get redirected to the current master MGM as expected.
So for most connections we can use a DNS round-robin (RR) entry that points to all 3 MGMs.

However, the fusex client gets a “too many redirects” error when pointed at that RR entry.
Therefore we have a “master.eos.example.com” record that points to the “current” MGM master.
It just so happened that we had an MGM failover during the weekend, and we temporarily lost all fusex mounts because the DNS record was not updated automatically.
This morning I ran “ns master mgm-1.eos.example.com” to bring the MGM master back to the machine that the “master” DNS record points to.
Fusex mounts immediately started to be functional again.

I don’t consider this a real solution. IMHO the fusex client should be able to do service discovery correctly from the DNS RR entry. The setup loses a lot of its ease of operation if you have to manually flip a DNS record on every HA failover.
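
For illustration, this is roughly the logic I would otherwise have to automate as a stopgap. It is only a sketch under assumptions: the hostnames mgm-2 and mgm-3 are made up for the example, and `is_master` and `update_record` are hypothetical callbacks (how to query an MGM for its role and how to push a DNS change depend entirely on the site).

```python
from typing import Callable, Optional, Sequence

# mgm-2 and mgm-3 are assumed names, only mgm-1 exists in our real setup description.
MGMS = [
    "mgm-1.eos.example.com",
    "mgm-2.eos.example.com",
    "mgm-3.eos.example.com",
]


def find_master(mgms: Sequence[str], is_master: Callable[[str], bool]) -> Optional[str]:
    """Return the one host that reports itself as master.

    Returns None if no host or more than one host claims the role,
    so that an ambiguous state never triggers a DNS change.
    """
    masters = [host for host in mgms if is_master(host)]
    return masters[0] if len(masters) == 1 else None


def sync_master_record(
    mgms: Sequence[str],
    current_target: str,
    is_master: Callable[[str], bool],      # hypothetical probe, site-specific
    update_record: Callable[[str], None],  # hypothetical DNS update, site-specific
) -> str:
    """Point the "master" record at the current master if it has moved.

    Returns the host the record points to after the call.
    """
    master = find_master(mgms, is_master)
    if master is None or master == current_target:
        return current_target  # nothing to do, or state is unclear
    update_record(master)
    return master
```

Run periodically, this would only touch the record when exactly one MGM claims to be master and it differs from the current target. It would still leave the mounts broken for up to one polling interval plus the DNS TTL, which is why I would rather not depend on it.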

Best,
Erich