MGM won't start

Hello,

The MGM node just won’t start, although the FST and MQ run smoothly. The logs:

210917 11:40:04 time=1631871604.859839 func=GetLog                   level=ERROR logid=3cb3e9a8-179b-11ec-87d4-021169072178 unit=mgm@zitosm-devel02.d.de:1094 tid=00007fd617652780 source=Master:2366                    tident=<service> sec=      uid=0 gid=0 name= geo="" error: corruption in file changelog at offset 18f10

210917 11:40:04 time=1631871604.859857 func=BootNamespace            level=CRIT  logid=3cb3e9a8-179b-11ec-87d4-021169072178 unit=mgm@zitosm-devel02.d.de:1094 tid=00007fd617652780 source=Master:1909                    tident=<service> sec=      uid=0 gid=0 name= geo="" eos view initialization failed after 0 seconds
210917 11:40:04 time=1631871604.859868 func=BootNamespace            level=CRIT  logid=3cb3e9a8-179b-11ec-87d4-021169072178 unit=mgm@zitosm-devel02.d.de:1094 tid=00007fd617652780 source=Master:1912                    tident=<service> sec=      uid=0 gid=0 name= geo="" initialization returned ec=5 error: Changelog file has corruption - autorepair is disabled
210917 11:40:04 time=1631871604.859877 func=Configure                level=CRIT  logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@zitosm-devel02.d.de:1094 tid=00007fd617652780 source=XrdMgmOfsConfigure:1700        tident=<single-exec> sec=      uid=0 gid=0 name= geo="" msg="namespace boot failed"
210917 11:40:04 10625 XrootdConfig: Unable to create file system object via libXrdEosMgm.so
210917 11:40:04 10625 XrootdConfig: Unable to load file system.

How can I get around this?

EOS packages installed:

eos-xrootd-4.12.8-1.el7.cern.x86_64
eos-protobuf3-3.5.1-5.el7.cern.eos.x86_64
libmicrohttpd-0.9.38-eos.yves.el7.cern.x86_64
eos-client-4.8.62-1.el7.cern.x86_64
eos-folly-2019.11.11.00-1.el7.cern.x86_64
eos-folly-deps-2019.11.11.00-1.el7.cern.x86_64
eos-server-4.8.62-1.el7.cern.x86_64

cheers

Are you still running the in-memory namespace? In other words, you are not running quarkdb?

If I recall correctly, the way to fix this is to first make a backup copy of the namespace changelog files (where <mgmfqdn> is the FQDN of the MGM):

directories.<mgmfqdn>.mdlog
files.<mgmfqdn>.mdlog

After you have a backup, you need to run:

eos-log-repair <old-file> <new-file>

So, if your mgm hostname is “mgm1.my.domain”:

cp directories.mgm1.my.domain.mdlog directories.mgm1.my.domain.mdlog.bkup
cp files.mgm1.my.domain.mdlog files.mgm1.my.domain.mdlog.bkup

mv directories.mgm1.my.domain.mdlog directories.mgm1.my.domain.mdlog.old
mv files.mgm1.my.domain.mdlog files.mgm1.my.domain.mdlog.old

eos-log-repair directories.mgm1.my.domain.mdlog.old directories.mgm1.my.domain.mdlog
eos-log-repair files.mgm1.my.domain.mdlog.old files.mgm1.my.domain.mdlog

Then try to start the MGM again.

Strictly speaking you don’t NEED to make a backup copy, but I was always very cautious with the namespace files.
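If you would rather not type the steps by hand, they can be wrapped in a small shell function. This is only a sketch: the function name repair_changelogs, its two arguments (the directory holding the .mdlog files and the MGM FQDN) and the REPAIR_CMD override are my own illustrative choices, not part of EOS. It only uses eos-log-repair in the <old-file> <new-file> form shown above, and it refuses to run if a .bkup or .old file already exists, so an earlier backup is never overwritten.

```shell
#!/bin/sh
# Sketch: back up and repair the in-memory namespace changelogs.
# Assumptions (hypothetical, not from EOS): the function name, its
# arguments, and the REPAIR_CMD override used to swap the repair tool.

repair_changelogs() {
    ns_dir=$1                                 # directory containing the .mdlog files
    mgm_fqdn=$2                               # e.g. mgm1.my.domain
    repair_cmd=${REPAIR_CMD:-eos-log-repair}  # invoked as: <cmd> <old-file> <new-file>

    for kind in directories files; do
        log="$ns_dir/$kind.$mgm_fqdn.mdlog"

        if [ ! -f "$log" ]; then
            echo "missing changelog: $log" >&2
            return 1
        fi
        # Never clobber a backup or .old file left by an earlier attempt.
        if [ -e "$log.bkup" ] || [ -e "$log.old" ]; then
            echo "refusing to overwrite $log.bkup or $log.old" >&2
            return 1
        fi

        cp -p "$log" "$log.bkup" || return 1            # safety copy
        mv "$log" "$log.old" || return 1                # original becomes the repair input
        "$repair_cmd" "$log.old" "$log" || return 1     # write the repaired changelog
    done
}

# Example call (the MGM must be stopped; replace the placeholder directory):
#   repair_changelogs /path/to/namespace/dir mgm1.my.domain
```

If the function returns non-zero, nothing has been lost: the untouched copies are still there as .bkup, and you can move the .old file back before trying again.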

Hope that helps.

Thanks mate! Fixed it. Cheers!