Dear Experts,
our MGM suddenly stopped working, and I would like to ask for some guidance. It fails to restart and keeps repeating the following messages:
230201 09:17:53 17084 Starting on Linux 3.10.0-1160.6.1.el7.x86_64
Copr. 2004-2012 Stanford University, xrd version v4.12.8
++++++ xrootd mgm@eos-mgm1.alice-af.wigner.hu initialization started.
Config using configuration file /etc/xrd.cf.mgm
=====> all.sitename ALICE::KFKI::EOS
=====> xrd.sched mint 8 maxt 256 idle 64
=====> xrd.protocol XrdHttp:9000 /usr/lib64/libXrdHttp.so
230201 09:17:53 17084 XrdConfig: sitename already specified, using ' ALICE::KFKI::EOS '.
=====> all.sitename ALICE::KFKI::EOS
Config maximum number of connections restricted to 65000
Plugin loaded
Copr. 2012 Stanford University, xrootd protocol 4.0.0 version v4.12.8
++++++ xrootd protocol initialization started.
=====> xrootd.fslib libXrdEosMgm.so
=====> xrootd.seclib libXrdSec.so
=====> xrootd.async off nosf
=====> xrootd.chksum adler32
=====> all.export / nolock
Config exporting /
Plugin loaded
++++++ Authentication system initialization started.
Plugin loaded
=====> sec.protocol unix
Plugin loaded
=====> sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
Plugin loaded
230201 09:17:53 17084 secgsi_InitOpts: *** ------------------------------------------------------------ ***
230201 09:17:53 17084 secgsi_InitOpts: Mode: server
230201 09:17:53 17084 secgsi_InitOpts: Debug: 0
230201 09:17:53 17084 secgsi_InitOpts: CA dir: /etc/grid-security/certificates/
230201 09:17:53 17084 secgsi_InitOpts: CA verification level: 1
230201 09:17:53 17084 secgsi_InitOpts: CRL dir: /etc/grid-security/certificates/
230201 09:17:53 17084 secgsi_InitOpts: CRL extension: .r0
230201 09:17:53 17084 secgsi_InitOpts: CRL check level: 0
230201 09:17:53 17084 secgsi_InitOpts: Certificate: /etc/grid-security/daemon/hostcert.pem
230201 09:17:53 17084 secgsi_InitOpts: Key: /etc/grid-security/daemon/hostkey.pem
230201 09:17:53 17084 secgsi_InitOpts: Proxy delegation option: 0
230201 09:17:53 17084 secgsi_InitOpts: GRIDmap file: /etc/grid-security/grid-mapfile
230201 09:17:53 17084 secgsi_InitOpts: GRIDmap option: 2
230201 09:17:53 17084 secgsi_InitOpts: GRIDmap cache entries expiration (secs): 600
230201 09:17:53 17084 secgsi_InitOpts: Client proxy availability in XrdSecEntity.endorsement: 0
230201 09:17:53 17084 secgsi_InitOpts: VOMS option: 1
230201 09:17:53 17084 secgsi_InitOpts: MonInfo option: 1
230201 09:17:53 17084 secgsi_InitOpts: Crypto modules: ssl
230201 09:17:53 17084 secgsi_InitOpts: Ciphers: aes-128-cbc:bf-cbc:des-ede3-cbc
230201 09:17:53 17084 secgsi_InitOpts: MDigests: sha1:md5
230201 09:17:53 17084 secgsi_InitOpts: Trusting DNS for hostname checking
230201 09:17:53 17084 secgsi_InitOpts: *** ------------------------------------------------------------ ***
230201 09:17:53 17084 secgsi_GetSrvCertEnt: problems loading srv cert: invalid
230201 09:17:53 17084 secgsi_Init: problems loading srv cert
=====> sec.protocol gsi -crl:0 -cert:/etc/grid-security/daemon/hostcert.pem -key:/etc/grid-security/daemon/hostkey.pem -gridmap:/etc/grid-security/grid-mapfile -d:0 -gmapopt:2 -vomsat:1 -moninfo:1 -exppxy:/var/eos/auth/gsi#<uid>
=====> sec.protbind localhost.localdomain unix sss
=====> sec.protbind localhost unix sss
=====> sec.protbind * only sss unix
Config 6 authentication directives processed in /etc/xrd.cf.mgm
------ Authentication system initialization completed.
++++++ Protection system initialization started.
Config warning: Security level is set to none; request protection disabled!
Config Local protection level: none
Config Remote protection level: none
------ Protection system initialization completed.
Config Routing for 172.16.152.16: local pub4 prv4
Config Route all4: 172.16.152.16 Dest=[::172.16.152.16]:1094
Plugin loaded
++++++ (c) 2015 CERN/IT-DSS MgmOfs (meta data redirector) 4.8.62
=====> mgmofs enforces SSS authentication for XROOT clients
jemalloc is loaded!
jemalloc heap profiling is disabled
=====> mgmofs.hostname: eos-mgm1.alice-af.wigner.hu
=====> mgmofs.hostpref: eos-mgm1
=====> mgmofs.managerid: eos-mgm1.alice-af.wigner.hu:1094
=====> mgmofs.fs: /
=====> mgmofs.targetport: 1095
=====> mgmofs.authlib : /usr/lib64/libXrdAliceTokenAcc.so
=====> mgmofs.authorize : true
=====> mgmofs.instance : eosalice
=====> mgmofs.metalog: /var/eos/md
=====> mgmofs.txdir: /var/eos/tx
=====> mgmofs.authdir: /var/eos/auth
=====> mgmofs.reportstorepath: /var/eos/report
=====> mgmofs.cfgtype: quarkdb
=====> mgmofs.nslib : /usr/lib64/libEosNsQuarkdb.so
=====> mgmofs.qdbcluster : localhost:7001 localhost:7002 localhost:7003
=====> mgmofs.qdbpassword length : 89
=====> ofs.tpc redirect to: eos-gateway-node.cern.ch1094
=====> mgmofs.redirector : false
=====> mgmofs.broker : root://localhost:1097//eos/eos-mgm1.alice-af.wigner.hu/mgm
=====> mgmofs.defaultreceiverqueue : /eos/*/fst
=====> mgmofs.fs: /
=====> mgmofs.errorlog : enabled
++++++ (c) 2008 CERN/IT-DM-SMD AliceTokenAcc (Alice Token Access Authorization) v 1.0
=====> alicetokenacc.noauthzhost: localhost
=====> alicetokenacc.noauthzhost: localhost.localdomain
=====> alicetokenacc.truncateprefix: /eos/alice/grid
=====> XrdAliceTokenAcc: No Authorizationfile set via environment variable 'TTOKENAUTHZ_AUTHORIZATIONFILE'
=====> XrdAliceTokenAcc: Using Authorizationfile '/etc/grid-security/xrootd/TkAuthz.Authorization'!
------ AliceTokenAcc initialization completed
=====> all.role: manager
=====> setting message filter: Process,AddQuota,Update,UpdateHint,Deletion,PrintOut,SharedHash,work
=====> comment log in /var/log/eos/mgm/logbook.log
=====> eosxd stacktraces log in /var/log/eos/mgm/eosxd-stacktraces.log
=====> eosxd logtraces log in /var/log/eos/mgm/eosxd-logtraces.log
=====> mgmofs.alias: eos-mgm.alice-af.wigner.hu
230201 09:17:53 time=1675243073.796342 func=Configure level=NOTE logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=XrdMgmOfsConfigure:1540 tident=<single-exec> sec= uid=0 gid=0 name= geo="" MGM_HOST=eos-mgm1.alice-af.wigner.hu MGM_PORT=1094 VERSION=4.8.62 RELEASE=1 KEYTABADLER=deba251a SYMKEY=I1HQvI4qbzhCNdw464x2Jf6vPRk=
230201 09:17:53 time=1675243073.798220 func=set level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=InstanceName:39 tident= sec=(null) uid=99 gid=99 name=- geo="" Setting global instance name => eosalice
230201 09:17:53 time=1675243073.798231 func=Supervisor level=NOTE logid=4ec3ff0a-a211-11ed-96ab-00259074c8e8 unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f5ff23fc700 source=QdbMaster:238 tident=<service> sec= uid=0 gid=0 name= geo="" msg="set up booting stall rule"
230201 09:17:53 time=1675243073.798406 func=AddBroker level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=XrdMqClient:179 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="add broker" url="root://localhost:1097//eos/eos-mgm1.alice-af.wigner.hu/mgm?xmqclient.advisory.status=1&xmqclient.advisory.query=1&xmqclient.advisory.flushbacklog=1"
###### mq messaging: starting thread
230201 09:17:53 time=1675243073.803418 func=Subscribe level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=XrdMqClient:605 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="successfully subscribed to broker" url="root://localhost:1097//eos/eos-mgm1.alice-af.wigner.hu/mgm?xmqclient.advisory.status=1&xmqclient.advisory.query=1&xmqclient.advisory.flushbacklog=1"
230201 09:17:53 time=1675243073.907452 func=CreateObject level=INFO logid=4ebf30ce-a211-11ed-96ab-00259074c8e8 unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=PluginManager:287 tident=<service> sec= uid=0 gid=0 name= geo="" created plugin object type=NamespaceGroup
230201 09:17:53 time=1675243073.917964 func=enforceQuarkDBVersion level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=VersionEnforcement:38 tident= sec=(null) uid=99 gid=99 name=- geo="" QuarkDB version: "0.4.2"
230201 09:17:54 time=1675243074.097625 func=synchronize level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=MetadataFlusher:157 tident= sec=(null) uid=99 gid=99 name=- geo="" starting-index=2311658799 ending-index=2311658799 msg="waiting until queue item 2311658798 has been acknowledged.."
230201 09:17:54 time=1675243074.097647 func=synchronize level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=MetadataFlusher:170 tident= sec=(null) uid=99 gid=99 name=- geo="" starting-index=2311658799 ending-index=2311658799 msg="queue item 2311658798 has been acknowledged"
230201 09:17:54 time=1675243074.224703 func=synchronize level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=MetadataFlusher:157 tident= sec=(null) uid=99 gid=99 name=- geo="" starting-index=359094629 ending-index=359094629 msg="waiting until queue item 359094628 has been acknowledged.."
230201 09:17:54 time=1675243074.224736 func=synchronize level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=MetadataFlusher:170 tident= sec=(null) uid=99 gid=99 name=- geo="" starting-index=359094629 ending-index=359094629 msg="queue item 359094628 has been acknowledged"
[QCLIENT - INFO - getNext:57] Received redirection to localhost:7003
230201 09:17:54 time=1675243074.261859 func=configure level=INFO logid=static.............................. unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=FileSystemView:58 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="FileSystemView loadFromBackend" duration=0s
230201 09:17:54 17084 XrootdConfig: Unable to create file system object via libXrdEosMgm.so
230201 09:17:54 17084 XrootdConfig: Unable to load file system.
------ xrootd protocol initialization failed.
230201 09:17:54 time=1675243074.278885 func=BootNamespace level=NOTE logid=4ec3ff0a-a211-11ed-96ab-00259074c8e8 unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=QdbMaster:151 tident=<service> sec= uid=0 gid=0 name= geo="" msg="container initialization failed" duration=0s, errc=17, reason="SafetyCheck FATAL: Risk of data loss, found container (41564) with id bigger than max container id (41563)"
230201 09:17:54 17084 XrdProtocol: Protocol xrootd could not be loaded
230201 09:17:54 time=1675243074.278923 func=Configure level=CRIT logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos-mgm1.alice-af.wigner.hu:1094 tid=00007f6035b26780 source=XrdMgmOfsConfigure:1700 tident=<single-exec> sec= uid=0 gid=0 name= geo="" msg="namespace boot failed"
------ xrootd mgm@eos-mgm1.alice-af.wigner.hu:-1 initialization failed.
I’m not really sure which containers the SafetyCheck message refers to (container 41564 versus max container id 41563). QuarkDB itself seems healthy.
Thanks,
Gabor