Actually, this line that you mention is coming very quick, like 1s after start :
221004 12:53:52 29813 Starting on Linux 3.10.0-1160.76.1.el7.x86_64
221004 12:53:53 time=1664880833.280162 func=Supervisor level=INFO logid=d58172d6-43d2-11ed-ab6a-62cb3bd524b1 unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f61337fd700 source=QdbMaster:268 tident=<service> sec= uid=0 gid=0 name= geo="" old_is_master=false, is_master=true, old_master_id=, master_id=s-jrciprcids90v.cidsn.jrc.it:1094
However, the moment when the mgm starts answering queries is long after that, I identified this line as the answer to the eos ns
command, so 16 seconds after :
221004 12:54:09 time=1664880849.128100 func=open level=INFO logid=df8ccb90-43d2-11ed-b68d-62cb3bd524b1 unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f6170407700 source=XrdMgmOfsFile:502 tident=root.30231:389@localhost.localdomain sec=sss uid=0 gid=0 name=daemon geo="JRC" op=read path=/proc/admin/ info=eos.rgid=0&eos.ruid=0&mgm.cmd.proto=GgQKAigB+AEB
I noticed these few things
- there are 2 moments when no messages are written during quite a long time (5 then 4 seconds) meaning that the MGM is not doing anything :
221004 12:53:58 time=1664880838.431826 func=Supervisor level=INFO logid=d58172d6-43d2-11ed-ab6a-62cb3bd524b1 unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f61337fd700 source=QdbMaster:268 tident=<service> sec= uid=0 gid=0 name= geo="" old_is_master=true, is_master=true, old_master_id=s-jrciprcids90v.cidsn.jrc.it:1094, master_id=s-jrciprcids90v.cidsn.jrc.it:1094
221004 12:54:03 time=1664880843.367706 func=UpdateTransferList level=INFO logid=static.............................. unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f60e4bfe700 source=GroupBalancer:101 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="group_balancer update transfers" scheduledtransfers=0
221004 12:54:04 time=1664880844.404425 func=RepairErrs level=INFO logid=d5772902-43d2-11ed-ab6a-62cb3bd524b1 unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f60dc7fc700 source=Fsck:481 tident=<service> sec= uid=0 gid=0 name= geo="" msg="loop in fsck repair thread"
221004 12:54:08 time=1664880848.450424 func=AcquireLease level=INFO logid=d58172d6-43d2-11ed-ab6a-62cb3bd524b1 unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f61337fd700 source=QdbMaster:506 tident=<service> sec= uid=0 gid=0 name= geo="" msg="qclient acquire lease call took 6ms"
- Few seconds before the MGM is unblocked, there are these messages :
++++++ (c) 2020 CERN/IT-ST EOS MGM Authorization (XrdMgmAuthz) 4.8.91
------ XrdMgmAuthz plugin initialization successful
Config warning: asynchronous I/O has been disabled!
Config warning: sendfile I/O has been disabled!
Config warning: 'xrootd.prepare logdir' not specified; prepare tracking disabled.
------ xrootd protocol initialization completed.
------ xrootd mgm@s-jrciprcids90v.cidsn.jrc.it:1094 initialization completed.
221004 12:54:09 29819 XrootdXeq: ?:386@s-jrciprcids94v disc 0:00:00
221004 12:54:09 time=1664880849.096161 func=Configure level=WARN logid=static.............................. unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f6174290780 source=XrdMgmOfsConfigure:2351 tident= sec=(null) uid=99 gid=99 name=- geo="" write lock [ eosView ] held for 15000 milliseconds
221004 12:54:09 time=1664880849.096195 func=Configure level=INFO logid=static.............................. unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f6174290780 source=XrdMgmOfsConfigure:2355 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="after mutex test
221004 12:54:09 time=1664880849.098461 func=main level=WARN logid=static.............................. unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f61307f7700 source=MutexLatencyWatcher:61 tident= sec=(null) uid=99 gid=99 name=- geo="" acquisition of mutex eosViewRWMutex took 14833 milliseconds
221004 12:54:09 time=1664880849.102495 func=backgroundThread level=INFO logid=static.............................. unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f60cb3ff700 source=BulkRequestProcCleaner:71 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="BulkRequestProcCleaner did one round of cleaning, nbDeletedBulkRequests=0"
221004 12:54:09 29820 XrootdXeq: root.30231:389@localhost.localdomain pvt IPv4 login as daemon
221004 12:54:09 29819 XrootdXeq: root.30240:387@localhost.localdomain pvt IPv4 login as daemon
221004 12:54:09 time=1664880849.128017 func=IdMap level=INFO logid=static.............................. unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f6170407700 source=Mapping:1073 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=sss sec.name="daemon" sec.host="localhost.localdomain" sec.vorg="" sec.grps="daemon" sec.role="" sec.info="" sec.app="" sec.tident="root.30231:389@localhost.localdomain" vid.uid=0 vid.gid=0
The authentication plugin which is loaded a bit late, and this message about the mutex held for 15 seconds.
- there are these error messages, we might have errors in our configuration, but I can’t identify which ones
221004 12:54:04 time=1664880844.097849 func=get level=ERROR logid=d699f21a-43d2-11ed-ab6a-62cb3bd524b1 unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f60bbbe0700 source=ConfigStore:73 tident=<service> sec= uid=0 gid=0 name= geo="" msg="Failed to load key from Configstore" err="msg=Failed Numeric conversion" key= error_msg=Invalid argument
221004 12:54:04 time=1664880844.097893 func=get level=ERROR logid=d699f21a-43d2-11ed-ab6a-62cb3bd524b1 unit=mgm@s-jrciprcids90v.cidsn.jrc.it:1094 tid=00007f60bbbe0700 source=ConfigStore:73 tident=<service> sec= uid=0 gid=0 name= geo="" msg="Failed to load key from Configstore" err="msg=Failed Numeric conversion" key= error_msg=Invalid argument
Maybe some of these are good leads.
Anyway I attach you the full log of starting phase so that you can maybe compare it to what it should be.