Unsuccessful migration from Aquamarine to Citrine on SLC6.10 (msg="waiting to know manager")

Dear All,
We are running an EOS ALICE instance comprising one MGM and 3 FSTs (6 groups in total in a RAIN-6 configuration; each FST has 12 mountpoints). It was running Aquamarine on SLC6.10.

We upgraded to Citrine without changing the OS, but since the upgrade the FSTs cannot communicate with the MGM. The MGM service runs, but on each FST the service stops after a few seconds: the FST tries to contact the MGM, gets no proper reply, and then shuts down.

We checked the network between the MGM and the FSTs and everything looks fine (no firewall).
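The basic reachability check can be scripted; below is a minimal sketch (hostname and ports taken from our setup) that verifies an FST can open TCP connections to the MGM xrootd port (1094) and the MQ broker port (1097), using bash's built-in /dev/tcp so no extra tools are needed:

```shell
# Succeeds only if a TCP connection to $1:$2 can be opened within 3 seconds.
check_port() {
    timeout 3 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null
}

for port in 1094 1097; do
    if check_port eos.tier2-kol.res.in "$port"; then
        echo "port $port reachable"
    else
        echo "port $port NOT reachable"
    fi
done
```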

The MGM configuration and log files are below:

[root@eos grid-security]# cat /etc/sysconfig/eos_env
test -e /usr/lib64/libjemalloc.so.1 && LD_PRELOAD=/usr/lib64/libjemalloc.so.1
DAEMON_COREFILE_LIMIT=unlimited
XRD_ROLES="mq mgm"
EOS_INSTANCE_NAME=eoskolkataalice
EOS_BROKER_URL=root://localhost:1097//eos/
EOS_MGM_MASTER1=eos.tier2-kol.res.in
EOS_MGM_MASTER2=eos.tier2-kol.res.in
EOS_MGM_ALIAS=eos.tier2-kol.res.in
EOS_FUSE_MGM_ALIAS=eos.tier2-kol.res.in
EOS_MAIL_CC="vikasssinghal@gmail.com"
EOS_NOTIFY="mail -s date +%s-hostname-eos-notify $EOS_MAIL_CC"
EOS_HTTP_THREADPOOL="epoll"
EOS_HTTP_THREADPOOL_SIZE=32
EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
EOS_GEOTAG='tier2-kol'
EOS_MGM_HOST=eos.tier2-kol.res.in
#EOS_MGM_HOST_TARGET=eos.tier2-kol.res.in

EOS_TTY_BROADCAST_LISTEN_LOGFILE="/var/log/eos/mgm/xrdlog.mgm"
EOS_TTY_BROACAST_EGREP="CRIT|ALERT|EMERG|PROGRESS"
[root@eos grid-security]# service eos restart
Stopping xrootd: mq [ OK ]
Stopping xrootd: mgm [ OK ]

Starting xrootd as mq with -n mq -c /etc/xrd.cf.mq -l /var/log/eos/xrdlog.mq -b -Rdaemon
[ OK ]
Starting xrootd as mgm with -n mgm -c /etc/xrd.cf.mgm -m -l /var/log/eos/xrdlog.mgm -b -Rdaemon
[ OK ]
[root@eos grid-security]# tail -f /var/log/eos/mgm/xrdlog.mgm
PROGRESS [ scan files.eos.tier2-kol.res.in.mdlog ] 86% estimate 3.9s [ 23s/27s ]
PROGRESS [ scan files.eos.tier2-kol.res.in.mdlog ] 88% estimate 3.3s [ 23s/26s ]
PROGRESS [ scan files.eos.tier2-kol.res.in.mdlog ] 90% estimate 2.8s [ 24s/27s ]
PROGRESS [ scan files.eos.tier2-kol.res.in.mdlog ] 92% estimate 2.3s [ 25s/27s ]
PROGRESS [ scan files.eos.tier2-kol.res.in.mdlog ] 94% estimate 1.7s [ 25s/27s ]
PROGRESS [ scan files.eos.tier2-kol.res.in.mdlog ] 96% estimate 1.1s [ 26s/27s ]
PROGRESS [ scan files.eos.tier2-kol.res.in.mdlog ] 98% estimate 0.6s [ 26s/27s ]
INFO [ found file compaction mark at offset=6868054564 ]
INFO [ found file compaction mark at offset=6984137540 ]
ALERT [ files.eos.tier2-kol.res.in.mdlog ] finished in 27s
181114 21:48:45 time=1542212325.598744 func=InitializeFileView level=NOTE logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f001679f700 source=XrdMgmOfsConfigure:127 tident= sec= uid=0 gid=0 name= geo="" eos file view after initialize2
181114 21:48:45 time=1542212325.598862 func=InitializeFileView level=NOTE logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f001679f700 source=XrdMgmOfsConfigure:129 tident= sec= uid=0 gid=0 name= geo="" starting eos file view initialize3
PROGRESS [ scan file-visit ] 00% estimate none
PROGRESS [ scan file-visit ] 10% estimate 9.0s [ 0s/9s ] [185616/1856159]
PROGRESS [ scan file-visit ] 20% estimate 4.0s [ 0s/4s ] [371232/1856159]
PROGRESS [ scan file-visit ] 30% estimate 2.3s [ 0s/2s ] [556848/1856159]
PROGRESS [ scan file-visit ] 40% estimate 1.5s [ 0s/2s ] [742464/1856159]
PROGRESS [ scan file-visit ] 50% estimate 2.0s [ 1s/3s ] [928080/1856159]
PROGRESS [ scan file-visit ] 60% estimate 1.3s [ 1s/2s ] [1113696/1856159]
PROGRESS [ scan file-visit ] 70% estimate 0.9s [ 1s/2s ] [1299312/1856159]
PROGRESS [ scan file-visit ] 80% estimate 0.5s [ 1s/2s ] [1484928/1856159]
PROGRESS [ scan file-visit ] 90% estimate 0.2s [ 1s/1s ] [1670544/1856159]
ALERT [ file-visit ] finnished in 1s
181114 21:48:46 time=1542212326.427828 func=InitializeFileView level=NOTE logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f001679f700 source=XrdMgmOfsConfigure:132 tident= sec= uid=0 gid=0 name= geo="" eos file view initialize2: 36 seconds
181114 21:48:46 time=1542212326.427892 func=InitializeFileView level=NOTE logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f001679f700 source=XrdMgmOfsConfigure:133 tident= sec= uid=0 gid=0 name= geo="" eos file view initialize3: 1 seconds
181114 21:48:46 time=1542212326.428064 func=InitializeFileView level=ALERT logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f001679f700 source=XrdMgmOfsConfigure:220 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“namespace booted (as master)”
181114 21:48:46 time=1542212326.428113 func=InitializeFileView level=NOTE logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f001679f700 source=XrdMgmOfsConfigure:276 tident= sec= uid=0 gid=0 name= geo="" eos namespace file loading stopped after 37 seconds
181114 21:48:48 time=1542212328.387495 func=Balance level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f0098110700 source=Balancer:104 tident= sec=(null) uid=99 gid=99 name=- geo="" Looping in balancer
181114 21:48:50 2631 MgmOfs_SendMessage: Unable to Unable to submit message - no listener on requested queue: /eos/*/fst; Invalid argument; unknown error 3005
181114 21:48:51 3430 XrootdXeq: sgmali23.47865:126@sampawn020.if.usp.br pub IPv4 login as sgmali23
181114 21:48:51 3431 XrootdXeq: User authentication failed; Decryption key not found.
181114 21:48:51 time=1542212331.949864 func=IdMap level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=Mapping:883 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=unix sec.name=“sgmali23” sec.host=“sampawn020.if.usp.br” sec.vorg="" sec.grps=“alicesgm” sec.role="" sec.info="" sec.app="" sec.tident=“sgmali23.47865:126@sampawn020.if.usp.br”
181114 21:48:51 time=1542212331.950049 func=open level=INFO logid=f9a53fee-e828-11e8-8e99-80c16eaacee4 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=XrdMgmOfsFile:190 tident=sgmali23.47865:126@sampawn020.if.usp.br sec=unix uid=10367 gid=1395 name=sgmali23 geo="" op=read path=/eos/kolkataalice/grid/01/48559/b4b6709c-5b52-11e7-8b0c-8769c4afe1f3 info=&authz=<…>
181114 21:48:51 time=1542212331.958155 func=open level=INFO logid=f9a53fee-e828-11e8-8e99-80c16eaacee4 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=XrdMgmOfsFile:637 tident=sgmali23.47865:126@sampawn020.if.usp.br sec=unix uid=10367 gid=1395 name=sgmali23 geo="" acl=0 r=0 w=0 wo=0 egroup=0 shared=0 mutable=1
181114 21:48:51 time=1542212331.959555 func=Emsg level=ERROR logid=f9a53fee-e828-11e8-8e99-80c16eaacee4 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=XrdMgmOfsFile:2874 tident=sgmali23.47865:126@sampawn020.if.usp.br sec=unix uid=10367 gid=1395 name=sgmali23 geo="" Unable to open file /eos/kolkataalice/grid/01/48559/b4b6709c-5b52-11e7-8b0c-8769c4afe1f3; Network is unreachable
181114 21:48:52 3431 XrootdXeq: monalisa.64624:128@pcalimonitor.cern.ch pub IP46 login as monalisa
181114 21:48:52 time=1542212332.406681 func=IdMap level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb2f2700 source=Mapping:883 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=unix sec.name=“monalisa” sec.host=“pcalimonitor.cern.ch” sec.vorg="" sec.grps=“alienmaster” sec.role="" sec.info="" sec.app="" sec.tident=“monalisa.64624:128@pcalimonitor.cern.ch”
181114 21:48:52 time=1542212332.406834 func=open level=INFO logid=f9eb081c-e828-11e8-8965-80c16eaacee4 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb2f2700 source=XrdMgmOfsFile:188 tident=monalisa.64624:128@pcalimonitor.cern.ch sec=unix uid=10367 gid=1395 name=monalisa geo="" op=write trunc=512 path=/eos/kolkataalice/grid/11/59058/f9397bb0-e828-11e8-b791-0242266c4d21 info=authz=<…>&oss.asize=10459447
181114 21:48:52 time=1542212332.406898 func=ShouldStall level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb2f2700 source=ShouldStall:165 tident= sec=(null) uid=99 gid=99 name=- geo="" info=“stalling access to” uid=10367 gid=1395 host=pcalimonitor.cern.ch
181114 21:48:52 3431 XrootdXeq: monalisa.64624:128@pcalimonitor.cern.ch disc 0:00:01
181114 21:48:53 time=1542212333.247870 func=IdMap level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=Mapping:883 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=unix sec.name=“sgmali23” sec.host=“sampawn020.if.usp.br” sec.vorg="" sec.grps=“alicesgm” sec.role="" sec.info="" sec.app="" sec.tident=“sgmali23.47865:126@sampawn020.if.usp.br”
181114 21:48:53 time=1542212333.248014 func=open level=INFO logid=fa6b639a-e828-11e8-8e99-80c16eaacee4 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=XrdMgmOfsFile:190 tident=sgmali23.47865:126@sampawn020.if.usp.br sec=unix uid=10367 gid=1395 name=sgmali23 geo="" op=read path=/eos/kolkataalice/grid/01/48559/b4b6709c-5b52-11e7-8b0c-8769c4afe1f3 info=&authz=<…>
181114 21:48:53 time=1542212333.249397 func=open level=INFO logid=fa6b639a-e828-11e8-8e99-80c16eaacee4 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=XrdMgmOfsFile:637 tident=sgmali23.47865:126@sampawn020.if.usp.br sec=unix uid=10367 gid=1395 name=sgmali23 geo="" acl=0 r=0 w=0 wo=0 egroup=0 shared=0 mutable=1
181114 21:48:53 time=1542212333.249596 func=Emsg level=ERROR logid=fa6b639a-e828-11e8-8e99-80c16eaacee4 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007effdb3f3700 source=XrdMgmOfsFile:2874 tident=sgmali23.47865:126@sampawn020.if.usp.br sec=unix uid=10367 gid=1395 name=sgmali23 geo="" Unable to open file /eos/kolkataalice/grid/01/48559/b4b6709c-5b52-11e7-8b0c-8769c4afe1f3; Network is unreachable
181114 21:48:53 2279 XrootdXeq: User authentication failed; Decryption key not found.
181114 21:48:53 2279 XrootdXeq: monalisa.64636:127@pcalimonitor.cern.ch pub IP46 login as monalisa
^C
[root@eos grid-security]#
[root@eos grid-security]# tail -f /var/log/eos/mgm/error.log
181114 21:36:38 time=1542211598.204054 func=Cleaner level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f137efff700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“don’t know the manager name”
181114 21:36:38 time=1542211598.197498 func=Cleaner level=ERROR logid=static… unit=fst@eos02.tier2-kol.res.in:1095 tid=00007ff5463ff700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“don’t know the manager name”
181114 21:36:38 time=1542211598.288962 func=CallManager level=ERROR logid=3e4f8098-e827-11e8-977e-b083fed75e82 unit=fst@eos02.tier2-kol.res.in:1095 tid=00007ff5491fd700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181114 21:36:38 time=1542211598.289054 func=Remover level=ERROR logid=static… unit=fst@eos02.tier2-kol.res.in:1095 tid=00007ff5491fd700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
181114 21:36:38 time=1542211598.197817 func=Cleaner level=ERROR logid=static… unit=fst@eos03.tier2-kol.res.in:1095 tid=00007f2e0a7ff700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“don’t know the manager name”
181114 21:36:38 time=1542211598.289132 func=CallManager level=ERROR logid=3e4f6446-e827-11e8-9617-b083fed762ed unit=fst@eos03.tier2-kol.res.in:1095 tid=00007f2e0ddfd700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181114 21:36:38 time=1542211598.204054 func=Cleaner level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f137efff700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“don’t know the manager name”
181114 21:36:38 time=1542211598.289206 func=Remover level=ERROR logid=static… unit=fst@eos03.tier2-kol.res.in:1095 tid=00007f2e0ddfd700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
181114 21:36:38 time=1542211598.295842 func=CallManager level=ERROR logid=3e508448-e827-11e8-bd29-b083fece9f61 unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f13821fd700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181114 21:36:38 time=1542211598.295909 func=Remover level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f13821fd700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
181114 21:38:13 time=1542211693.199428 func=MgmSyncer level=ALERT logid=static… unit=fst@eos02.tier2-kol.res.in:1095 tid=00007ff5457ff700 source=MgmSyncer:66 tident= sec=(null) uid=99 gid=99 name=- geo="" didn’t receive manager name, aborting
181114 21:38:13 time=1542211693.201400 func=MgmSyncer level=ALERT logid=static… unit=fst@eos03.tier2-kol.res.in:1095 tid=00007f2e0a6fe700 source=MgmSyncer:66 tident= sec=(null) uid=99 gid=99 name=- geo="" didn’t receive manager name, aborting
181114 21:38:13 time=1542211693.209065 func=MgmSyncer level=ALERT logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f137eefe700 source=MgmSyncer:66 tident= sec=(null) uid=99 gid=99 name=- geo="" didn’t receive manager name, aborting

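One thing worth ruling out: the configuration as pasted above shows typographic (curly) quotes, e.g. XRD_ROLES=“mq mgm”. That may just be forum rendering, but if such characters are really present in the files, the shell will not strip them and the variable values will be wrong. A quick check (file paths as in this thread):

```shell
# Look for curly quotes in the EOS sysconfig files; any hit should be
# replaced with plain ASCII " or ' characters.
for f in /etc/sysconfig/eos /etc/sysconfig/eos_env; do
    [ -e "$f" ] || continue
    if grep -n '“\|”\|‘\|’' "$f"; then
        echo "$f contains typographic quotes"
    fi
done
```
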
Below is the FST log:

===================================================
[root@eos01 ~]# service eos restart
Stopping xrootd: fst [FAILED]

Starting xrootd as fst with -n fst -c /etc/xrd.cf.fst -l /var/log/eos/xrdlog.fst -b -Rdaemon
[ OK ]
[root@eos01 ~]# tail -f /var/log/eos/fst/xrdlog.fst
181114 21:55:53 time=1542212753.102294 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc1077ff700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in Balancer …
181114 21:55:53 time=1542212753.103664 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc1076fe700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in …
181114 21:55:53 time=1542212753.103965 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106bff700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in Cleaner …
181114 21:55:53 time=1542212753.104031 func=Cleaner level=NOTE logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106bff700 source=Cleaner:41 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“cleaning transactions”
181114 21:55:53 time=1542212753.104063 func=Cleaner level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106bff700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“don’t know the manager name”
181114 21:55:53 time=1542212753.104427 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:55:53 time=1542212753.195641 func=CallManager level=ERROR logid=eeb03ae8-e829-11e8-b3f6-b083fece9f61 unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc10a1fd700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181114 21:55:53 time=1542212753.195709 func=Remover level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc10a1fd700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
181114 21:55:58 time=1542212758.104596 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:03 time=1542212763.104754 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:08 time=1542212768.104926 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:13 time=1542212773.105068 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:18 time=1542212778.105230 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:23 time=1542212783.105412 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:28 time=1542212788.105600 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:33 time=1542212793.105835 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:38 time=1542212798.105961 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:43 time=1542212803.106126 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:48 time=1542212808.106316 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:53 time=1542212813.106493 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:56:54 time=1542212814.170102 func=Release level=WARN logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc1082fe700 source=RWMutex:1367 tident= sec=(null) uid=99 gid=99 name=- geo="" WARNING - read lock held for 55987 milliseconds by this thread:
backward disabled
181114 21:56:54 time=1542212814.170169 func=Publish level=WARN logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc1082fe700 source=Publish:476 tident= sec=(null) uid=99 gid=99 name=- geo="" Publisher cycle exceeded 12704 millisecons - took 55987 milliseconds
181114 21:56:58 time=1542212818.106774 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc106afe700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg=“waiting to know manager”
181114 21:57:03 time=1542212823.106963 func=MgmSyncer

Kindly suggest how to proceed.

Vikas

Hi Vikas,
do the FSTs run on the same machine as the MGM? If not, can you check that /etc/sysconfig/eos has the correct broker URL pointing to the manager, and that you have the same /etc/eos.keytab file on the FSTs and the MGM?
Also, which version is that?

Dear Andreas,
Hardware-wise the MGM and FST machines are different, but the OS is the same.

[root@eos ~]# lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: Scientific
Description: Scientific Linux release 6.10 (Carbon)
Release: 6.10
Codename: Carbon
[root@eos ~]# cexec "lsb_release -a"
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: Scientific
Description: Scientific Linux release 6.10 (Carbon)
Release: 6.10
Codename: Carbon
--------- eos02.tier2-kol.res.in---------
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: Scientific
Description: Scientific Linux release 6.10 (Carbon)
Release: 6.10
Codename: Carbon
--------- eos03.tier2-kol.res.in---------
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: Scientific
Description: Scientific Linux release 6.10 (Carbon)
Release: 6.10
Codename: Carbon
[root@eos ~]#

The EOS config file is the same for the FSTs and the MGM. The FST /etc/sysconfig/eos file is as below:

[root@eos ~]# cexec "cat /etc/sysconfig/eos"
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
test -e /usr/lib64/libjemalloc.so.1 && export LD_PRELOAD=/usr/lib64/libjemalloc.so.1
XRD_ROLES="fst"
export EOS_BROKER_URL=root://eos.tier2-kol.res.in:1097//eos/
export EOS_MGM_ALIAS=eos.tier2-kol.res.in
export EOS_HTTP_THREADPOOL=epoll
export EOS_HTTP_THREADPOOL=epoll
export EOS_HTTP_THREADPOOL_SIZE=32
export EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
export EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
export EOS_MGM_ALIAS=eos.tier2-kol.res.in
export APMON_INSTANCE_NAME=ALICE::KOLKATA::EOS
export MONALISAHOST=grid01.tier2-kol.res.in
export APMON_STORAGEPATH=edata
--------- eos02.tier2-kol.res.in---------
test -e /usr/lib64/libjemalloc.so.1 && export LD_PRELOAD=/usr/lib64/libjemalloc.so.1
XRD_ROLES="fst"
export EOS_BROKER_URL=root://eos.tier2-kol.res.in:1097//eos/
export EOS_MGM_ALIAS=eos.tier2-kol.res.in
export EOS_HTTP_THREADPOOL=epoll
export EOS_HTTP_THREADPOOL=epoll
export EOS_HTTP_THREADPOOL_SIZE=32
export EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
export EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
export EOS_MGM_ALIAS=eos.tier2-kol.res.in
export APMON_INSTANCE_NAME=ALICE::KOLKATA::EOS
export MONALISAHOST=grid01.tier2-kol.res.in
export APMON_STORAGEPATH=edata
--------- eos03.tier2-kol.res.in---------
test -e /usr/lib64/libjemalloc.so.1 && export LD_PRELOAD=/usr/lib64/libjemalloc.so.1
XRD_ROLES="fst"
export EOS_BROKER_URL=root://eos.tier2-kol.res.in:1097//eos/
export EOS_MGM_ALIAS=eos.tier2-kol.res.in
export EOS_HTTP_THREADPOOL=epoll
export EOS_HTTP_THREADPOOL=epoll
export EOS_HTTP_THREADPOOL_SIZE=32
export EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
export EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
export EOS_MGM_ALIAS=eos.tier2-kol.res.in
export APMON_INSTANCE_NAME=ALICE::KOLKATA::EOS
export MONALISAHOST=grid01.tier2-kol.res.in
export APMON_STORAGEPATH=edata
[root@eos ~]#
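As an aside, the pasted FST files contain duplicated export lines (EOS_HTTP_THREADPOOL, EOS_HTTP_CONNECTION_MEMORY_LIMIT). Duplicates with identical values are harmless, but they make stale entries easy to miss; a small sketch to list variable names defined more than once (path as used in this thread):

```shell
# Extract the variable name from each assignment line (with or without
# a leading "export"), then print names that occur more than once.
grep -oE '^(export +)?[A-Z_]+=' /etc/sysconfig/eos \
    | sed -E 's/^export +//; s/=$//' \
    | sort | uniq -d
```
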

The keytab was also the same on the MGM and the FSTs. We have now created a new keytab and distributed the same file to all nodes again.

[root@eos ~]# cat /etc/eos.keytab
0 u:daemon g:daemon n:eos N:6623934160027254785 c:1542254854 e:0 f:0 k:58ceff805305cbfabbe7496d1b8211fd7a4929b4462660c4447afd5e541e8dfa[root@eos ~]#
[root@eos ~]#
[root@eos ~]#
[root@eos ~]# cexec "cat /etc/eos.keytab"
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
0 u:daemon g:daemon n:eos N:6623934160027254785 c:1542254854 e:0 f:0 k:58ceff805305cbfabbe7496d1b8211fd7a4929b4462660c4447afd5e541e8dfa--------- eos02.tier2-kol.res.in---------
0 u:daemon g:daemon n:eos N:6623934160027254785 c:1542254854 e:0 f:0 k:58ceff805305cbfabbe7496d1b8211fd7a4929b4462660c4447afd5e541e8dfa--------- eos03.tier2-kol.res.in---------
0 u:daemon g:daemon n:eos N:6623934160027254785 c:1542254854 e:0 f:0 k:58ceff805305cbfabbe7496d1b8211fd7a4929b4462660c4447afd5e541e8dfa[root@eos ~]#
[root@eos ~]# ls -l /etc/eos.keytab
-r-------- 1 daemon daemon 135 Nov 15 09:37 /etc/eos.keytab
[root@eos ~]#
[root@eos ~]# cexec "ls -l /etc/eos.keytab "
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
-r-------- 1 daemon daemon 135 Nov 15 09:37 /etc/eos.keytab
--------- eos02.tier2-kol.res.in---------
-r-------- 1 daemon daemon 135 Nov 15 09:37 /etc/eos.keytab
--------- eos03.tier2-kol.res.in---------
-r-------- 1 daemon daemon 135 Nov 15 09:37 /etc/eos.keytab
[root@eos ~]#
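Rather than eyeballing the keytab contents, the fingerprints can be compared directly; a sketch using md5sum (cexec and the hostnames used elsewhere in this thread are assumed):

```shell
# Print the keytab checksum on the MGM, then on every FST;
# all lines should show the identical hash.
md5sum /etc/eos.keytab
cexec "md5sum /etc/eos.keytab"
```
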

The EOS Citrine version is the same (4.3.12) on the MGM and all FSTs:

[root@eos ~]# eos -v
EOS 4.3.12 (CERN)
Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
[root@eos ~]# cexec "eos -v"
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
EOS 4.3.12 (CERN)
Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
--------- eos02.tier2-kol.res.in---------
EOS 4.3.12 (CERN)
Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
--------- eos03.tier2-kol.res.in---------
EOS 4.3.12 (CERN)
Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
[root@eos ~]#

We get the same error and the same log output as earlier.
Output of xrdlog.mgm:

181115 12:18:52 time=1542264532.141551 func=ShouldStall level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f5233efe700 source=ShouldStall:165 tident= sec=(null) uid=99 gid=99 name=- geo="" info=“stalling access to” uid=10367 gid=1395 host=pcaliendb06a.cern.ch
181115 12:18:52 time=1542264532.151835 func=IdMap level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f5233dfd700 source=Mapping:883 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=unix sec.name=“alienmaster” sec.host=“pcaliendb06a.cern.ch” sec.vorg="" sec.grps=“alienmaster” sec.role="" sec.info="" sec.app="" sec.tident=“alienmas.2386:127@pcaliendb06a.cern.ch”
181115 12:18:52 time=1542264532.151941 func=ShouldStall level=INFO logid=static… unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f5233dfd700 source=ShouldStall:165 tident= sec=(null) uid=99 gid=99 name=- geo="" info=“stalling access to” uid=10367 gid=1395 host=pcaliendb06a.cern.ch
181115 12:18:53 25233 MgmOfs_SendMessage: Unable to Unable to submit message - no listener on requested queue: /eos/*/fst; Invalid argument; unknown error 3005
181115 12:19:00 time=1542264540.647776 func=_open level=INFO logid=88746a1e-e8a2-11e8-9589-80c16eaacee0 unit=mgm@eos.tier2-kol.res.in:1094 tid=00007f528383b700 source=XrdMgmOfsDirectory:164 tident= sec=local uid=0 gid=0 name=root geo="" name=opendir path=/eos/kolkataalice/proc/conversion

[root@eos ~]# tail -f /var/log/eos/mgm/error.log
181115 11:44:04 time=1542262444.723423 func=CallManager level=ERROR logid=a122f01c-e89d-11e8-8f49-b083fed762eb unit=fst@eos03.tier2-kol.res.in:1095 tid=00007fc756bff700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181115 11:44:04 time=1542262444.723502 func=Remover level=ERROR logid=static… unit=fst@eos03.tier2-kol.res.in:1095 tid=00007fc756bff700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
181115 11:44:04 time=1542262444.659851 func=Cleaner level=ERROR logid=static… unit=fst@eos02.tier2-kol.res.in:1095 tid=00007f62103ff700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“don’t know the manager name”
181115 11:44:04 time=1542262444.752341 func=CallManager level=ERROR logid=a128c6c2-e89d-11e8-a32b-b083fed75e80 unit=fst@eos02.tier2-kol.res.in:1095 tid=00007f62139fd700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181115 11:44:04 time=1542262444.752407 func=Remover level=ERROR logid=static… unit=fst@eos02.tier2-kol.res.in:1095 tid=00007f62139fd700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
181115 11:44:04 time=1542262444.723259 func=Cleaner level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f3b2c3ff700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg=“don’t know the manager name”
181115 11:44:04 time=1542262444.814451 func=CallManager level=ERROR logid=a1320c82-e89d-11e8-92b1-b083fece9f5f unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f3b2f5fd700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181115 11:44:04 time=1542262444.814517 func=Remover level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007f3b2f5fd700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
XrdMqClient::Reopening of new alias failed …
XrdMqClient::Reopening of new alias failed …

The FST log also shows the same, and the fst service stopped after a few seconds.

181115 12:24:12 time=1542264852.658502 func=Communicator level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8065fd700 source=Comunicator:106 tident= sec=(null) uid=99 gid=99 name=- geo="" FST shared object notification subject is /eos/eos01.tier2-kol.res.in:1095/fst/gw/txqueue/txq
181115 12:24:12 time=1542264852.658976 func=Communicator level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8065fd700 source=Comunicator:106 tident= sec=(null) uid=99 gid=99 name=- geo="" FST shared object notification subject is /config/eoskolkataalice/node/eos01.tier2-kol.res.in:1095
181115 12:24:12 time=1542264852.659017 func=Configure level=NOTE logid=3f6bb9a2-e8a3-11e8-97d7-b083fece9f5f unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb80f4c0740 source=XrdFstOfs:735 tident= sec= uid=0 gid=0 name= geo="" FST_HOST=eos01.tier2-kol.res.in FST_PORT=1095 FST_HTTP_PORT=8001 VERSION=4.3.12 RELEASE=1 KEYTABADLER=d5032496
181115 12:24:12 time=1542264852.659098 func=Communicator level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8065fd700 source=Comunicator:133 tident= sec=(null) uid=99 gid=99 name=- geo="" storing config queue name </config/eoskolkataalice/node/eos01.tier2-kol.res.in:1095>
181115 12:24:12 time=1542264852.659185 func=Communicator level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8065fd700 source=Comunicator:106 tident= sec=(null) uid=99 gid=99 name=- geo="" FST shared object notification subject is /eos01.tier2-kol.res.in:1095
181115 12:24:12 time=1542264852.659221 func=Communicator level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8065fd700 source=Comunicator:136 tident= sec=(null) uid=99 gid=99 name=- geo="" no action on creation of subject <
/eos01.tier2-kol.res.in:1095> - we are </eos/eos01.tier2-kol.res.in:1095/fst>
181115 12:24:12 time=1542264852.659247 func=Communicator level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8065fd700 source=Comunicator:106 tident= sec=(null) uid=99 gid=99 name=- geo="" FST shared object notification subject is /eos01.tier2-kol.res.in:1095/fst/gw/txqueue/txq
181115 12:24:12 time=1542264852.659273 func=Communicator level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8065fd700 source=Comunicator:106 tident= sec=(null) uid=99 gid=99 name=- geo="" FST shared object notification subject is /eos/eos01.tier2-kol.res.in:1095/fst/

Config warning: asynchronous I/O has been disabled!
Config warning: sendfile I/O has been disabled!
Config warning: ‘xrootd.prepare logdir’ not specified; prepare tracking disabled.
------ xrootd protocol initialization completed.
------ xrootd fst@eos01.tier2-kol.res.in:1095 initialization completed.
181115 12:24:12 time=1542264852.671938 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb802afe700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in Publish …
181115 12:24:13 time=1542264853.659472 func=Run level=NOTE logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb803dfd700 source=HttpServer:135 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="starting http server" mode="epoll" threads=32
181115 12:24:13 time=1542264853.661319 func=Run level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb803dfd700 source=HttpServer:179 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="start of micro httpd succeeded [port=8001]"
181115 12:24:17 time=1542264857.646131 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8049fd700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in Remover …
181115 12:24:17 time=1542264857.649804 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb801fff700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in …
181115 12:24:17 time=1542264857.649893 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8029fd700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in Balancer …
181115 12:24:17 time=1542264857.653034 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb800bff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 12:24:17 time=1542264857.653902 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb801efe700 source=Config:40 tident= sec=(null) uid=99 gid=99 name=- geo="" Waiting for config queue in Cleaner …
181115 12:24:17 time=1542264857.653974 func=Cleaner level=NOTE logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb801efe700 source=Cleaner:41 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="cleaning transactions"
181115 12:24:17 time=1542264857.654010 func=Cleaner level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb801efe700 source=Cleaner:66 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="don't know the manager name"
181115 12:24:17 time=1542264857.746383 func=CallManager level=ERROR logid=3f6bb9a2-e8a3-11e8-97d7-b083fece9f5f unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8049fd700 source=XrdFstOfs:857 tident= sec= uid=0 gid=0 name= geo="" error=URL is not valid: root:////dummy?xrd.wantprot=sss
181115 12:24:17 time=1542264857.746459 func=Remover level=ERROR logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb8049fd700 source=Remover:111 tident= sec=(null) uid=99 gid=99 name=- geo="" manager returned errno=22
181115 12:24:22 time=1542264862.653217 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb800bff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
:
:
181115 12:25:47 time=1542264947.655970 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb800bff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 12:25:52 time=1542264952.656160 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb800bff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 12:25:52 time=1542264952.656219 func=MgmSyncer level=ALERT logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fb800bff700 source=MgmSyncer:66 tident= sec=(null) uid=99 gid=99 name=- geo="" didn't receive manager name, aborting
@@@@@@ 00:00:00 op=shutdown msg="shutdown timedout after 0 seconds, signal=1
@@@@@@ 00:00:00 op=shutdown status=forced-complete

Kindly suggest accordingly.
Vikas

Hi Vikas,
what is the xrootd version you are running?

Can you paste the end of the logfile of /var/log/eos/mq/xrdlog.mq ?

The origin of the problem is that the FST cannot talk to the MQ service.
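As a quick first check, the MQ endpoint can be read out of EOS_BROKER_URL and probed from each FST. A minimal sketch (the URL value is copied from the /etc/sysconfig/eos_env paste above; `nc` is only one possible probe):

```shell
# EOS_BROKER_URL from the sysconfig file above (format: root://host:port//eos/)
broker="root://localhost:1097//eos/"

# Strip the scheme, then split host and port with shell parameter expansion.
hostport="${broker#root://}"   # -> localhost:1097//eos/
hostport="${hostport%%/*}"     # -> localhost:1097
host="${hostport%%:*}"         # -> localhost
port="${hostport##*:}"         # -> 1097
echo "MQ endpoint: $host:$port"

# From each FST the port could then be probed with, e.g.:
#   nc -zv -w 5 "$host" "$port"
```

Note that the value shown comes from the MGM's sysconfig; on an FST, its own EOS_BROKER_URL would normally point at the MQ host rather than localhost.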

Thanks Andreas.

Dear Andreas,

Xrootd Versions are

[root@eos ~]# xrootd -v
v4.8.5
[root@eos ~]# cexec "xrootd -v"
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
v4.8.5
--------- eos02.tier2-kol.res.in---------
v4.8.5
--------- eos03.tier2-kol.res.in---------
v4.8.5
[root@eos ~]#

[root@eos ~]# tail -f /var/log/eos/mq/xrdlog.mq
181115 15:42:40 5837 daemon.5879:7@localhost MqOfs_close: Disconnected Queue: /eos/eos.tier2-kol.res.in/mgm-fsck-0
181115 15:47:12 5840 XrootdXeq: daemon.7428:21@eos03 pub IPv4 login as daemon
181115 15:47:12 5840 daemon.7428:21@eos03 MqOfs_open: Connecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:47:12 5840 daemon.7428:21@eos03 MqOfs_open: Connected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:47:12 5841 XrootdXeq: daemon.7432:22@eos02 pub IPv4 login as daemon
181115 15:47:12 5841 daemon.7432:22@eos02 MqOfs_open: Connecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:47:12 5841 daemon.7432:22@eos02 MqOfs_open: Connected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:47:12 7370 XrootdXeq: daemon.7831:23@eos01 pub IPv4 login as daemon
181115 15:47:12 7370 daemon.7831:23@eos01 MqOfs_open: Connecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:47:12 7370 daemon.7831:23@eos01 MqOfs_open: Connected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:48:23 9634 XrootdXeq: daemon.7428:24@eos03 pub IPv4 login as daemon
181115 15:48:23 9635 XrootdXeq: daemon.7428:21@eos03 disc 0:01:11 (ended by daemon.7428:24@eos03)
181115 15:48:23 9635 daemon.7428:21@eos03 MqOfs_close: Disconnecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:48:23 9635 daemon.7428:21@eos03 MqOfs_close: Disconnected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:48:23 5840 XrootdXeq: daemon.7432:25@eos02 pub IPv4 login as daemon
181115 15:48:23 5841 XrootdXeq: daemon.7432:22@eos02 disc 0:01:11 (ended by daemon.7432:25@eos02)
181115 15:48:23 5841 daemon.7432:22@eos02 MqOfs_close: Disconnecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:48:23 9634 daemon.7428:24@eos03 MqOfs_open: Connecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:48:23 9634 daemon.7428:24@eos03 MqOfs_open: Connected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:48:23 5841 daemon.7432:22@eos02 MqOfs_close: Disconnected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:48:23 5840 daemon.7432:25@eos02 MqOfs_open: Connecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:48:23 5840 daemon.7432:25@eos02 MqOfs_open: Connected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:48:23 7370 XrootdXeq: daemon.7831:26@eos01 pub IPv4 login as daemon
181115 15:48:23 10019 XrootdXeq: daemon.7831:23@eos01 disc 0:01:11 (ended by daemon.7831:26@eos01)
181115 15:48:23 10019 daemon.7831:23@eos01 MqOfs_close: Disconnecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:48:23 10019 daemon.7831:23@eos01 MqOfs_close: Disconnected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:48:23 7370 daemon.7831:26@eos01 MqOfs_open: Connecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:48:23 7370 daemon.7831:26@eos01 MqOfs_open: Connected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:49:04 5840 daemon.7432:25@eos02 MqOfs_close: Disconnecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:49:04 5840 daemon.7432:25@eos02 MqOfs_close: Disconnected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:49:04 5840 daemon.7432:25@eos02 MqOfs_close: Disconnecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:49:04 5840 daemon.7432:25@eos02 MqOfs_close: Disconnected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:49:04 9634 daemon.7428:24@eos03 MqOfs_close: Disconnecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:49:04 9634 daemon.7428:24@eos03 MqOfs_close: Disconnected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:49:04 9634 daemon.7428:24@eos03 MqOfs_close: Disconnecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:49:04 7370 daemon.7831:26@eos01 MqOfs_close: Disconnecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:49:04 9634 daemon.7428:24@eos03 MqOfs_close: Disconnected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:49:04 7370 daemon.7831:26@eos01 MqOfs_close: Disconnected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:49:04 7370 daemon.7831:26@eos01 MqOfs_close: Disconnecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:49:04 7370 daemon.7831:26@eos01 MqOfs_close: Disconnected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:49:04 9634 XrootdXeq: daemon.7428:24@eos03 disc 0:00:41
181115 15:49:04 5840 XrootdXeq: daemon.7432:25@eos02 disc 0:00:41
181115 15:49:04 7370 XrootdXeq: daemon.7831:26@eos01 disc 0:00:41
181115 15:49:10 10018 XrootdXeq: daemon.7706:21@eos03 pub IPv4 login as daemon
181115 15:49:10 10018 daemon.7706:21@eos03 MqOfs_open: Connecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:49:10 10018 daemon.7706:21@eos03 MqOfs_open: Connected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:49:10 9635 XrootdXeq: daemon.7710:22@eos02 pub IPv4 login as daemon
181115 15:49:10 9635 daemon.7710:22@eos02 MqOfs_open: Connecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:49:10 9635 daemon.7710:22@eos02 MqOfs_open: Connected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:49:10 5841 XrootdXeq: daemon.8109:23@eos01 pub IPv4 login as daemon
181115 15:49:10 5841 daemon.8109:23@eos01 MqOfs_open: Connecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:49:10 5841 daemon.8109:23@eos01 MqOfs_open: Connected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:50:21 10019 XrootdXeq: daemon.7710:24@eos02 pub IPv4 login as daemon
181115 15:50:21 10020 XrootdXeq: daemon.7710:22@eos02 disc 0:01:11 (ended by daemon.7710:24@eos02)
181115 15:50:21 10020 daemon.7710:22@eos02 MqOfs_close: Disconnecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:50:21 10020 daemon.7710:22@eos02 MqOfs_close: Disconnected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:50:21 10019 daemon.7710:24@eos02 MqOfs_open: Connecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:50:21 10019 daemon.7710:24@eos02 MqOfs_open: Connected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:50:21 5841 XrootdXeq: daemon.7706:25@eos03 pub IPv4 login as daemon
181115 15:50:21 5840 XrootdXeq: daemon.7706:21@eos03 disc 0:01:11 (ended by daemon.7706:25@eos03)
181115 15:50:21 5840 daemon.7706:21@eos03 MqOfs_close: Disconnecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:50:21 7370 XrootdXeq: daemon.8109:26@eos01 pub IPv4 login as daemon
181115 15:50:21 5840 daemon.7706:21@eos03 MqOfs_close: Disconnected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:50:21 10018 XrootdXeq: daemon.8109:23@eos01 disc 0:01:11 (ended by daemon.8109:26@eos01)
181115 15:50:21 10018 daemon.8109:23@eos01 MqOfs_close: Disconnecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:50:21 10018 daemon.8109:23@eos01 MqOfs_close: Disconnected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:50:21 5841 daemon.7706:25@eos03 MqOfs_open: Connecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:50:21 5841 daemon.7706:25@eos03 MqOfs_open: Connected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:50:21 7370 daemon.8109:26@eos01 MqOfs_open: Connecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:50:21 7370 daemon.8109:26@eos01 MqOfs_open: Connected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:51:05 5841 daemon.7706:25@eos03 MqOfs_close: Disconnecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:51:05 5841 daemon.7706:25@eos03 MqOfs_close: Disconnected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:51:05 5841 daemon.7706:25@eos03 MqOfs_close: Disconnecting Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:51:05 5841 daemon.7706:25@eos03 MqOfs_close: Disconnected Queue: /eos/eos03.tier2-kol.res.in:1095/fst
181115 15:51:05 10019 daemon.7710:24@eos02 MqOfs_close: Disconnecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:51:05 10019 daemon.7710:24@eos02 MqOfs_close: Disconnected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:51:05 10019 daemon.7710:24@eos02 MqOfs_close: Disconnecting Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:51:05 10019 daemon.7710:24@eos02 MqOfs_close: Disconnected Queue: /eos/eos02.tier2-kol.res.in:1095/fst
181115 15:51:05 7370 daemon.8109:26@eos01 MqOfs_close: Disconnecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:51:05 7370 daemon.8109:26@eos01 MqOfs_close: Disconnected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:51:05 7370 daemon.8109:26@eos01 MqOfs_close: Disconnecting Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:51:05 7370 daemon.8109:26@eos01 MqOfs_close: Disconnected Queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 15:51:05 5841 XrootdXeq: daemon.7706:25@eos03 disc 0:00:44
181115 15:51:05 10019 XrootdXeq: daemon.7710:24@eos02 disc 0:00:44
181115 15:51:05 7370 XrootdXeq: daemon.8109:26@eos01 disc 0:00:44

Suggest accordingly.
Vikas

Ok,
you have to go back to xrootd 4.8.4 … 4.8.5 has a serious bug and is not usable for EOS.
You can use XRootD from the EPEL repository or from the xrootd-stable one.
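A tiny guard along these lines (a sketch; the version strings mirror the `xrootd -v` outputs in this thread, and the function name is made up for illustration):

```shell
# Warn when the installed xrootd is the 4.8.5 release that is known to be
# unusable for EOS; any other version is accepted here for simplicity.
check_xrootd_version() {
  case "$1" in
    v4.8.5) echo "WARNING: xrootd $1 is broken for EOS, downgrade to v4.8.4" ;;
    *)      echo "xrootd $1 ok" ;;
  esac
}

# In real use the argument would come from:  xrootd -v 2>&1 | tail -1
check_xrootd_version v4.8.5
check_xrootd_version v4.8.4
```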

Cheers Andreas.

Dear Andreas,

We tried as you suggested, but we still get the same error (the FST could not communicate with the MQ).
We downgraded xrootd on all 4 servers (1 master and 3 FSTs); as a result, all EOS-related packages were also reinstalled.

[root@eos ~]# xrootd -v
v4.8.4
[root@eos ~]# cexec "xrootd -v"
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
v4.8.4
--------- eos02.tier2-kol.res.in---------
v4.8.4
--------- eos03.tier2-kol.res.in---------
v4.8.4
[root@eos ~]#

The output of xrdlog.mq is the same as earlier.
181115 21:59:39 14968 Starting on Linux 2.6.32-754.6.3.el6.x86_64
Copr. 2004-2012 Stanford University, xrd version v4.8.4
++++++ xrootd mq@eos.tier2-kol.res.in initialization started.
Config using configuration file /etc/xrd.cf.mq
=====> xrd.sched mint 16 maxt 1024 idle 128
=====> xrd.port 1097
=====> xrd.network keepalive
=====> xrd.timeout idle 120
Config maximum number of connections restricted to 65000
Copr. 2012 Stanford University, xrootd protocol 3.1.0 version v4.8.4
++++++ xrootd protocol initialization started.
=====> xrootd.fslib libXrdMqOfs.so
=====> all.export /eos/ nolock
=====> xrootd.async off nosf
=====> xrootd.seclib libXrdSec.so
Config exporting /eos/
Plugin loaded
++++++ Authentication system initialization started.
Plugin loaded
=====> sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
=====> sec.protbind * only sss
Config 2 authentication directives processed in /etc/xrd.cf.mq
------ Authentication system initialization completed.
++++++ Protection system initialization started.
Config warning: Security level is set to none; request protection disabled!
Config Local protection level: none
Config Remote protection level: none
------ Protection system initialization completed.
Config Routing for eos.tier2-kol.res.in: local pub4 prv4
Config Route all4: eos.tier2-kol.res.in Dest=[::144.16.112.17]:1097
Plugin No such file or directory loading fslib libXrdMqOfs-4.so
Config Falling back to using libXrdMqOfs.so
Plugin loaded
++++++ © 2018 CERN/IT-DSS 4.4.10
=====> mq.hostname: eos.tier2-kol.res.in
=====> mq.hostpref: eos
=====> mq.managerid: eos.tier2-kol.res.in:1097
=====> mq.queue: /eos/
=====> mq.brokerid: root://eos.tier2-kol.res.in:1097//eos/
Config warning: asynchronous I/O has been disabled!
Config warning: sendfile I/O has been disabled!
Config warning: 'xrootd.prepare logdir' not specified; prepare tracking disabled.
------ xrootd protocol initialization completed.
------ xrootd mq@eos.tier2-kol.res.in:1097 initialization completed.
181115 21:59:54 14972 XrootdXeq: daemon.15014:7@localhost pvt IPv4 login as daemon
181115 21:59:54 time=1542299394.108830 func=open level=INFO logid=aebc8228-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:75 tident= sec= uid=0 gid=0 name= geo="" connecting queue: /eos/eos.tier2-kol.res.in/mgm
181115 21:59:54 time=1542299394.108907 func=open level=INFO logid=aebc8228-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:118 tident= sec= uid=0 gid=0 name= geo="" connected queue: /eos/eos.tier2-kol.res.in/mgm
181115 21:59:54 14973 XrootdXeq: daemon.15014:18@localhost pvt IPv4 login as daemon
181115 21:59:54 time=1542299394.193836 func=open level=INFO logid=aec9802c-e8f3-11e8-9c16-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81ffa700 source=XrdMqOfs:75 tident= sec= uid=0 gid=0 name= geo="" connecting queue: /eos/eos.tier2-kol.res.in/report
181115 21:59:54 time=1542299394.193877 func=open level=INFO logid=aec9802c-e8f3-11e8-9c16-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81ffa700 source=XrdMqOfs:118 tident= sec= uid=0 gid=0 name= geo="" connected queue: /eos/eos.tier2-kol.res.in/report
181115 21:59:54 14974 XrootdXeq: root.15833:20@localhost pvt IPv4 login as daemon
181115 21:59:54 time=1542299394.274060 func=open level=INFO logid=aed5bdba-e8f3-11e8-b25a-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81ef9700 source=XrdMqOfs:75 tident= sec= uid=0 gid=0 name= geo="" connecting queue: /eos/:15833:1/errorreport
181115 21:59:54 time=1542299394.274108 func=open level=INFO logid=aed5bdba-e8f3-11e8-b25a-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81ef9700 source=XrdMqOfs:118 tident= sec= uid=0 gid=0 name= geo="" connected queue: /eos/:15833:1/errorreport
181115 22:00:34 time=1542299434.197067 func=open level=INFO logid=c6a18226-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:75 tident= sec= uid=0 gid=0 name= geo="" connecting queue: /eos/eos.tier2-kol.res.in/mgm-fsck-0
181115 22:00:34 time=1542299434.197132 func=open level=INFO logid=c6a18226-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:118 tident= sec= uid=0 gid=0 name= geo="" connected queue: /eos/eos.tier2-kol.res.in/mgm-fsck-0
181115 22:00:34 14972 MqOfs_FSctl: Unable to submit message - no listener on requested queue: /eos//fst; Invalid argument
181115 22:00:34 14972 daemon.15014:7@localhost MqOfs_FSctl: no listener on requested queue:
181115 22:00:34 14972 daemon.15014:7@localhost MqOfs_FSctl: /eos/
/fst
181115 22:00:44 time=1542299444.199021 func=close level=INFO logid=c6a18226-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:237 tident= sec= uid=0 gid=0 name= geo="" disconnecting queue: /eos/eos.tier2-kol.res.in/mgm-fsck-0
181115 22:00:44 time=1542299444.199168 func=close level=INFO logid=c6a18226-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:274 tident= sec= uid=0 gid=0 name= geo="" disconnected queue: /eos/eos.tier2-kol.res.in/mgm-fsck-0
181115 22:00:44 time=1542299444.199202 func=close level=INFO logid=c6a18226-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:237 tident= sec= uid=0 gid=0 name= geo="" disconnecting queue: /eos/eos.tier2-kol.res.in/mgm-fsck-0
181115 22:00:44 time=1542299444.199265 func=close level=INFO logid=c6a18226-e8f3-11e8-80f8-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f820fb700 source=XrdMqOfs:274 tident= sec= uid=0 gid=0 name= geo="" disconnected queue: /eos/eos.tier2-kol.res.in/mgm-fsck-0
181115 22:01:04 14975 XrootdXeq: daemon.5546:21@eos01 pub IPv4 login as daemon
181115 22:01:04 time=1542299464.580929 func=open level=INFO logid=d8bdb79a-e8f3-11e8-82de-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81df8700 source=XrdMqOfs:75 tident= sec= uid=0 gid=0 name= geo="" connecting queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:01:04 time=1542299464.580988 func=open level=INFO logid=d8bdb79a-e8f3-11e8-82de-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81df8700 source=XrdMqOfs:118 tident= sec= uid=0 gid=0 name= geo="" connected queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:11 16157 XrootdXeq: daemon.5546:23@eos01 pub IPv4 login as daemon
181115 22:02:11 14975 XrootdXeq: daemon.5546:21@eos01 disc 0:01:07 (ended by daemon.5546:23@eos01)
181115 22:02:11 time=1542299531.668218 func=close level=INFO logid=d8bdb79a-e8f3-11e8-82de-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81df8700 source=XrdMqOfs:237 tident= sec= uid=0 gid=0 name= geo="" disconnecting queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:11 time=1542299531.668386 func=close level=INFO logid=d8bdb79a-e8f3-11e8-82de-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f81df8700 source=XrdMqOfs:274 tident= sec= uid=0 gid=0 name= geo="" disconnected queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:11 time=1542299531.668939 func=open level=INFO logid=00ba8958-e8f4-11e8-a0f0-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f7a93f700 source=XrdMqOfs:75 tident= sec= uid=0 gid=0 name= geo="" connecting queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:11 time=1542299531.669026 func=open level=INFO logid=00ba8958-e8f4-11e8-a0f0-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f7a93f700 source=XrdMqOfs:118 tident= sec= uid=0 gid=0 name= geo="" connected queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:59 time=1542299579.617215 func=close level=INFO logid=00ba8958-e8f4-11e8-a0f0-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f7a93f700 source=XrdMqOfs:237 tident= sec= uid=0 gid=0 name= geo="" disconnecting queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:59 time=1542299579.617375 func=close level=INFO logid=00ba8958-e8f4-11e8-a0f0-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f7a93f700 source=XrdMqOfs:274 tident= sec= uid=0 gid=0 name= geo="" disconnected queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:59 time=1542299579.617404 func=close level=INFO logid=00ba8958-e8f4-11e8-a0f0-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f7a93f700 source=XrdMqOfs:237 tident= sec= uid=0 gid=0 name= geo="" disconnecting queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:59 time=1542299579.617466 func=close level=INFO logid=00ba8958-e8f4-11e8-a0f0-80c16eaacee0 unit=mq@eos.tier2-kol.res.in:1097 tid=00007f4f7a93f700 source=XrdMqOfs:274 tident= sec= uid=0 gid=0 name= geo="" disconnected queue: /eos/eos01.tier2-kol.res.in:1095/fst
181115 22:02:59 16157 XrootdXeq: daemon.5546:23@eos01 disc 0:00:48
[root@eos ~]#

The output of xrdlog.fst on the FST is also the same.

[root@eos01 ~]# tail -f /var/log/eos/fst/xrdlog.fst
#5 Object "/lib64/libpthread.so.0", at 0x332f807aa0, in
#4 Object "/usr/lib64/libXrdUtils.so.2", at 0x7fc4c8ed253e, in XrdSysThread_Xeq
#3 Object "/usr/lib64/libXrdEosFst.so", at 0x7fc4c6853138, in eos::fst::Storage::StartFsPublisher(void*)
#2 Object "/usr/lib64/libXrdEosFst.so", at 0x7fc4c684d56d, in eos::fst::Storage::Publish()
#1 Object "/usr/lib64/libeosCommon.so.4", at 0x7fc4c4a1a8ed, in eos::common::RWMutexReadLock::Release()
#0 Object "/usr/lib64/libeosCommon.so.4", at 0x7fc4c4a1cf2b, in eos::common::getStacktrace()

181115 22:02:11 time=1542299531.670657 func=Publish level=WARN logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4bb7ff700 source=Publish:478 tident= sec=(null) uid=99 gid=99 name=- geo="" Publisher cycle exceeded 13198 millisecons - took 47987 milliseconds
181115 22:02:14 time=1542299534.606988 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:19 time=1542299539.607158 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:24 time=1542299544.607354 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:29 time=1542299549.607533 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:34 time=1542299554.607708 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:39 time=1542299559.607812 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:44 time=1542299564.608022 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:49 time=1542299569.608207 func=MgmSyncer level=INFO logid=FstOfsStorage unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:63 tident= sec= uid=0 gid=0 name= geo="" msg="waiting to know manager"
181115 22:02:49 time=1542299569.608281 func=MgmSyncer level=ALERT logid=static… unit=fst@eos01.tier2-kol.res.in:1095 tid=00007fc4b9fff700 source=MgmSyncer:66 tident= sec=(null) uid=99 gid=99 name=- geo="" didn't receive manager name, aborting
@@@@@@ 00:00:00 op=shutdown msg="shutdown timedout after 0 seconds, signal=1
@@@@@@ 00:00:00 op=shutdown status=forced-complete

Kindly suggest accordingly.

Vikas

Dear Andreas,

Do you have any further clues for this thread?
We are planning to go back to Aquamarine so that we can access the storage and migrate the data.

Kindly suggest.
Regards
Vikas

Sorry Vikas, I forgot yesterday …

Can you just start the MQ and then the MGM, and paste me the output of the xrdlog.mq file?

The problem is that the MGM does not connect to the MQ daemon. The FSTs wait for the MGM to send its name, but it looks like it is not connected.
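When the MGM is attached, the MQ log shows it opening the `/eos/<mgm-host>/mgm` queue. A minimal check sketched below (the sample line is copied from a log paste in this thread; on the MQ host one would grep the real /var/log/eos/mq/xrdlog.mq instead):

```shell
# Sample MQ log line from a working moment earlier in this thread.
sample='181115 21:59:54 ... connected queue: /eos/eos.tier2-kol.res.in/mgm'

# On the MQ host one would run instead:
#   grep 'connected queue: /eos/.*/mgm$' /var/log/eos/mq/xrdlog.mq
if echo "$sample" | grep -q 'connected queue: /eos/.*/mgm$'; then
  echo "MGM queue is attached to the MQ"
else
  echo "MGM queue missing - MGM is not talking to the MQ"
fi
```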

Dear Andreas,

Apologies, I lost patience and installed one more MGM with SL6.10, reconfigured with Aquamarine. The earlier EOS MGM with Citrine is offline; tomorrow I will do as you suggested.
All the FSTs still have SL6.10 with Citrine, as below:

[root@eos ~]# eos -v
EOS 0.3.268 (CERN)

Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
[root@eos ~]# cexec "eos -v "
************************* eos_kolkata_cluster *************************
--------- eos01.tier2-kol.res.in---------
EOS 4.4.10 (CERN)

Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
--------- eos02.tier2-kol.res.in---------
EOS 4.4.10 (CERN)

Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
--------- eos03.tier2-kol.res.in---------
EOS 4.4.10 (CERN)

Written by CERN-IT-DSS (Andreas-Joachim Peters, Lukasz Janyst & Elvin Sindrilaru)
[root@eos ~]#

Here things look good, and the earlier problem of the FSTs contacting the MQ is resolved. Now we can view all the nodes and the space in EOS.
[root@eos ~]# eos -b node ls
#----------------------------------------------------------------------------------------------------------------------------------------------
# type # hostport # geotag # status # status # txgw #gw-queued # gw-ntx #gw-rate # heartbeatdelta #nofs
#----------------------------------------------------------------------------------------------------------------------------------------------
nodesview eos01.tier2-kol.res.in:1095 geotagdefault online on off 0 10 120 1 12
nodesview eos02.tier2-kol.res.in:1095 geotagdefault online on off 0 10 120 1 12
nodesview eos03.tier2-kol.res.in:1095 geotagdefault online on off 0 10 120 1 12
[root@eos ~]# eos -b space ls
#------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# type # name # groupsize # groupmod #N(fs) #N(fs-rw) #sum(usedbytes) #sum(capacity) #capacity(rw) #nom.capacity #quota #balancing # threshold # converter # ntx # active #intergroup
#------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
spaceview default 0 24 36 0 127.36 T 141.74 T 0 0 off on 20 off 2 0 off
[root@eos ~]#

But the eosd service gets stuck after a few seconds, therefore we could not access the /eos FUSE partition. We tried to forcefully unmount /eos and restarted eosd, but the problem persists.
Below are the errors related to eosd and xrootd.

[root@eos ~]# service eosd restart
Stopping eosd for instance: main
[ OK ]

Starting eosd for instance: main
mkdir: cannot create directory `/eos/': File exists
chmod: cannot access `/eos/': Transport endpoint is not connected

EOS_FUSE_PING_TIMEOUT : 15
EOS_FUSE_DEBUG : 0
EOS_FUSE_LOWLEVEL_DEBUG : 0
EOS_FUSE_NOACCESS : 1
EOS_FUSE_SYNC : 0
EOS_FUSE_KERNELCACHE : 1
EOS_FUSE_DIRECTIO : 0
EOS_FUSE_CACHE : 1
EOS_FUSE_CACHE_SIZE : 67108864
EOS_FUSE_CACHE_PAGE_SIZE : 262144
EOS_FUSE_BIGWRITES : 1
EOS_FUSE_EXEC : 0
EOS_FUSE_NO_MT : 0
EOS_FUSE_SSS_KEYTAB :
EOS_FUSE_USER_KRB5CC : 0
EOS_FUSE_USER_GSIPROXY : 0
EOS_FUSE_USER_KRB5FIRST : 0
EOS_FUSE_PIDMAP : 0
EOS_FUSE_RMLVL_PROTECT : 1
EOS_FUSE_RDAHEAD : 0
EOS_FUSE_RDAHEAD_WINDOW : 131072
EOS_FUSE_LAZYOPENRO : 0
EOS_FUSE_LAZYOPENRW : 1
EOS_FUSE_SHOW_SPECIAL_FILES : 0
EOS_FUSE_SHOW_EOS_ATTRIBUTES : 0
EOS_FUSE_INLINE_REPAIR : 1
EOS_FUSE_MAX_INLINE_REPAIR_SIZE : 268435456
EOS_FUSE_ATTR_CACHE_TIME : 2
EOS_FUSE_ENTRY_CACHE_TIME : 2
EOS_FUSE_NEG_ENTRY_CACHE_TIME : 30
EOS_FUSE_CREATOR_CAP_LIFETIME : 30
EOS_FUSE_FILE_WB_CACHE_SIZE : 67108864
EOS_FUSE_MAX_WB_INMEMORY_SIZE : 536870912
EOS_FUSE_XRDBUGNULLRESPONSE_RETRYCOUNT : 3
EOS_FUSE_XRDBUGNULLRESPONSE_RETRYSLEEPMS : 1
EOS_FUSE_LOG_PREFIX : main
EOS_FUSE_MOUNTDIR : /eos/
EOS_FUSE_REMOTEDIR : /eos/
[root@eos ~]#

[root@eos ~]# xrootd -v
181116 21:49:58 7415 Scalla is starting. . .
Copr. 2004-2012 Stanford University, xrd version v20170724-0e6b1f5
++++++ xrootd anon@eos.tier2-kol.res.in initialization started.
Config maximum number of connections restricted to 4096
181116 21:49:58 7415 XrdOpen: Unable to bind socket to port 1094; address already in use
------ xrootd anon@eos.tier2-kol.res.in:-1 initialization failed.
[root@eos ~]# rpm -qa | grep xrootd*
xrootd-3.3.6-6.CERN.slc6.x86_64
xrootd-client-3.3.6-6.CERN.slc6.x86_64
xrootd-client-libs-3.3.6-6.CERN.slc6.x86_64
xrootd-alicetokenacc-1.2.4-1.x86_64
xrootd-libs-3.3.6-6.CERN.slc6.x86_64
xrootd-server-libs-3.3.6-6.CERN.slc6.x86_64
[root@eos ~]#
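"Transport endpoint is not connected" is the classic symptom of a stale FUSE mount: eosd cannot remount until the dead mount is removed, usually with a lazy unmount. A small sketch that recognizes the symptom (the helper name is made up; the actual recovery commands are left as comments since they need the real host):

```shell
# Return success when an error message indicates a stale FUSE mountpoint.
is_stale_mount() {
  case "$1" in
    *"Transport endpoint is not connected"*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_stale_mount "chmod: cannot access /eos/: Transport endpoint is not connected"; then
  # Recovery on the real host would be, e.g.:
  #   fusermount -uz /eos   # lazy-unmount the dead FUSE mount
  #   service eosd start
  echo "stale mount detected: lazy-unmount /eos before restarting eosd"
fi
```

The separate "Unable to bind socket to port 1094" message appears because this old xrootd 3.3.6 binary actually tries to start a server when run with `-v`, and port 1094 is presumably already taken by the running daemon; it is likely unrelated to the FUSE issue.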

Kindly suggest accordingly.
Vikas

Is this still true, or has the bug been fixed in the meantime?
I have successfully run some operations on EOS with xrootd 4.8.5 from the xrootd-stable repo on the MGM, FST and FUSE client, but I was wondering if this is safe for production…
If not, where should it be avoided? MGM, FST, clients, everywhere? And what would be the risk if we used it?
We have for sure some clients using xrootd 4.8.5 in production (servers are 4.8.4), and they don't seem to cause any issues.