Dear Experts,
We are running EOS 4.8.25 and have three MGMs for high availability and/or load balancing. Before setting up an alias for these MGMs, we tried reading and writing a file from a client while pointing it at each MGM in turn. Writing a file works well through the secondary MGMs (which we were not sure would work): the secondaries appear to redirect the write request to the master MGM, which then takes care of the rest. Surprisingly, however, reading a file through the secondaries does not work at all, because the secondary MGMs consider the corresponding file systems offline.
The master MQ runs on the primary MGM, and all FSTs communicate with it. As far as I understand, this is why the secondary MGMs do not see any file system as online. As a result, every read request through a secondary fails, because the secondary MGM reports that the file system holding the replica (identified by its fsid) is not readable.
Here is a log snippet from one of the secondary MGMs when it receives a read request:
201117 06:02:40 068 XrootdXeq: root.1122:118@jbod-mgmt-03 pub IP46 login as daemon
201117 06:02:40 060 XrootdXeq: root.1131:335@jbod-mgmt-03 pub IP46 login as daemon
201117 06:02:40 time=1605592960.207347 func=IdMap level=INFO logid=static.............................. unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=Mapping:993 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=sss sec.name="daemon" sec.host="jbod-mgmt-03.sdfarm.kr" sec.vorg="" sec.grps="daemon" sec.role="" sec.info="" sec.app="eoscp" sec.tident="root.1131:335@jbod-mgmt-03" vid.uid=2 vid.gid=2
201117 06:02:40 time=1605592960.207477 func=open level=INFO logid=7ff99ffa-289a-11eb-8a17-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=XrdMgmOfsFile:484 tident=root.1131:335@jbod-mgmt-03 sec=sss uid=2 gid=2 name=daemon geo="" op=read path=/eos/gsdc/testarea/rain16/file1g-jbod-mgmt-03.new.7 info=eos.app=eoscp&mgm.pcmd=open&eos.cli.access=pio
201117 06:02:40 time=1605592960.225689 func=open level=INFO logid=7ff99ffa-289a-11eb-8a17-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=XrdMgmOfsFile:1009 tident=root.1131:335@jbod-mgmt-03 sec=sss uid=2 gid=2 name=daemon geo="" acl=0 r=0 w=0 wo=0 egroup=0 shared=0 mutable=1
201117 06:02:40 time=1605592960.225915 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1375
201117 06:02:40 time=1605592960.225977 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1459
201117 06:02:40 time=1605592960.226013 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=787
201117 06:02:40 time=1605592960.226047 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1039
201117 06:02:40 time=1605592960.226078 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=31
201117 06:02:40 time=1605592960.226108 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1207
201117 06:02:40 time=1605592960.226144 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=871
201117 06:02:40 time=1605592960.226177 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=451
201117 06:02:40 time=1605592960.226220 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=955
201117 06:02:40 time=1605592960.226249 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1123
201117 06:02:40 time=1605592960.226277 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=367
201117 06:02:40 time=1605592960.226304 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1291
201117 06:02:40 time=1605592960.226365 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=199
201117 06:02:40 time=1605592960.226402 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=535
201117 06:02:40 time=1605592960.226430 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=283
201117 06:02:40 time=1605592960.226458 func=accessHeadReplicaMultipleGroup level=WARN logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857 tident=<service> sec= uid=0 gid=0 name= geo="" msg="file system not readable" fsid=619
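To cross-check these warnings against eos fs ls on each MGM, the affected fsids can be pulled out of the MGM log. The following is a minimal Python sketch. It assumes the log line format quoted above, and the function name unreadable_fsids is hypothetical, not part of EOS.

```python
import re

# Pattern assumed from the GeoTreeEngine WARN lines quoted above:
# msg="file system not readable" followed by fsid=<number>.
NOT_READABLE = re.compile(r'msg="file system not readable" fsid=(\d+)')


def unreadable_fsids(lines):
    """Return the fsids flagged as not readable, in order of first appearance."""
    seen = []
    for line in lines:
        match = NOT_READABLE.search(line)
        if match:
            fsid = int(match.group(1))
            if fsid not in seen:  # report each fsid only once
                seen.append(fsid)
    return seen


if __name__ == "__main__":
    sample = [
        '... level=WARN ... msg="file system not readable" fsid=1375',
        '... level=WARN ... msg="file system not readable" fsid=619',
        '... func=open level=INFO ... op=read path=/eos/gsdc/testarea/rain16/file1g-jbod-mgmt-03.new.7',
    ]
    print(unreadable_fsids(sample))  # [1375, 619]
```

The sketch can be run over the secondary's MGM log, for example /var/log/eos/mgm/xrdlog.mgm if the default log location is used. Running eos fs ls <fsid> on both the master and the secondary for each reported fsid should then show whether the two MGMs disagree about the file system state.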
On the secondary MGM, the output of eos fs ls 619 shows:
sh-4.2# eos fs ls 619
┌────────────────────────┬────┬──────┬────────────────────────────────┬────────────────┬────────────────┬────────────┬──────────────┬────────────┬────────┬────────────────┐
│host                    │port│    id│                            path│      schedgroup│          geotag│        boot│  configstatus│       drain│  active│          health│
└────────────────────────┴────┴──────┴────────────────────────────────┴────────────────┴────────────────┴────────────┴──────────────┴────────────┴────────┴────────────────┘
 jbod-mgmt-04.sdfarm.kr   1096    619            /jbod/box_08_disk_030       default.30 kisti::gsdc::g02       booted             rw      nodrain  offline        no mdstat
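To check whether the secondary disagrees with the master for every file system, not only fs 619, the plain-text eos fs ls rows from both MGMs can be compared. Below is a minimal Python sketch. It assumes whitespace-separated columns in the order shown above, with only the last (health) column containing spaces. The helper names are hypothetical, and the master row in the usage example is made up for illustration, since the master's output is not shown here.

```python
# Column names as printed in the eos fs ls header above.
COLUMNS = ["host", "port", "id", "path", "schedgroup", "geotag",
           "boot", "configstatus", "drain", "active", "health"]


def parse_fs_row(row):
    """Split one data row of the plain-text eos fs ls table into a dict.

    Assumes only the last column (health) may contain spaces, as in
    'no mdstat' above, so at most len(COLUMNS) - 1 splits are made.
    """
    fields = row.strip().split(maxsplit=len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"unexpected row: {row!r}")
    return dict(zip(COLUMNS, fields))


def offline_on_secondary(master_rows, secondary_rows):
    """Return fsids the master reports online but the secondary does not."""
    master = {r["id"]: r["active"] for r in map(parse_fs_row, master_rows)}
    secondary = {r["id"]: r["active"] for r in map(parse_fs_row, secondary_rows)}
    return sorted(
        int(fsid)
        for fsid, state in master.items()
        if state == "online" and secondary.get(fsid) != "online"
    )


if __name__ == "__main__":
    secondary_row = ("jbod-mgmt-04.sdfarm.kr 1096 619 /jbod/box_08_disk_030 "
                     "default.30 kisti::gsdc::g02 booted rw nodrain offline no mdstat")
    # Hypothetical master view of the same file system, for illustration only.
    master_row = secondary_row.replace(" offline ", " online ")
    print(offline_on_secondary([master_row], [secondary_row]))  # [619]
```

If the suspicion above is right, this comparison would list essentially every fsid, because the secondary receives no FST status updates at all.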
Unless we change the broker URL setting on all FSTs to point to one of the secondary MGMs and make the MQ running on that MGM the master, I think read requests through the secondaries will never work. It seems to me that there is no automatic or dynamic way to change these settings (i.e. to switch the MQ master and slave)… hopefully I am wrong.
Is there a recommended setup for high availability and/or load balancing? Did I miss anything (e.g. in the MQ setup)?
Thank you.
Best regards,
Sang-Un