Read fails through secondary MGMs

Dear Experts,

We are running EOS 4.8.25 with three MGMs for high availability and/or load balancing. Before setting up an alias for these MGMs, we tried reading and writing a file from a client pointed at each MGM in turn. Writing a file through the secondary MGMs works well (which we were not sure it would): the secondaries apparently redirect the write request to the master MGM, which then takes care of the rest. Surprisingly, however, reading a file through the secondaries does not work at all, because the secondary MGMs see the corresponding file systems as offline.

The master MQ is running on the primary MGM and all FSTs communicate with it. As far as I understand, this is why the secondary MGMs cannot see any file systems as online: since the FSTs report only to the master MQ, the secondaries presumably never receive their status. Any read request through a secondary therefore fails, with that MGM reporting the file system (fsid) in question as not readable.
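
To illustrate, the same file system can be queried through each MGM; a loop along these lines (hostnames from our setup, fsid 619 taken from the logs below) makes the comparison easy:

    # compare one file system's state as seen by each MGM
    for m in jbod-mgmt-01 jbod-mgmt-04 jbod-mgmt-07; do
        echo "== ${m} =="
        EOS_MGM_URL="root://${m}.sdfarm.kr:1094/" eos fs ls 619
    done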

Here is a snippet of the log on one of the secondary MGMs when it receives a read request:

201117 06:02:40 068 XrootdXeq: root.1122:118@jbod-mgmt-03 pub IP46 login as daemon
201117 06:02:40 060 XrootdXeq: root.1131:335@jbod-mgmt-03 pub IP46 login as daemon
201117 06:02:40 time=1605592960.207347 func=IdMap                    level=INFO  logid=static.............................. unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=Mapping:993                    tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=sss sec.name="daemon" sec.host="jbod-mgmt-03.sdfarm.kr" sec.vorg="" sec.grps="daemon" sec.role="" sec.info="" sec.app="eoscp" sec.tident="root.1131:335@jbod-mgmt-03" vid.uid=2 vid.gid=2
201117 06:02:40 time=1605592960.207477 func=open                     level=INFO  logid=7ff99ffa-289a-11eb-8a17-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=XrdMgmOfsFile:484              tident=root.1131:335@jbod-mgmt-03 sec=sss   uid=2 gid=2 name=daemon geo="" op=read path=/eos/gsdc/testarea/rain16/file1g-jbod-mgmt-03.new.7 info=eos.app=eoscp&mgm.pcmd=open&eos.cli.access=pio
201117 06:02:40 time=1605592960.225689 func=open                     level=INFO  logid=7ff99ffa-289a-11eb-8a17-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=XrdMgmOfsFile:1009             tident=root.1131:335@jbod-mgmt-03 sec=sss   uid=2 gid=2 name=daemon geo="" acl=0 r=0 w=0 wo=0 egroup=0 shared=0 mutable=1
201117 06:02:40 time=1605592960.225915 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1375
201117 06:02:40 time=1605592960.225977 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1459
201117 06:02:40 time=1605592960.226013 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=787
201117 06:02:40 time=1605592960.226047 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1039
201117 06:02:40 time=1605592960.226078 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=31
201117 06:02:40 time=1605592960.226108 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1207
201117 06:02:40 time=1605592960.226144 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=871
201117 06:02:40 time=1605592960.226177 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=451
201117 06:02:40 time=1605592960.226220 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=955
201117 06:02:40 time=1605592960.226249 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1123
201117 06:02:40 time=1605592960.226277 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=367
201117 06:02:40 time=1605592960.226304 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=1291
201117 06:02:40 time=1605592960.226365 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=199
201117 06:02:40 time=1605592960.226402 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=535
201117 06:02:40 time=1605592960.226430 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=283
201117 06:02:40 time=1605592960.226458 func=accessHeadReplicaMultipleGroup level=WARN  logid=b935270c-23c8-11eb-861e-b8599f9c4f90 unit=mgm@jbod-mgmt-07.sdfarm.kr:1094 tid=00007f43ce9d5700 source=GeoTreeEngine:1857             tident=<service> sec=      uid=0 gid=0 name= geo="" msg="file system not readable" fsid=619

On the secondary MGM, the output of eos fs ls 619 shows:

sh-4.2# eos fs ls 619
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚host                    β”‚portβ”‚    idβ”‚                            pathβ”‚      schedgroupβ”‚          geotagβ”‚        bootβ”‚  configstatusβ”‚       drainβ”‚  activeβ”‚          healthβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 jbod-mgmt-04.sdfarm.kr   1096    619            /jbod/box_08_disk_030       default.30 kisti::gsdc::g02       booted             rw      nodrain  offline        no mdstat

Unless we change the broker URL setting on all FSTs to point at one of the secondary MGMs and promote the MQ running on that MGM to master, I think read requests through the secondaries will never work. It also seems to me that there is no automatic or dynamic way to change those settings (i.e. to switch the MQ master and slave); hopefully I am wrong about that.
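
To be concrete, by the broker URL setting I mean, if I read the configuration correctly, the mgmofs.broker and fstofs.broker directives in the xrootd config files; assuming the standard layout (the MQ listening on the default port 1097), ours look roughly like this (primary hostname elided):

    # /etc/xrd.cf.mgm on each MGM: the MQ runs alongside the MGM
    mgmofs.broker root://localhost:1097//eos/

    # /etc/xrd.cf.fst on every FST: currently pointing at the MQ on the primary
    fstofs.broker root://<primary-mgm>:1097//eos/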

Are there any recommended setups for high availability and/or load balancing? Did I miss anything (e.g. in the MQ setup)?

Thank you.

Best regards,
Sang-Un

Hi Sang-Un,

I suspect there is some misconfiguration in your setup, as this scenario of reading through an MGM slave should work. Let’s first check that you have all of the following bits of configuration in place (a quick verification sketch follows the list):

  • all MGMs have the following lines in their configuration /etc/xrd.cf.mgm:

     mgmofs.nslib /usr/lib64/libEosNsQuarkdb.so
     mgmofs.qdbcluster localhost:7777 
     mgmofs.qdbpassword_file /etc/eos.keytab
     mgmofs.cfgtype quarkdb
    
  • you are using the configuration stored in QuarkDB (this should be the case if you have the last option from the previous point set)

  • you have the following env variable set in /etc/sysconfig/eos_env:
    EOS_USE_QDB_MASTER=1

  • you have the following lines in your /etc/xrd.cf.mq file:

     # QDB cluster information  
     mq.qdbcluster esdss000.cern.ch:7777
     mq.qdbpassword_file /etc/eos.keytab  
    
  • you have the following lines in the /etc/xrd.cf.fst file on all the FSTs that you configured:

    fstofs.qdbcluster localhost:7777
    fstofs.qdbpassword_file /etc/eos.keytab
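
As a quick way to verify these, a sketch along these lines on each node should do; plain grep plus the eos CLI (the grep simply skips config files that do not exist on a given node):

    # confirm which QDB-related settings are actually in place
    grep -E 'qdbcluster|qdbpassword_file|cfgtype|nslib' \
        /etc/xrd.cf.mgm /etc/xrd.cf.mq /etc/xrd.cf.fst 2>/dev/null
    # on each MGM: the namespace summary should show a booted namespace
    # with non-zero file/container counts
    eos ns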
    

Let me know your answers to these questions and then we’ll take it from there.

Cheers,
Elvin

Hi Elvin,

Thanks for the reply.

I have checked all the configurations and they are exactly as you described, except for one thing:

in /etc/xrd.cf.mgm,
mgmofs.qdbcluster jbod-mgmt-02.sdfarm.kr:7777 jbod-mgmt-05.sdfarm.kr:7777 jbod-mgmt-08.sdfarm.kr:7777

in /etc/xrd.cf.mq,
mq.qdbcluster jbod-mgmt-02.sdfarm.kr:7777 jbod-mgmt-05.sdfarm.kr:7777 jbod-mgmt-08.sdfarm.kr:7777

in /etc/xrd.cf.fst,
fstofs.qdbcluster jbod-mgmt-02.sdfarm.kr:7777 jbod-mgmt-05.sdfarm.kr:7777 jbod-mgmt-08.sdfarm.kr:7777

Do you mean that mgmofs.qdbcluster should point to localhost on the MGMs (and fstofs.qdbcluster on the FSTs), while mq.qdbcluster should point to one of the QDB cluster nodes?

Thank you.

Best regards,
Sang-Un

Hi Elvin,

I have just tried with xrdcp instead of eoscp, and reading files through the secondary MGMs works well…

Writing and reading with xrdcp succeed, but eos fileinfo still produces no output on the secondaries:

sh-4.2# xrdcp root://jbod-mgmt-07.sdfarm.kr:1094//eos/gsdc/testarea/rain16/file1g-$(hostname -s).xrdcp.1 /root/file1g-xrdcp.1
[1024MB/1024MB][100%][==================================================][512MB/s]
sh-4.2# eos fileinfo /eos/gsdc/testarea/rain16/file1g-$(hostname -s).xrdcp.1
error: cannot stat '/eos/gsdc/testarea/rain16/file1g-jbod-mgmt-03.xrdcp.1' (errc=2) (No such file or directory)

FYI, the current Master MGM is jbod-mgmt-01.sdfarm.kr while jbod-mgmt-04.sdfarm.kr and jbod-mgmt-07.sdfarm.kr are secondaries.

When I try to read a file with eoscp through the secondaries, the error messages are as follows:

sh-4.2# eos cp /eos/gsdc/testarea/rain16/file1g-$(hostname -s).new.7 /root/file1g-eoscp.3
error: failed to parse opaque information from PIO request.
error: failed copying path=/root/file1g-eoscp.3
#WARNING [eos-cp] copied 0/1 files and 0 B in 0.06 seconds with 0 B/s

I think it does not depend on the layout, because with xrdcp I can write and read files through any of the secondaries for any layout type, e.g. plain, replica, raid6, qrain, etc.
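
For completeness, this is roughly how I exercised the layouts through a secondary (same test file and directory names as below):

    # write a file into each layout directory via a secondary MGM, then read it back
    for l in plain replica raid6 rain16; do
        xrdcp -f /root/file1g root://jbod-mgmt-07.sdfarm.kr:1094//eos/gsdc/testarea/${l}/file1g-$(hostname -s)
        xrdcp -f root://jbod-mgmt-07.sdfarm.kr:1094//eos/gsdc/testarea/${l}/file1g-$(hostname -s) /root/file1g-${l}.back
    done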

One more thing I found strange: when I set EOS_MGM_URL to one of the secondaries, the namespace does not appear properly.

For example, I have created several directories with different types of layouts as follows:

sh-4.2# eos ls /eos/gsdc/testarea
archive
plain
raid6
raiddp
rain12
rain16
replica

But when I try to access those directories with EOS_MGM_URL set to a secondary MGM on the client, they do not exist:

sh-4.2# export EOS_MGM_URL="root://jbod-mgmt-04.sdfarm.kr:1094/"
sh-4.2# eos ls /eos/gsdc/testarea
Unable to stat /eos/gsdc/testarea; No such file or directory (errc=2) (No such file or directory)

Once I switch back to the primary MGM, they appear again:

sh-4.2# export EOS_MGM_URL="root://jbod-mgmt-01.sdfarm.kr:1094/"
sh-4.2# eos ls /eos/gsdc/testarea/plain/
file1g-jbod-mgmt-03

However, writing and reading files with xrdcp work fine either way:

sh-4.2# export | grep MGM
export EOS_MGM_URL="root://jbod-mgmt-07.sdfarm.kr:1094/"
sh-4.2# xrdcp -f root://jbod-mgmt-07.sdfarm.kr:1094//eos/gsdc/testarea/plain/file1g-$(hostname -s) /root/file1g-xrdcp-plain
[1024MB/1024MB][100%][==================================================][512MB/s]
sh-4.2# xrdcp -f /root/file1g root://jbod-mgmt-07.sdfarm.kr:1094//eos/gsdc/testarea/plain/file1g-$(hostname -s)
[1024MB/1024MB][100%][==================================================][1024MB/s]
sh-4.2# eos ls /eos/gsdc/testarea/plain/file1g-$(hostname -s)
Unable to stat /eos/gsdc/testarea/plain/file1g-jbod-mgmt-03; No such file or directory (errc=2) (No such file or directory)
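
To summarize, repeating the same listing against each MGM makes the inconsistency easy to see:

    # list the same directory through every MGM
    for m in jbod-mgmt-01 jbod-mgmt-04 jbod-mgmt-07; do
        echo "== ${m} =="
        EOS_MGM_URL="root://${m}.sdfarm.kr:1094/" eos ls /eos/gsdc/testarea
    done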

Do you have any idea about this? Please let me know if you need anything further for the investigation.

Thank you.

Best regards,
Sang-Un

Hi Sang-Un,

I have a suspicion about what might be the problem here, but I haven’t had time to test it. I will do that next week and come back to you.

Thanks,
Elvin

Hi Sang-Un,

I’ve tried to reproduce your issue, but without success. At this point I suspect there is a configuration issue with your setup. Let’s take this offline; maybe I can have a look at your setup.

Cheers,
Elvin

Hi Elvin,

Thank you for taking the time to debug this. I have replied to you. Thanks~!

Best regards,
Sang-Un

Hi Sang-Un,

Just for the record, the issue that you experienced with your setup is now fixed by the following commit and is available in 4.8.31: https://gitlab.cern.ch/dss/eos/-/commit/32b93012459f73409b45fceaa3c575cc6c47f421

Thank you for the report!
Cheers,
Elvin


Hi Elvin,

Thank you so much for taking a look and for the quick fix.

Best regards,
Sang-Un

Hi Elvin,

I recently installed the latest release, 4.8.35, which I believe should include the fix for reading through the secondary MGMs. However, the issue is still there. Just as described above, xrdcp works fine, but eoscp does not read files when pointed at one of the secondary MGMs.

Would you take some time to look into this again?

Thank you.

Best regards,
Sang-Un

Hi Sang-Un,

I just tried this out on my setup and it works as expected. Are the old login credentials still valid? I could connect to your instance and check what is wrong there.

Cheers,
Elvin

Hi Elvin,

Yes, they are still valid. Thank you so much for taking the time. Please let me know if you encounter any problems logging in.

Best regards,
Sang-Un

Hi Sang-Un,

For the record, the functionality works as expected on your setup. The problem in this particular case was that one of the MGM nodes was in a strange state in which it was not connecting to the QDB cluster, so its namespace was empty.
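
For future reference, this state should be easy to spot with a couple of basic checks along these lines (hosts/ports from your setup; note that QuarkDB may require authentication, depending on your password settings):

    # on the affected MGM the namespace counters are (close to) zero
    eos ns
    # basic connectivity check towards the QDB cluster; QuarkDB speaks the
    # redis protocol, so a plain ping should answer if the node is reachable
    redis-cli -h jbod-mgmt-02.sdfarm.kr -p 7777 ping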

Please let me know if you can reproduce this or if you have any other issues.

Thanks,
Elvin

Hi Elvin,

Thank you so much for the help.

I think there is an issue in how we deploy the setup. The deployment itself goes well, but as you noticed the MGM secondaries were not properly enabled. After restarting them (see the sketch below), they started working. I need to investigate this further.
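
For the record, a plain service restart was enough to recover an affected secondary, along these lines (systemd unit names as in a standard EOS installation):

    # restart the MGM on the affected secondary and verify the namespace comes back
    systemctl restart eos@mgm
    eos ns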

Best regards,
Sang-Un