Cp works except eos cp

Hi Sang-Un,

Can you also check that this value matches what the FSTs see when they boot? Namely, the log related to the “symkey=” value.

Otherwise, I’m a bit puzzled by what is happening here. Would it be possible to give me access to this instance? If so, I can send you an email with my ssh key. Let me know.

Thanks,
Elvin

Hi Elvin,

The symkey values on all FSTs are identical and match the one shown on the MGM.

[root@jbod-mgmt-01 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-02 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-03 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-04 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-05 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-06 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-07 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-08 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-09 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
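
For repeated checks, the comparison above can also be scripted. This is only a minimal sketch: it assumes the FST log lines contain the literal token `symkey=` followed by the key as one whitespace-free string (which is what the grep above relies on). The helper names `extract_symkeys` and `symkeys_consistent` are made up for illustration and are not part of EOS.

```python
import re

# Matches the value after "symkey=" up to the next whitespace.
# Assumption: the key is printed as a single token, as in the grep output above.
SYMKEY_RE = re.compile(r"symkey=(\S+)")


def extract_symkeys(log_lines):
    """Return the set of distinct symkey values found in the given log lines."""
    keys = set()
    for line in log_lines:
        match = SYMKEY_RE.search(line)
        if match:
            keys.add(match.group(1))
    return keys


def symkeys_consistent(log_lines, expected=None):
    """True if exactly one distinct symkey is seen (and it equals `expected`, if given)."""
    keys = extract_symkeys(log_lines)
    if len(keys) != 1:
        return False
    return expected is None or expected in keys
```

Feeding it the lines of all FST logs from one host, with `expected` set to the value shown on the MGM, gives a single yes/no answer instead of a list to compare by eye.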

By the way, it would be very helpful if you could have a look at our setup. I will discuss with our system administrator how to give you access, since the cluster is not open to the public.

Best regards,
Sang-Un

Hi Elvin,

I will send you an email regarding the access to our EOS cluster.

Best regards,
Sang-Un

Hi Sang-Un,

The problem in this case was that the redirection information was longer than the default 2kB of data that XRootD supports, and this needs some extra care when handling it.

Now, everything works fine for a normal xrdcp, but eoscp tries to use a more efficient way of reading the data for RAIN files, namely by doing the so-called “parallel IO” open. The open for this mode is done using an XRootD query, which gets essentially the same response as a normal open, but eoscp is then responsible for contacting the stripes directly rather than using the gateway mode.

The problem was that the response to the query command was longer than 2kB and this was not properly handled in the code. I’ve now fixed it in the following commit, and the fix will be available in 4.8.10:
https://gitlab.cern.ch/dss/eos/-/commit/bef3fefd77774d51a6b28c9bacd32c71d932126b

Therefore, until this gets released, please use the normal xrdcp command to transfer such files reliably.
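
To illustrate the failure mode in general terms, here is a small sketch of how a single read capped at 2 kB cuts off a longer response, while reading in a loop recovers all of it. This is not the EOS or XRootD code from the commit above. The response layout (one `&`-separated entry per stripe), the host names and the function names are all invented for the example.

```python
import io

DEFAULT_LIMIT = 2048  # the 2 kB default mentioned above


def make_response(n_stripes):
    """Build a made-up redirection response with one entry per stripe."""
    entries = [
        f"stripe{i}=fst-{i:02d}.example.org:1095/" + "c" * 150  # hypothetical entry
        for i in range(n_stripes)
    ]
    return "&".join(entries).encode()


def read_truncated(stream, limit=DEFAULT_LIMIT):
    """Naive handling: one read, capped at the default size."""
    return stream.read(limit)


def read_full(stream, chunk=DEFAULT_LIMIT):
    """Keep reading chunks until the stream is exhausted."""
    parts = []
    while True:
        data = stream.read(chunk)
        if not data:
            break
        parts.append(data)
    return b"".join(parts)


response = make_response(16)
truncated = read_truncated(io.BytesIO(response))
complete = read_full(io.BytesIO(response))
print("response bytes:", len(response))
print("entries seen with capped read:", len(truncated.split(b"&")))
print("entries seen with full read:", len(complete.split(b"&")))
```

With 16 stripes the invented response exceeds the limit, so the capped read never sees the last entries. The more stripes a layout has, the more likely the real response is to cross the 2 kB boundary in the same way.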

Thanks a lot for all the help in debugging and tracking this down.

Cheers,
Elvin

Hi Elvin,

Thank you so much for the great help and the fix. I am looking forward to having the new release as soon as possible.

Best regards,
Sang-Un

Hi Elvin,

This is just an update. I have installed 4.8.12 from the commit repository and configured qrain with 16 stripes. A simple test shows that reads and writes using eos cp work just fine.

sh-4.2# eos version
EOS_INSTANCE=gsdc
EOS_SERVER_VERSION=4.8.12 EOS_SERVER_RELEASE=20200907174735gitcf98311
EOS_CLIENT_VERSION=4.8.12 EOS_CLIENT_RELEASE=20200907174735gitcf98311
sh-4.2# eos cp /root/file1g /eos/gsdc/testarea/rain16/file1g-$(hostname -s)
[eoscp] file1g Total 1024.00 MB |====================| 100.00 % [393.6 MB/s]
[eos-cp] copied 1/1 files and 1.07 GB in 5.93 seconds with 180.93 MB/s
sh-4.2# eos cp /eos/gsdc/testarea/rain16/file1g-$(hostname -s) /root/file1g-eoscp
[eoscp] file1g-jbod-mgmt-09 Total 1024.00 MB |====================| 100.00 % [1159.5 MB/s]
[eos-cp] copied 1/1 files and 1.07 GB in 0.95 seconds with 1.13 GB/s
sh-4.2# eos fileinfo /eos/gsdc/testarea/rain16/file1g-$(hostname -s)
File: ‘/eos/gsdc/testarea/rain16/file1g-jbod-mgmt-09’ Flags: 0640
Size: 1073741824
Modify: Wed Sep 9 04:26:06 2020 Timestamp: 1599625566.581541000
Change: Wed Sep 9 04:26:00 2020 Timestamp: 1599625560.949948762
Birth: Wed Sep 9 04:26:00 2020 Timestamp: 1599625560.949948762
CUid: 2 CGid: 2 Fxid: 00000046 Fid: 70 Pid: 25 Pxid: 00000019
XStype: adler XS: 4f a4 17 e2 ETAGs: “18790481920:4fa417e2”
Layout: qrain Stripes: 16 Blocksize: 1M LayoutId: 40640f52 Redundancy: d5::t0
#Rep: 16
┌───┬──────┬────────────────────────┬────────────────┬─────────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┐
│no.│ fs-id│ host│ schedgroup│ path│ boot│ configstatus│ drain│ active│ geotag│
└───┴──────┴────────────────────────┴────────────────┴─────────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┘
0 546 jbod-mgmt-04.sdfarm.kr default.41 /jbod/box_07_disk_041 booted rw nodrain online kisti::gsdc::g02
1 1470 jbod-mgmt-09.sdfarm.kr default.41 /jbod/box_18_disk_041 booted rw nodrain online kisti::gsdc::g03
2 1050 jbod-mgmt-07.sdfarm.kr default.41 /jbod/box_13_disk_041 booted rw nodrain online kisti::gsdc::g03
3 798 jbod-mgmt-05.sdfarm.kr default.41 /jbod/box_10_disk_041 booted rw nodrain online kisti::gsdc::g02
4 126 jbod-mgmt-01.sdfarm.kr default.41 /jbod/box_02_disk_041 booted rw nodrain online kisti::gsdc::g01
5 294 jbod-mgmt-02.sdfarm.kr default.41 /jbod/box_04_disk_041 booted rw nodrain online kisti::gsdc::g01
6 630 jbod-mgmt-04.sdfarm.kr default.41 /jbod/box_08_disk_041 booted rw nodrain online kisti::gsdc::g02
7 1218 jbod-mgmt-08.sdfarm.kr default.41 /jbod/box_15_disk_041 booted rw nodrain online kisti::gsdc::g03
8 462 jbod-mgmt-03.sdfarm.kr default.41 /jbod/box_06_disk_041 booted rw nodrain online kisti::gsdc::g01
9 714 jbod-mgmt-05.sdfarm.kr default.41 /jbod/box_09_disk_041 booted rw nodrain online kisti::gsdc::g02
10 1302 jbod-mgmt-08.sdfarm.kr default.41 /jbod/box_16_disk_041 booted rw nodrain online kisti::gsdc::g03
11 378 jbod-mgmt-03.sdfarm.kr default.41 /jbod/box_05_disk_041 booted rw nodrain online kisti::gsdc::g01
12 210 jbod-mgmt-02.sdfarm.kr default.41 /jbod/box_03_disk_041 booted rw nodrain online kisti::gsdc::g01
13 882 jbod-mgmt-06.sdfarm.kr default.41 /jbod/box_11_disk_041 booted rw nodrain online kisti::gsdc::g02
14 1386 jbod-mgmt-09.sdfarm.kr default.41 /jbod/box_17_disk_041 booted rw nodrain online kisti::gsdc::g03
15 42 jbod-mgmt-01.sdfarm.kr default.41 /jbod/box_01_disk_041 booted rw nodrain online kisti::gsdc::g01


Thank you.

Best regards,
Sang-Un

Glad to hear that! :+1:

Hi Sang-Un,
I have installed 5.2.21 on Rocky 9.3, but when I tried to start eos5-mgm@mgm, I got the error “failed to load key from Configstore”:

240411 15:17:35 time=1712819855.050464 func=get                      level=ERROR logid=8bca11dc-f7d3-11ee-a3f5-a6bb22a596af unit=mgm@node1.cern.ch:1094 tid=00007fcce7804640 source=ConfigStore:76                 tident=<service> sec=      uid=0 gid=0 name= geo="" msg="failed to load key from Configstore" key="converter-max-threads" err="msg=Failed Numeric conversion" key= error_msg=Invalid argument
240411 15:17:35 time=1712819855.051219 func=get                      level=ERROR logid=8bca11dc-f7d3-11ee-a3f5-a6bb22a596af unit=mgm@node1.cern.ch:1094 tid=00007fcce7804640 source=ConfigStore:76                 tident=<service> sec=      uid=0 gid=0 name= geo="" msg="failed to load key from Configstore" key="converter-max-queuesize" err="msg=Failed Numeric conversion" key= error_msg=Invalid argument

I also can’t find any “symkey=” value with grep -R symkey /var/eos/md/so.mgm.dump.node1.cern.ch\:1094. There must be something wrong with my configuration, but I can’t figure out what. Could you give me a hint? Any help would be appreciated.
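
In case it helps whoever looks at this, the affected keys can be pulled out of the MGM log with a few lines of Python. This is just a rough sketch that assumes the error lines look exactly like the two pasted above. `failed_config_keys` is a name I made up, not an EOS tool.

```python
import re

# Captures the first key="..." and the err="..." field of a ConfigStore load failure.
# Assumption: field order and quoting match the log lines pasted above.
FAILED_KEY_RE = re.compile(
    r'msg="failed to load key from Configstore"\s+key="([^"]*)"\s+err="([^"]*)"'
)


def failed_config_keys(log_lines):
    """Return (key, err) pairs for every ConfigStore load failure in the lines."""
    failures = []
    for line in log_lines:
        match = FAILED_KEY_RE.search(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures
```

Run over the two lines above, it reports converter-max-threads and converter-max-queuesize, both with a failed numeric conversion.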