esindril
(Elvin Alin Sindrilaru)
July 13, 2020, 8:32am
21
Hi Sang-Un,
Can you also check that this value matches what the FSTs see when they boot, namely the log line containing the “symkey=” value?
Otherwise, I’m a bit puzzled by what is happening here. Would it be possible to give me access to this instance? If so, I can send you an email with my ssh key. Let me know.
Thanks,
Elvin
sahn
(Sang Un Ahn)
July 13, 2020, 11:49pm
22
Hi Elvin,
The symkey values on all of the FSTs are identical and consistent with the one shown on the MGM.
[root@jbod-mgmt-01 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-02 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-03 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-04 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-05 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-000*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-06 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-07 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-08 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
[root@jbod-mgmt-09 ~]# grep -R symkey= /var/lib/docker/volumes/eos-fst-001*_log/ | awk -F 'symkey=' '{print $2}' | sort -u
F5igzjdI+pFyJ49/45e9kA14sCc=
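The per-host check above could be wrapped in a small helper that fails when more than one distinct value shows up. This is only a minimal sketch: the function names `distinct_symkeys` and `symkeys_consistent` are made up for illustration, and it assumes the FST logs contain the key after a literal `symkey=`, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of EOS).

# Print the distinct symkey values found in the log files under the given
# directories, one per line (same pipeline as the manual check).
distinct_symkeys() {
    grep -Rh 'symkey=' "$@" 2>/dev/null | awk -F 'symkey=' '{print $2}' | sort -u
}

# Succeed only if exactly one distinct symkey value is found.
symkeys_consistent() {
    local count
    count=$(distinct_symkeys "$@" | wc -l)
    [ "$count" -eq 1 ]
}
```

On each host it could be called as `symkeys_consistent /var/lib/docker/volumes/eos-fst-*_log/ && echo consistent`, and the single printed value can then be compared by eye with the one on the MGM.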
By the way, it would be very helpful if you could have a look at our setup. I will discuss with our system administrator how to give you access, because the cluster is not open to the public.
Best regards,
Sang-Un
sahn
(Sang Un Ahn)
July 15, 2020, 4:31am
23
Hi Elvin,
I will send you an email regarding the access to our EOS cluster.
Best regards,
Sang-Un
esindril
(Elvin Alin Sindrilaru)
July 22, 2020, 2:37pm
24
Hi Sang-Un,
The problem in this case was that the redirection information was longer than the default 2kB of data that XRootD supports, and this needs some extra care when handling it.
Now, everything works fine for a normal xrdcp, but eoscp tries to use a more efficient way of reading the data for RAIN files, namely by doing the so-called “parallel IO” open. The open for this mode is done using an XRootD query, which gets essentially the same response as a normal open, but eoscp is then responsible for contacting the stripes directly rather than using the gateway mode.
The response to the query command was longer than 2kB, and this was not properly handled in the code. I’ve now fixed it in the following commit, and the fix will be available in 4.8.10:
https://gitlab.cern.ch/dss/eos/-/commit/bef3fefd77774d51a6b28c9bacd32c71d932126b
Therefore, until this gets released, please use the normal xrdcp command to transfer such files reliably.
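The workaround above can be expressed as a version check. The following is a minimal sketch, not an official procedure: the helper names are hypothetical, it assumes the client version is the one that matters, that `eos version` prints a line starting with `EOS_CLIENT_VERSION=`, and that `sort -V` (GNU coreutils) is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of EOS).

# Read `eos version` output on stdin and print the client version.
client_version() {
    sed -n 's/^EOS_CLIENT_VERSION=\([^ ]*\).*/\1/p'
}

# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the copy tool to use for RAIN files given the EOS client version:
# "eos cp" once the fix (4.8.10) is present, plain "xrdcp" before that.
pick_copy_tool() {
    if version_ge "$1" "4.8.10"; then
        echo "eos cp"
    else
        echo "xrdcp"
    fi
}
```

For example, `pick_copy_tool "$(eos version | client_version)"` would print the tool to use on the current node.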
Thanks a lot for all the help in debugging and tracking this down.
Cheers,
Elvin
sahn
(Sang Un Ahn)
July 24, 2020, 1:00am
25
Hi Elvin,
Thank you so much for the great help and the fix. I am looking forward to having the new release as soon as possible.
Best regards,
Sang-Un
sahn
(Sang Un Ahn)
September 9, 2020, 4:30am
26
Hi Elvin,
This is just an update. I have installed 4.8.12 from the commit repository and configured qrain with 16 stripes. A simple test shows that reading and writing with eos cp work just fine.
sh-4.2# eos version
EOS_INSTANCE=gsdc
EOS_SERVER_VERSION=4.8.12 EOS_SERVER_RELEASE=20200907174735gitcf98311
EOS_CLIENT_VERSION=4.8.12 EOS_CLIENT_RELEASE=20200907174735gitcf98311
sh-4.2# eos cp /root/file1g /eos/gsdc/testarea/rain16/file1g-$(hostname -s)
[eoscp] file1g Total 1024.00 MB |====================| 100.00 % [393.6 MB/s]
[eos-cp] copied 1/1 files and 1.07 GB in 5.93 seconds with 180.93 MB/s
sh-4.2# eos cp /eos/gsdc/testarea/rain16/file1g-$(hostname -s) /root/file1g-eoscp
[eoscp] file1g-jbod-mgmt-09 Total 1024.00 MB |====================| 100.00 % [1159.5 MB/s]
[eos-cp] copied 1/1 files and 1.07 GB in 0.95 seconds with 1.13 GB/s
sh-4.2# eos fileinfo /eos/gsdc/testarea/rain16/file1g-$(hostname -s)
File: ‘/eos/gsdc/testarea/rain16/file1g-jbod-mgmt-09’ Flags: 0640
Size: 1073741824
Modify: Wed Sep 9 04:26:06 2020 Timestamp: 1599625566.581541000
Change: Wed Sep 9 04:26:00 2020 Timestamp: 1599625560.949948762
Birth: Wed Sep 9 04:26:00 2020 Timestamp: 1599625560.949948762
CUid: 2 CGid: 2 Fxid: 00000046 Fid: 70 Pid: 25 Pxid: 00000019
XStype: adler XS: 4f a4 17 e2 ETAGs: “18790481920:4fa417e2”
Layout: qrain Stripes: 16 Blocksize: 1M LayoutId: 40640f52 Redundancy: d5::t0
#Rep: 16
┌───┬──────┬────────────────────────┬────────────────┬─────────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┐
│no.│ fs-id│ host│ schedgroup│ path│ boot│ configstatus│ drain│ active│ geotag│
└───┴──────┴────────────────────────┴────────────────┴─────────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┘
0 546 jbod-mgmt-04.sdfarm.kr default.41 /jbod/box_07_disk_041 booted rw nodrain online kisti::gsdc::g02
1 1470 jbod-mgmt-09.sdfarm.kr default.41 /jbod/box_18_disk_041 booted rw nodrain online kisti::gsdc::g03
2 1050 jbod-mgmt-07.sdfarm.kr default.41 /jbod/box_13_disk_041 booted rw nodrain online kisti::gsdc::g03
3 798 jbod-mgmt-05.sdfarm.kr default.41 /jbod/box_10_disk_041 booted rw nodrain online kisti::gsdc::g02
4 126 jbod-mgmt-01.sdfarm.kr default.41 /jbod/box_02_disk_041 booted rw nodrain online kisti::gsdc::g01
5 294 jbod-mgmt-02.sdfarm.kr default.41 /jbod/box_04_disk_041 booted rw nodrain online kisti::gsdc::g01
6 630 jbod-mgmt-04.sdfarm.kr default.41 /jbod/box_08_disk_041 booted rw nodrain online kisti::gsdc::g02
7 1218 jbod-mgmt-08.sdfarm.kr default.41 /jbod/box_15_disk_041 booted rw nodrain online kisti::gsdc::g03
8 462 jbod-mgmt-03.sdfarm.kr default.41 /jbod/box_06_disk_041 booted rw nodrain online kisti::gsdc::g01
9 714 jbod-mgmt-05.sdfarm.kr default.41 /jbod/box_09_disk_041 booted rw nodrain online kisti::gsdc::g02
10 1302 jbod-mgmt-08.sdfarm.kr default.41 /jbod/box_16_disk_041 booted rw nodrain online kisti::gsdc::g03
11 378 jbod-mgmt-03.sdfarm.kr default.41 /jbod/box_05_disk_041 booted rw nodrain online kisti::gsdc::g01
12 210 jbod-mgmt-02.sdfarm.kr default.41 /jbod/box_03_disk_041 booted rw nodrain online kisti::gsdc::g01
13 882 jbod-mgmt-06.sdfarm.kr default.41 /jbod/box_11_disk_041 booted rw nodrain online kisti::gsdc::g02
14 1386 jbod-mgmt-09.sdfarm.kr default.41 /jbod/box_17_disk_041 booted rw nodrain online kisti::gsdc::g03
15 42 jbod-mgmt-01.sdfarm.kr default.41 /jbod/box_01_disk_041 booted rw nodrain online kisti::gsdc::g01
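To see how the 16 stripes are spread over hosts or geotags, the listing above can be summarised with awk. This is a minimal sketch with a hypothetical helper name (`stripes_by`); it assumes stripe rows are the lines with ten whitespace-separated fields whose first two fields are integers, as in this output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative, not part of EOS).

# Read `eos fileinfo` output on stdin and print "<count> <value>" for each
# distinct value in the given column of the stripe rows.
# Column 3 is the host, column 10 is the geotag (default: 3).
stripes_by() {
    awk -v col="${1:-3}" '
        NF == 10 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n[$col]++ }
        END { for (v in n) print n[v], v }' | sort -k2
}
```

For example, `eos fileinfo /eos/gsdc/testarea/rain16/file1g-jbod-mgmt-09 | stripes_by 10` groups by geotag. For the listing above this gives 6 stripes in kisti::gsdc::g01 and 5 each in g02 and g03; grouped by host, every node holds two stripes except jbod-mgmt-06 and jbod-mgmt-07, which hold one each.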
Thank you.
Best regards,
Sang-Un
sarric
(sarric)
April 11, 2024, 2:07pm
28
Hi Sang-Un,
I have installed 5.2.21 on Rocky 9.3, but when I tried to start eos5-mgm@mgm, I got the error “failed to load key from Configstore”:
240411 15:17:35 time=1712819855.050464 func=get level=ERROR logid=8bca11dc-f7d3-11ee-a3f5-a6bb22a596af unit=mgm@node1.cern.ch:1094 tid=00007fcce7804640 source=ConfigStore:76 tident=<service> sec= uid=0 gid=0 name= geo="" msg="failed to load key from Configstore" key="converter-max-threads" err="msg=Failed Numeric conversion" key= error_msg=Invalid argument
240411 15:17:35 time=1712819855.051219 func=get level=ERROR logid=8bca11dc-f7d3-11ee-a3f5-a6bb22a596af unit=mgm@node1.cern.ch:1094 tid=00007fcce7804640 source=ConfigStore:76 tident=<service> sec= uid=0 gid=0 name= geo="" msg="failed to load key from Configstore" key="converter-max-queuesize" err="msg=Failed Numeric conversion" key= error_msg=Invalid argument
I also can’t find any “symkey=” value with grep -R symkey /var/eos/md/so.mgm.dump.node1.cern.ch\:1094. There must be something wrong with my configuration, but I can’t figure it out. Could you give me a hint? Any help would be appreciated.
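To list which configuration keys are affected, the error lines can be reduced to key and error text. This is a minimal sketch with a hypothetical helper name (`failed_config_keys`); it assumes the log format shown above and only summarises the messages, it does not explain their cause.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative, not part of EOS).

# Read MGM log lines on stdin and print "<key>: <error>" for each
# "failed to load key from Configstore" message, without duplicates.
failed_config_keys() {
    sed -n 's/.*msg="failed to load key from Configstore" key="\([^"]*\)" err="\([^"]*\)".*/\1: \2/p' | sort -u
}
```

It reads from standard input, so the MGM log file (whose location depends on the installation) can be piped into it.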