Can you please confirm whether there will be a new citrine release, i.e. 4.8.104? The latest citrine version, 4.8.103, has a timestamp of 2023-06-16 13:42 in our mirror repo.
We would very much like to benefit from this change in the dss/eos GitLab repo:
"MGM/HTTP: Don't mask ENODEV errors as this leads to creation of 0-size files..." (7c04411e). It resolves the issue tracked in the linked ticket, which has an operational impact on the ATLAS WebDAV recalls at RAL.
Also, the new release is expected to contain this change kindly made by Elvin in response to
It looks like it is:
[root@antares-eos14 ~]# ps aux | grep mgm
daemon 7199 2.5 0.1 17690328 320172 ? SLsl 10:12 4:06 /opt/eos/xrootd/bin/xrootd -n mgm -c /etc/xrd.cf.mgm -l /var/log/eos/xrdlog.mgm -Rdaemon
daemon 7300 0.0 0.0 1051072 12076 ? S 10:12 0:00 /opt/eos/xrootd/bin/xrootd -n mgm -c /etc/xrd.cf.mgm -l /var/log/eos/xrdlog.mgm -Rdaemon
root 43128 0.0 0.0 112820 2376 pts/0 S+ 12:56 0:00 grep --color=auto mgm
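For reference, a quick way to confirm which release a node is actually running, assuming the standard EOS packaging and CLI:
rpm -q eos-server eos-client   # installed package versions
eos version                    # version reported by the running instance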
We would need to coordinate on this with the CTA team, but many people are currently away. I will therefore try to give you a better estimate next week.
Thanks for this, Elvin. For my reference, is the reason you would like to coordinate with the CTA team that the above ENODEV change also requires a change in the CTA code?
No, I don’t think there are any changes required on the CTA side. I just want to make sure there is nothing else that CTA would like in this (last) EOS 4 release.
Sorry for the hassle. Do you have any update on this issue?
Yes, we will have a new EOS 4 release with the two fixes that you are interested in by the end of the week.
We just released EOS 4.8.104, which includes the two fixes you are interested in. You can get the packages from the usual location:
Thank you so much for this! The rpms will be picked up by our repo server tonight, and we will test and push this version to production next week.
We installed EOS 4.8.104 on our preprod instance. Everything works as expected except WebDAV reads: when I try to copy a file that has been staged from tape out of EOS, I get the following error:
Copying https://antares-preprod.stfc.ac.uk:9000/eos/antarespreprodtier1/dteam/random_400MB [FAILED] after 0s
gfal-copy error: 112 (Host is down) - Result (Neon): Invalid Content-Length in response after 1 attempts
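For context, the failing transfer was of the form below (the destination name is taken from a later message; exact flags may have differed):
gfal-copy https://antares-preprod.stfc.ac.uk:9000/eos/antarespreprodtier1/dteam/random_400MB random_400MB_local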
Can you please post the fileinfo output for this particular file you are trying to read out?
eos fileinfo /eos/antarespreprodtier1/dteam/random_400MB
Also, retry the copy operation and please send the MGM logs for the corresponding time interval.
[root@antares-eos94 ~]# eos fileinfo /eos/antarespreprodtier1/dteam/random_400MB
File: '/eos/antarespreprodtier1/dteam/random_400MB' Flags: 0644
Modify: Mon Sep 4 16:46:41 2023 Timestamp: 1693842401.548262000
Change: Wed Sep 6 10:29:46 2023 Timestamp: 1693992586.373337113
Birth: Mon Sep 4 16:46:39 2023 Timestamp: 1693842399.643205390
CUid: 36300 CGid: 24311 Fxid: 2faf080c Fid: 800000012 Pid: 800000004 Pxid: 2faf0804
XStype: adler XS: eb 8b 07 de ETAGs: "214748368021225472:eb8b07de"
Layout: replica Stripes: 1 Blocksize: 4k LayoutId: 00100012 Redundancy: d1::t1
TapeID: 4294967377 StorageClass: dteam_tapetest
│no.│ fs-id│ host│ schedgroup│ path│ boot│ configstatus│ drain│ active│ geotag│
│ 1│ 27│ antares-eos96.scd.rl.ac.uk│ retrieve.0│ /eos/data-sdk│ booted│ rw│ nodrain│ online│ undef│
The MGM logs for the interval when the copy operation was attempted are here
Also, in case it is of any use, here is the MGM log for the successful bring-online operation
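For completeness, one way to confirm the file was actually on disk before attempting the copy, assuming the endpoint exposes the usual tape locality attribute over HTTP:
gfal-xattr https://antares-preprod.stfc.ac.uk:9000/eos/antarespreprodtier1/dteam/random_400MB user.status
# expected to report ONLINE (or ONLINE_AND_NEARLINE) once the stage has completed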
Things look fine at the MGM. Could you send me the logs for the same transfer (around 10:40:29) from the following FST daemon:
antares-eos96.scd.rl.ac.uk, HTTP port 8001?
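Assuming the default EOS log layout on the FST (path inferred from the MGM setup above, so treat it as an assumption), the relevant lines can be pulled with something like:
# on antares-eos96; default FST log location assumed
grep '10:40:2' /var/log/eos/fst/xrdlog.fst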
Here it is
I can see some errors but can't understand what they mean.
Looking over the FST logs, I still don’t see anything wrong there. The open arrives at the FST, it completes, and a reply is sent to the client, but then the client disconnects.
Also, in 4.8.104 there is no code change to the FST part of the EOS setup, so I don’t think this is a regression from the previous version.
Can you send me the logs from the command when running with the following options?
gfal-copy -vvv --log-file=gfal2.log https://antares-preprod.stfc.ac.uk:9000/eos/antarespreprodtier1/dteam/random_400MB random_400MB_local
Are you actually able to copy out the file with a simple xrdcp?
I noticed the following line in the FST log
230906 10:40:29 time=1693993229.238428 func=FileClose level=ERROR logid=static… email@example.com:1095 tid=00007fa3c05f5700 source=HttpServer:230 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="clean-up interrupted or IO error related PUT/GET request" path="/eos/antarespreprodtier1/dteam/random_400MB"
Please see the gfal2 log in
Yes, I can copy out the file with xrdcp (and also with gfal-copy root://…), so using XRootD as the protocol. It is the WebDAV protocol that generates the above error.
Just to mention that after the above error - using gfal-copy https://… - the destination local file (random_400MB_local) is a stub, i.e. it has zero size.
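For reference, the working XRootD transfer was of the form below, assuming the default xroot port on the MGM:
xrdcp root://antares-preprod.stfc.ac.uk//eos/antarespreprodtier1/dteam/random_400MB random_400MB_local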
I see that the FST reply that the gfal client prints out has the Content-Length field displayed twice.
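A quick way to see the duplicated header independently of gfal/Neon, assuming an X.509 proxy in the usual location, is to follow the MGM redirect with curl and dump the response headers:
curl -v -L -o /dev/null \
     --capath /etc/grid-security/certificates \
     --cert /tmp/x509up_u$(id -u) --key /tmp/x509up_u$(id -u) \
     "https://antares-preprod.stfc.ac.uk:9000/eos/antarespreprodtier1/dteam/random_400MB"
# a broken FST reply would show Content-Length twice in the final response headers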
Were these kinds of requests working before the EOS upgrade to 4.8.104?
If so, what version of EOS were you using before?
Do you have the same version on both the MGM and FSTs? 4.8.104?
What is the version of gfal that you are using?
I will be on holidays until Tue the 12th of September, so I will follow up on this when I am back.
Thanks for this - to answer your questions:
Yes, gfal-copy of a staged file out of EOS was working before the upgrade to 4.8.104
4.8.98 (the version we currently run in production)
All EOS nodes - MGM and FSTs - in our preprod cluster have the same version, 4.8.104
I paste the versions of all the gfal rpms on the machine where I run gfal-copy:
rpm -q --whatprovides /usr/bin/gfal-copy
-bash-4.2$ rpm -qa | grep gfal2
There was indeed a regression in the last version; this is now fixed and a new release is building as we speak. This will be eos-4.8.105.
This issue affects only the FST nodes, so if you are in a hurry you can run a mixed setup with the MGM on eos-4.8.104 and the FSTs on eos-4.8.98. Otherwise, you can install the upcoming 4.8.105 everywhere.
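For the mixed setup, one way to pin the FSTs back, assuming yum-based hosts and the standard package names (exact release suffix may differ):
# on each FST node
yum downgrade eos-server-4.8.98 eos-client-4.8.98
systemctl restart eos@fst   # unit name assumed; restart the FST daemon however your setup manages it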
I will let you know once it’s available in the usual yum repositories. Thank you for the bug report!
The new packages are available in the usual location:
Thanks for this Elvin. Our repos will be synced tonight and we will test this version tomorrow.