HTTPS/WebDAV problem on EOS 5

Hi,

I’m running EOS 5.0.27 on a testbed cluster. The installation includes:
2 MGMs;
3 QuarkDB nodes;
14 FSTs.
Package versions on MGM:
{{{
dvl-eos-m01:~ # rpm -qa *eos* *xroot* | sort
eos-client-5.0.27-1.el7.cern.x86_64
eos-folly-2019.11.11.00-1.el7.cern.x86_64
eos-folly-deps-2019.11.11.00-1.el7.cern.x86_64
eos-fusex-5.0.27-1.el7.cern.x86_64
eos-fusex-core-5.0.27-1.el7.cern.x86_64
eos-fusex-selinux-5.0.27-1.el7.cern.x86_64
eos-grpc-1.41.0-1.el7.x86_64
eos-grpc-devel-1.41.0-1.el7.x86_64
eos-libmicrohttpd-0.9.38-eos.el7.cern.x86_64
eos-librichacl-1.12-14.el7.cern.x86_64
eos-ns-inspect-5.0.27-1.el7.cern.x86_64
eos-protobuf3-3.17.3-1.el7.cern.eos.x86_64
eos-quarkdb-5.0.27-1.el7.cern.x86_64
eos-richacl-1.12-14.el7.cern.x86_64
eos-rocksdb-6.2.4-1.el7.cern.x86_64
eos-server-5.0.27-1.el7.cern.x86_64
eos-xrootd-5.4.6-1.el7.cern.x86_64
xrootd-client-libs-5.4.3-1.el7.x86_64
xrootd-libs-5.4.3-1.el7.x86_64
xrootd-scitokens-5.4.3-1.el7.x86_64
xrootd-server-5.4.3-1.el7.x86_64
xrootd-server-libs-5.4.3-1.el7.x86_64
xrootd-voms-5.4.3-1.el7.x86_64
}}}
On FST:
{{{
dvl-eos-f01:~ # rpm -qa *eos* *xroot* | sort
eos-client-5.0.27-1.el7.cern.x86_64
eos-folly-2019.11.11.00-1.el7.cern.x86_64
eos-folly-deps-2019.11.11.00-1.el7.cern.x86_64
eos-grpc-1.41.0-1.el7.x86_64
eos-grpc-devel-1.41.0-1.el7.x86_64
eos-libmicrohttpd-0.9.38-eos.el7.cern.x86_64
eos-protobuf3-3.17.3-1.el7.cern.eos.x86_64
eos-server-5.0.27-1.el7.cern.x86_64
eos-xrootd-5.4.6-1.el7.cern.x86_64
xrootd-client-libs-5.4.3-1.el7.x86_64
xrootd-libs-5.4.3-1.el7.x86_64
xrootd-scitokens-5.4.3-1.el7.x86_64
xrootd-server-5.4.3-1.el7.x86_64
xrootd-server-libs-5.4.3-1.el7.x86_64
xrootd-voms-5.4.3-1.el7.x86_64
}}}
Configuration for HTTP on MGM:
{{{
dvl-eos-m01:~ # grep -i http /etc/sysconfig/eos_env | grep -v ^#
EOS_HTTP_THREADPOOL="epoll"
EOS_HTTP_THREADPOOL_SIZE=16
EOS_HTTP_CONNECTION_MEMORY_LIMIT=4194304
dvl-eos-m01:~ # grep -i http /etc/xrd.cf.mgm | grep -v ^#
xrd.protocol XrdHttp:8443 /usr/lib64/libXrdHttp.so
http.cadir /etc/grid-security/certificates/
http.cert /etc/grid-security/daemon/hostcert.pem
http.key /etc/grid-security/daemon/hostkey.pem
http.gridmap /etc/grid-security/grid-mapfile
http.secxtractor libXrdVoms.so
http.exthandler xrdtpc /usr/lib64/libXrdHttpTPC.so
http.exthandler EosMgmHttp /usr/lib64/libEosMgmHttp.so eos::mgm::http::redirect-to-https=0
}}}
Configuration for HTTP on the FSTs:
{{{
dvl-eos-f01:~ # grep -i http /etc/sysconfig/eos_env | grep -v ^#
EOS_HTTP_THREADPOOL="epoll"
EOS_HTTP_THREADPOOL_SIZE=16
EOS_HTTP_CONNECTION_MEMORY_LIMIT=4194304
dvl-eos-f01:~ # grep -i http /etc/xrd.cf.fst | grep -v ^#
xrd.protocol XrdHttp:9001 /usr/lib64/libXrdHttp.so
http.exthandler EosFstHttp /usr/lib64/libEosFstHttp.so none
http.exthandler xrdtpc /usr/lib64/libXrdHttpTPC.so
}}}

I can see the directory listing, but I can’t read the file:
{{{
lxui03:~ > gfal-ls -v -l https://dvl-eos.jinr.ru:8443//eos/tests/cms/
-rwxrwxrwx 0 0 0 948 Jul 21 17:16 test-06
lxui03:~ > gfal-copy -f -v https://dvl-eos.jinr.ru:8443//eos/tests/cms/test-06 file:///tmp/test-01
Copying 948 bytes https://dvl-eos.jinr.ru:8443//eos/tests/cms/test-06 => file:///tmp/test-01
event: [1658508965244] BOTH GFAL2:CORE:COPY LIST:ENTER
event: [1658508965244] BOTH GFAL2:CORE:COPY LIST:ITEM https://dvl-eos.jinr.ru:8443//eos/tests/cms/test-06 => file:///tmp/test-01
event: [1658508965244] BOTH GFAL2:CORE:COPY LIST:EXIT
event: [1658508965245] DEST GFAL2:CORE:COPY:LOCAL OVERWRITE Deleted file:///tmp/test-01
event: [1658508965245] BOTH GFAL2:CORE:COPY:LOCAL TRANSFER:ENTER https://dvl-eos.jinr.ru:8443//eos/tests/cms/test-06 => file:///tmp/test-01
event: [1658508965245] BOTH GFAL2:CORE:COPY:LOCAL TRANSFER:TYPE streamed
gfal-copy error: 5 (Input/output error) - Result HTTP 500 : Unexpected server error: 500 , while readding after 1 attempts
}}}

It seems that authorization and mapping work on the MGM, and the request is redirected to an FST:
{{{
220722 15:32:36 time=1658493156.767455 func=open level=INFO logid=5e38109c-09ba-11ed-a1a8-5254ff400201 unit=mgm@dvl-eos-m01.jinr.ru:1094 tid=00007f8f22bfe700 source=XrdMgmOfsFile:3129 tident=http sec=https uid=17600 gid=12030 name=cms001 geo="RU::JINR::LITDVL" op=read path=/eos/tests/cms/test-06 info=&eos.app=http target[0]=(dvl-eos-f09.jinr.ru,51) target[1]=(dvl-eos-f01.jinr.ru,76) redirection=dvl-eos-f09.jinr.ru?&cap.sym=<…>&cap.msg=<…>&mgm.logid=5e38109c-09ba-11ed-a1a8-5254ff400201&mgm.blockchecksum=ignore&mgm.replicaindex=0&mgm.replicahead=0&mgm.etag="1771942445056:bfb52a24"&mgm.id=000019c9&mgm.mtime=1658412982 xrd_port=1095 http_port=9001
220722 15:32:36 time=1658493156.767958 func=open level=INFO logid=5e38109c-09ba-11ed-a1a8-5254ff400201 unit=mgm@dvl-eos-m01.jinr.ru:1094 tid=00007f8f22bfe700 source=XrdMgmOfsFile:3197 tident=http sec=https uid=17600 gid=12030 name=cms001 geo="RU::JINR::LITDVL" path=/eos/tests/cms/test-06 open:rt=2.18 io:bw=inf io:sched=0 io:type=buffered io:prio=default io:redirect=dvl-eos-f09.jinr.ru:9001
}}}
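The MGM log lines are whitespace-separated `key=value` pairs, so the redirect target and the HTTP port the MGM handed to the client can be pulled out mechanically. A minimal Python sketch, assuming values contain no spaces unless they are double-quoted (true for the lines above once the forum's curly quotes are read as plain `"`); `parse_eos_log_line` and `redirect_endpoint` are hypothetical helpers, not part of EOS:

```python
import re

# key=value pairs; a value is either a double-quoted string or a run of non-spaces.
# Keys may contain dots, colons and brackets (e.g. "io:redirect", "target[0]").
_PAIR = re.compile(r'([\w.:\[\]]+)=("[^"]*"|\S*)')


def parse_eos_log_line(line: str) -> dict:
    """Return the key=value fields of one EOS log line, with quotes stripped."""
    return {key: value.strip('"') for key, value in _PAIR.findall(line)}


def redirect_endpoint(line: str) -> tuple:
    """Split the io:redirect field into (host, port)."""
    host, _, port = parse_eos_log_line(line)["io:redirect"].rpartition(":")
    return host, int(port)


if __name__ == "__main__":
    sample = ('220722 15:32:36 time=1658493156.767958 func=open level=INFO '
              'geo="RU::JINR::LITDVL" path=/eos/tests/cms/test-06 '
              'io:redirect=dvl-eos-f09.jinr.ru:9001')
    print(redirect_endpoint(sample))  # ('dvl-eos-f09.jinr.ru', 9001)
```

Applied to the second MGM line, this shows which HTTP port the MGM advertised for the FST, which is the value the rest of the thread is about.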
But I don’t understand what is happening with the request on the FST:
{{{
220722 15:32:36 1396 XrootdBridge: unknown.3:31@lxui03 login as nobody
220722 15:32:36 1396 FstOfs_stat: unknown.3:31@lxui03 Unable to stat file /eos/tests/cms/test-06; no such file or directory
220722 15:32:36 time=1658493156.770605 func=open level=INFO logid=unknown unit=fst@dvl-eos-f09.jinr.ru:1095 tid=00007f8438ffb700 source=XrdFstOfsFile:135 tident=unknown.3:31@lxui03 sec=http uid=0 gid=0 name= geo="" path=/eos/tests/cms/test-06 info=encURI=%26cap.sym%3D4fKU58mHMjHeuh5CkzODkO6gd1M%3D%26cap.msg%3DExmVgjHiQ28QpfWUsMx8MMfuQkH%2B6MLaZpljeTXhEjjmr0OOizUJyCMgxQrBVRaJMoiiaHSMhbt52WUS%2FyC4pOdo%2FiHULK%2FA%2B1fYxdXoXgOO%2B4EN0HwvHa%2BSEcFfVKKjFXSIQfCZVos4ROujg0c9rpQn1O388QVnqJkOmsQ4K1Pp%2BFeRJP3DzfTvLpE%2BbtA9CoAhMyeDr%2FLg0cDLFybdojMllVhWzj8tzSpHfVj53jibo75grkO3ZLtIz8Q0pkdjuHvop3ffSqmsbZfdVcJ%2FFoHlQSs7mv9IZG3o0WTmQdT5e0%2BEvYSBoI7LiZzcSt3wyeIMRxya2wqvLQsKR%2F0JeWp4wThk9XvJ%2Fex0o1j7C3VYbDMl2f%2FvrojsCfnfmjNXxIqcJJq5orl9CMRhA6rz93WFht1WEDw9N4S2N%2BkQvNQG8hs4zyUIi%2F62VKh0YnXnHcct3fhhH9M2%2F0Y17mXxwDrfagGCuayUfzGU%2Fip01vwYstQBaVuUq1dmn52l1taH3kj3Ydnvi%2FnZ2PnHJTMwek2tqOdn7h1F%26mgm.logid%3D5e38109c-09ba-11ed-a1a8-5254ff400201%26mgm.blockchecksum%3Dignore%26mgm.replicaindex%3D0%26mgm.replicahead%3D0%26mgm.etag%3D%221771942445056:bfb52a24%22%26mgm.id%3D000019c9%26mgm.mtime%3D1658412982%26eos.clientinfo%3Dzbase64:MDAwMDAwNjd4nBXIwQ2AIAxA0VVcAAKehKTDiLSxhgApoHF75fb%2BLxWzlw6rttvCxYcHONNUO06MYCb7WxHCIELBOEcVLhCR9pH67H%2Bz4NEh3klhaYqM0xdn0TK8M8Z%2BVxkj9w%3D%3D open_mode=0
220722 15:32:36 1396 FstOfs_ProcessTpcOpaque: unknown.3:31@lxui03 Unable to open - capability illegal /eos/tests/cms/test-06; invalid argument
220722 15:32:36 time=1658493156.770782 func=open level=ERROR logid=unknown unit=fst@dvl-eos-f09.jinr.ru:1095 tid=00007f8438ffb700 source=XrdFstOfsFile:150 tident=unknown.3:31@lxui03 sec=http uid=0 gid=0 name= geo="" msg="failed while processing TPC/open opaque"
}}}
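The FST log shows that the MGM's opaque information arrived wrapped in a single percent-encoded `encURI=` parameter before the capability was rejected. To see which keys actually reached the FST, the value can be URL-decoded and split on `&`. A minimal Python sketch for inspection only: it does not decrypt or validate `cap.msg`, it is not how the FST itself parses requests, and `decode_enc_uri` is a hypothetical helper:

```python
from urllib.parse import unquote


def decode_enc_uri(enc_uri: str) -> dict:
    """Percent-decode an encURI value and split it into opaque key/value pairs.

    unquote() (not unquote_plus) is used on purpose: base64 payloads such as
    cap.msg contain '+', which must not be turned into spaces.
    """
    opaque = unquote(enc_uri)  # '%26' -> '&', '%3D' -> '=', '%2B' -> '+'
    fields = {}
    for pair in opaque.split("&"):
        if not pair:
            continue  # skip the empty chunk before the leading '&'
        key, _, value = pair.partition("=")  # split on the first '=' only
        fields[key] = value
    return fields


if __name__ == "__main__":
    # Excerpt of the encURI value from the FST log above (cap.msg omitted).
    excerpt = ("%26cap.sym%3D4fKU58mHMjHeuh5CkzODkO6gd1M%3D"
               "%26mgm.logid%3D5e38109c-09ba-11ed-a1a8-5254ff400201"
               "%26mgm.replicaindex%3D0%26mgm.id%3D000019c9")
    for key, value in decode_enc_uri(excerpt).items():
        print(f"{key} = {value}")
```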

Any help would be greatly appreciated.
Thanks in advance.

Any feedback or help on this problem?

Hi Valeri,

Sorry, I missed this one. It looks like you have enabled both libmicrohttpd and XrdHttp on the FST side, and by design the FST can only publish the port of one of them. Therefore, you need to set the following environment variable for the FST daemons so that it matches the XrdHttp port: the xrootd daemon will first bind XrdHttp to that port, libmicrohttpd will then fail to bind to it, and so libmicrohttpd will effectively be disabled. The good thing is that the correct port will then be advertised to the MGM, namely 8443.

Therefore, please set:

{{{
[Service]
Environment=EOS_FST_HTTP_PORT=8443
}}}

in the FST systemd customization file for the corresponding daemon, e.g. /usr/lib/systemd/system/eos@fst1.service.d/custom.conf, and then restart the FST services.
Try again and things should look better.
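The workaround relies on plain TCP semantics: once one socket is listening on a port, a second bind to the same address and port is refused with `EADDRINUSE`, so whichever HTTP stack binds first (here XrdHttp, started by the xrootd daemon) keeps the port. A small Python illustration on the loopback interface; it uses a kernel-chosen free port instead of 8443, so it needs no privileges and touches no EOS service, and it assumes Linux-style errno values:

```python
import errno
import socket


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Return True if a second TCP bind to an already-listening port is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the kernel pick a free port
        first.listen()
        port = first.getsockname()[1]  # stands in for the shared HTTP port
        try:
            second.bind((host, port))  # like the second HTTP stack starting up
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    print("second bind refused:", second_bind_fails())
```

The order matters: the stack that starts second is the one that ends up without a port.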

Cheers,
Elvin

Hi Elvin,

We have both the MGM and an FST running on the same virtual machine. Can I set the FST to port 8444 when the MGM uses port 8443?

We run only one FST per machine, serving four partitions.
In this case, is the port defined in the file
/usr/lib/systemd/system/eos@fst.service.d/custom.conf
or rather in
/usr/lib/systemd/system/eos@fst1.service.d/custom.conf?
I created both with the same contents.

Since in my case the file resided on a machine other than the MGM machines,
reading exactly this file worked. Writing, however, fails, probably because the new file is redirected to the MGM machine.

I also do not understand why the MGM and FST ports are defined in three places:

  • /etc/sysconfig/eos_env
  • /etc/xrd.cf.mgm, /etc/xrd.cf.fst
  • /usr/lib/systemd/system/eos@fst.service.d/custom.conf

Which definition takes precedence if the ports are defined in all three places?

Hi Valeri,

Yes, you can use any port as long as things are configured properly. My customization script was just an example; in my case I have an FST called fst1. You can of course have multiple FSTs running on the same machine, e.g. fst1, fst2, etc.

There is a historical reason for the multiple places where things are defined. Before XrdHttp existed, HTTP access was configured through the libmicrohttpd implementation, which needs an environment variable containing the port it should bind to. Later, XrdHttp was developed; it comes from the XRootD framework, and its port has to be configured in /etc/xrd.cf.fst.

Now, if you run a full cluster on one machine, as I do, then putting the env variable in /etc/sysconfig/eos_env is not enough, since all the daemons load this environment; you therefore need a customization per daemon. For example, I want fst1 to run HTTP on port 8001 and fst2 on port 8002, because no two daemons can bind the same port. There is no other way to achieve this than the customization files.
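The per-daemon customization can be generated instead of written by hand. A minimal Python sketch, assuming the drop-in layout used in this thread (`/usr/lib/systemd/system/eos@<name>.service.d/custom.conf`) and the example ports 8001 and 8002; `render_dropins` is a hypothetical helper that only builds the file contents and writes nothing:

```python
def render_dropins(ports: dict) -> dict:
    """Map each FST instance name to (drop-in path, drop-in contents).

    Refuses duplicate ports, since no two daemons can bind the same one.
    """
    seen = {}
    for name, port in ports.items():
        if port in seen:
            raise ValueError(f"{name} and {seen[port]} both want port {port}")
        seen[port] = name
    return {
        name: (
            f"/usr/lib/systemd/system/eos@{name}.service.d/custom.conf",
            f"[Service]\nEnvironment=EOS_FST_HTTP_PORT={port}\n",
        )
        for name, port in ports.items()
    }


if __name__ == "__main__":
    for name, (path, body) in render_dropins({"fst1": 8001, "fst2": 8002}).items():
        print(f"# {path}\n{body}")
```

After such files are written or changed, systemd has to re-read its unit configuration (`systemctl daemon-reload`) before the FSTs are restarted.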

The additional catch, as I mentioned earlier, is that for FSTs you can only have one HTTP implementation running: either libmicrohttpd or XrdHttp. You need XrdHttp, so you must make sure that the HTTP port in /etc/xrd.cf.fst and EOS_FST_HTTP_PORT match; that way only XrdHttp will start successfully when you start your FST.
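Checking that the two settings agree can be automated. A minimal Python sketch, assuming the single-assignment `Environment=EOS_FST_HTTP_PORT=<port>` form used in this thread (systemd also allows several assignments and quoting on one `Environment=` line, which this sketch does not handle); the helper names are hypothetical:

```python
import re


def xrdhttp_port(xrd_cf: str):
    """Port from the first active 'xrd.protocol XrdHttp:<port> ...' line, or None."""
    for line in xrd_cf.splitlines():
        m = re.match(r"\s*xrd\.protocol\s+XrdHttp:(\d+)\b", line)  # '#' lines never match
        if m:
            return int(m.group(1))
    return None


def env_http_port(dropin: str):
    """EOS_FST_HTTP_PORT from a systemd drop-in 'Environment=' line, or None."""
    m = re.search(r"^\s*Environment=EOS_FST_HTTP_PORT=(\d+)\s*$", dropin, re.M)
    return int(m.group(1)) if m else None


def ports_match(xrd_cf: str, dropin: str) -> bool:
    """True only when both files name the same HTTP port."""
    a, b = xrdhttp_port(xrd_cf), env_http_port(dropin)
    return a is not None and a == b


if __name__ == "__main__":
    fst_cf = "xrd.protocol XrdHttp:8444 /usr/lib64/libXrdHttp.so\n"
    dropin = "[Service]\nEnvironment=EOS_FST_HTTP_PORT=8444\n"
    print(ports_match(fst_cf, dropin))  # True
```

A missing setting on either side counts as a mismatch, since then the two HTTP stacks are not guaranteed to compete for the same port.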

Once you have this successfully running, then I believe also the transfers will work.

Cheers,
Elvin

Hi Elvin,

Unfortunately, it still doesn’t work.
The FST to which the write request was redirected restarted its service,
with the following diagnostics:
{{{
220728 19:09:20 1397 FstOfs__close: nobody Unable to store file - file has been cleaned because of a client disconnect /eos/tests/cms/test-02; input/output error
220728 19:09:20 time=1659024560.399039 func=_close level=WARN logid=a36cca8e-0e8f-11ed-aba4-5254ff400202 unit=fst@dvl-eos-f08.jinr.ru:1095 tid=00007f6145df8700 source=XrdFstOfsFile:2067 tident=nobody sec= uid=17600 gid=12030 name=nobody geo="" info="deleting on close" fn=/eos/tests/cms/test-02 fstpath=/e/pc/00000000/000019ce reason="client disconnect"
220728 19:09:20 time=1659024560.399059 func=_close level=INFO logid=a36cca8e-0e8f-11ed-aba4-5254ff400202 unit=fst@dvl-eos-f08.jinr.ru:1095 tid=00007f6145df8700 source=XrdFstOfsFile:2155 tident=nobody sec= uid=17600 gid=12030 name=nobody geo="" msg="done close" rc=-1 errc=5
220728 19:09:20 time=1659024560.399247 func=CompleteHandler level=INFO logid=static… unit=fst@dvl-eos-f08.jinr.ru:1095 tid=00007f6145df8700 source=HttpServer:459 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="http connection disconnect" reason="Request OK"
220728 19:09:20 time=1659024560.399748 func=Close level=ERROR logid=a36cca8e-0e8f-11ed-aba4-5254ff400202 unit=fst@dvl-eos-f08.jinr.ru:1095 tid=00007f61589fa700 source=ReplicaParLayout:473 tident=nobody sec=unix uid=0 gid=0 name=nobody geo="" msg="failed to close replica 1" url="root://dvl-eos-f04.jinr.ru:1095///eos/tests/cms/test-02?&cap.sym=4fKU58mHMjHeuh5CkzODkO6gd1M=&cap.msg=ExmVgjHiQ28HnPrlwsRhqzIECOMCRuWKIkZWT2sUiqb0Mku1mHgfz0cH/lfdGgIlDYejckRL1zQdM7oGA9CtiPt8oakn+0Hac0sl9a0pHh/7ZrGuYbwqVAzM8NQ9LrHZkHt7Y/TVm5NES5OwmtdV+VnRZDVMiXRtMeo3pvvLYILNq0tHXmezIceyANVlLYGu+GVCAhK2Oqr/xnI5ow29fKtABOG2EQX0ec7np2I9qy+c7RUf5zOn42TdyVjOp+C2B/ZPOyBnaHFK6DKTacXHb8Lm3jWXysAYJ7O5RljliL5bVvCQnjQf6xo3C0SKGQucD468lsQLUsCmZY3e4b3FFpPFpwSLrz+hhHwuSy0O2L+wwYi5ZBCPQ6wQavGJ8XdngrDpnRXDpQ1O9nBf+xg+9r1tb4vZI+GsrrhJJgSBcoputeqiWygtNw9G2ROyekHKPqyf+dkc5BTO9pCMa53XRm3j22I0mjwcdh9z+USFLMcvl9fRNh+Esytcmm7N+8U3X/juLQKW8X0zo9B1tpZuj71ABYuEXpfD5OjhLO00RbuRicplRPgMcw==&mgm.logid=a36cca8e-0e8f-11ed-aba4-5254ff400202&mgm.replicaindex=1&mgm.replicahead=0&mgm.etag="1773284622336:00000000"&mgm.id=000019ce&eos.clientinfo=zbase64:MDAwMDAwNjh4nBXIQQqAIBBA0at0ASXBRQlzmNQZmhCVUYtuX+7e/6VidtLBGL3vCxfnH+BMUy2cGGGd7G9F8IMIBeMcVbhARDpG6rP/zYKhQ7yTwtIUrZu+OIuW4TZr7QdvcyQ5&mgm.path=/eos/tests/cms/test-02"
220728 19:09:20 time=1659024560.399779 func=Emsg level=ERROR logid=static… unit=fst@dvl-eos-f08.jinr.ru:1095 tid=00007f61589fa700 source=Layout:89 tident= sec=(null) uid=99 gid=99 name=- geo="" Unable to close failed ; Remote I/O error
220728 19:09:20 time=1659024560.399824 func=down level=CRIT logid=static… unit=fst@dvl-eos-f08.jinr.ru:1095 tid=00007f61589fa700 source=OpenFileTracker:83 tident= sec=(null) uid=99 gid=99 name=- geo="" Could not find fsid=47 when calling OpenFileTracker::down for fxid=000019ce
220728 19:09:25 7934 Starting on Linux 3.10.0-1160.71.1.el7.x86_64
}}}
Maybe I need to set:
{{{
EOS_MGM_HTTP_PORT=0
EOS_FST_HTTP_PORT=0
}}}
in /etc/sysconfig/eos_env?

Hi Valeri,

Definitely do not set the env variables to 0. Can you send me your configuration files: /etc/xrd.cf.mgm, /etc/xrd.cf.fst and /etc/sysconfig/eos_env? Please also paste any systemd customizations that you are using. Also, restart one FST and send me the logs so I can see exactly which port the HTTP plugin binds to.

Then issue the HTTP transfer once more and send me its trace from both the MGM and the FST to which it gets redirected. It looks like your FST is crashing when it gets such a request.

Thanks,
Elvin

Hi Elvin,

Some possibly important remarks:

  • dvl-eos.jinr.ru is an IP alias that switches to the current master;
  • dvl-eos-m01 and dvl-eos-m02 are the 2 MGMs;
  • QuarkDB runs on dvl-eos-m01, dvl-eos-m02 and dvl-eos-f06;
  • dvl-eos-f[00-13] are FSTs;
  • in eos_env, XRD_ROLES is only "fst" on the FSTs and "fst quarkdb" on dvl-eos-f06.
{{{
cat /etc/xrd.cf.mgm | grep -Ev "^#|^[[:space:]]*$"
xrootd.fslib libXrdEosMgm.so
xrootd.seclib libXrdSec.so
xrootd.async off nosf
xrootd.chksum adler32
xrd.sched mint 8 maxt 256 idle 64
all.export / nolock
all.role manager
oss.fdlimit 16384 32768
sec.protocol unix
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
sec.protocol krb5 host/dvl-eos-m02.jinr.ru@JINR.RU
sec.protocol gsi -gridmap:/etc/grid-security/grid-mapfile -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/daemon/hostcert.pem -key:/etc/grid-security/daemon/hostkey.pem -vomsfun:default -vomsat:extract -vomsfunparms:dbg -crl:0 -d:2 -gmapopt:11 -gmapto:60 -moninfo:1
sec.protbind localhost.localdomain unix sss
sec.protbind localhost unix sss
sec.protbind * only krb5 gsi sss unix
mgmofs.fs /
mgmofs.targetport 1095
mgmofs.broker root://dvl-eos.jinr.ru:1097//eos/
mgmofs.instance eosjinrdvl
mgmofs.metalog /var/eos/md
mgmofs.txdir /var/eos/tx
mgmofs.authdir /var/eos/auth
mgmofs.archivedir /var/eos/archive
mgmofs.qosdir /var/eos/qos
mgmofs.reportstorepath /var/eos/report
mgmofs.autoloadconfig default
mgmofs.qoscfg /var/eos/qos/qos.conf
mgmofs.cfgtype quarkdb
mgmofs.alias dvl-eos.jinr.ru
mgmofs.fstgw dvl-eos-gw-fst.jinr.ru:3001
mgmofs.nslib /usr/lib64/libEosNsQuarkdb.so
mgmofs.qdbcluster dvl-eos-db1.jinr.ru:7777 dvl-eos-db2.jinr.ru:7777 dvl-eos-db3.jinr.ru:7777
mgmofs.qdbpassword_file /etc/eos.keytab-qdb
mgmofs.centraldrain true
xrd.protocol XrdHttp:8443 /usr/lib64/libXrdHttp.so
http.cadir /etc/grid-security/certificates/
http.cert /etc/grid-security/daemon/hostcert.pem
http.key /etc/grid-security/daemon/hostkey.pem
http.gridmap /etc/grid-security/grid-mapfile
http.secxtractor libXrdVoms.so
http.exthandler xrdtpc /usr/lib64/libXrdHttpTPC.so
http.exthandler EosMgmHttp /usr/lib64/libEosMgmHttp.so eos::mgm::http::redirect-to-https=0
mgmofs.macaroonslib /usr/lib64/libXrdMacaroons.so /usr/lib64/libXrdAccSciTokens.so
macaroons.secretkey /etc/eos.macaroon.secret
all.sitename eosjinrdvl
}}}
{{{
cat /etc/xrd.cf.fst | grep -Ev "^#|^[[:space:]]*$"
###########################################################
set MGM=$EOS_MGM_ALIAS
xrootd.fslib -2 libXrdEosFst.so
xrootd.async off nosf
xrd.network keepalive
xrootd.redirect $(MGM):1094 chksum
xrootd.seclib libXrdSec.so
sec.protocol unix
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
sec.protbind * only unix sss
all.export / nolock
all.trace none
all.manager localhost 2131
xrd.port 1095
ofs.persist off
ofs.osslib libEosFstOss.so
ofs.tpc pgm /opt/eos/xrootd/bin/xrdcp
fstofs.broker root://localhost:1097//eos/
fstofs.autoboot true
fstofs.quotainterval 10
fstofs.metalog /var/eos/md/
fstofs.qdbcluster dvl-eos-db1.jinr.ru:7777 dvl-eos-db2.jinr.ru:7777 dvl-eos-db3.jinr.ru:7777
fstofs.qdbpassword_file /etc/eos.keytab-qdb
xrd.protocol XrdHttp:8444 /usr/lib64/libXrdHttp.so
http.exthandler EosFstHttp /usr/lib64/libEosFstHttp.so none
http.exthandler xrdtpc /usr/lib64/libXrdHttpTPC.so
}}}
{{{
cat /etc/sysconfig/eos_env | grep -Ev "^#|^[[:space:]]*$"
DAEMON_COREFILE_LIMIT=unlimited
LD_PRELOAD=/usr/lib64/libjemalloc.so.1
KRB5RCACHETYPE=none
XRD_ROLES="mq mgm sync fst quarkdb"
EOS_MGM_HOST=dvl-eos-m02.jinr.ru
EOS_MGM_HOST_TARGET=dvl-eos-m01.jinr.ru
EOS_INSTANCE_NAME=eosjinrdvl
EOS_AUTOLOAD_CONFIG=default
EOS_BROKER_URL=root://dvl-eos.jinr.ru:1097//eos/
EOS_GEOTAG="RU::JINR::LITDVL"
EOS_MGM_MASTER1=dvl-eos-m02.jinr.ru
EOS_MGM_MASTER2=dvl-eos-m01.jinr.ru
EOS_MGM_ALIAS=dvl-eos.jinr.ru
EOS_HA_REDIRECT_READS=1
EOS_MAIL_CC="vvm@jinr.ru"
EOS_NOTIFY="mail -s `date +%s`-`hostname`-eos-notify $EOS_MAIL_CC"
EOS_ENABLE_QOS=""
EOS_CONVERTER_DRIVER=1
EOS_SECONDARY_GROUPS=1
EOS_NS_ACCOUNTING=1
EOS_FST_NO_SSS_ENFORCEMENT=1
EOS_FST_ASYNC_CLOSE=1
EOS_FST_CACHE_LEVELDB=1
EOS_FST_REPLICA_ASYNC_WRITE=1
EOS_HTTP_THREADPOOL="epoll"
EOS_HTTP_THREADPOOL_SIZE=16
EOS_HTTP_CONNECTION_MEMORY_LIMIT=4194304
EOS_FED_MANAGER=eos.cern.ch:1094
EOS_PSS_PORT=1098
EOS_PSS_MGM=$EOS_MGM_ALIAS:1094
EOS_PSS_PATH=/
EOS_TTY_BROADCAST_LISTEN_LOGFILE="/var/log/eos/mgm/xrdlog.mgm"
EOS_TTY_BROACAST_EGREP="\"CRIT|ALERT|EMERG|PROGRESS\""
EOS_MGM_STATVFS_DEFAULT_SPACE="default"
}}}
I will collect the logs and post them to AFS.

Ah, and:
{{{
cat /usr/lib/systemd/system/eos@fst.service.d/custom.conf
[Service]
Environment=EOS_FST_HTTP_PORT=8444
}}}

Hi Elvin,

I put the required logs in:
/afs/cern.ch/user/v/vmitsyn/public/dvl-eos.jinr.ru/

Hi Valeri,

Sorry for the late reply, I was on holidays for the past few days. I will check out the logs you provided and get back to you today.

Thanks,
Elvin

Hi Valeri,

I see from the logs that you have the EOS_FST_ASYNC_CLOSE functionality enabled. Unfortunately, this only works correctly in the latest EOS version, 5.0.29, which comes with a new XRootD version that fixes a bug related to this. Could you please update and retry your transfer? By the way, does a simple xrdcp work correctly against your instance?

I also saw a warning in the FST logs when the service starts, namely: Config warning: HTTPS functionality was not configured.
I don’t think this is critical; it is more an issue of using old configuration options. Nevertheless, please have a look at this post, where you can find a sample configuration; maybe you can replace the http.cadir, http.cert and http.key directives with the new ones supported in XRootD 5:
https://eos-community.web.cern.ch/t/scitokens-authorization-done-but-no-username-found/783/8?u=esindril

You can find a more distilled FST configuration below:

{{{
xrd.tls /etc/grid-security/daemon/hostcert.pem /etc/grid-security/daemon/hostkey.pem
xrd.tlsca certdir /etc/grid-security/certificates/
xrd.protocol XrdHttp:9001 libXrdHttp.so
http.exthandler EosFstHttp /usr/lib64/libEosFstHttp.so none
http.exthandler xrdtpc libXrdHttpTPC.so
http.trace all
}}}
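With `xrd.tls` in the FST configuration, every FST needs a readable host certificate and key, which matters on a cluster where only some nodes have them. A pre-flight check before restarting can catch that early. A minimal Python sketch that only checks that the paths named by `xrd.tls` and `xrd.tlsca certdir`/`certfile` are readable (it does not validate the certificate itself); `missing_tls_files` is a hypothetical helper, and it assumes absolute paths:

```python
import os


def missing_tls_files(xrd_cf: str) -> list:
    """Return cert/key/CA paths named by xrd.tls and xrd.tlsca that are not readable."""
    wanted = []
    for line in xrd_cf.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        if words[0] == "xrd.tls":
            # certificate file, then (optionally) key file; skip non-path options
            wanted += [w for w in words[1:3] if w.startswith("/")]
        elif words[0] == "xrd.tlsca":
            # "certdir <dir>" or "certfile <file>" names the CA location
            for keyword in ("certdir", "certfile"):
                if keyword in words[:-1]:
                    wanted.append(words[words.index(keyword) + 1])
    return [path for path in wanted if not os.access(path, os.R_OK)]


if __name__ == "__main__":
    sample = ("xrd.tls /etc/grid-security/daemon/hostcert.pem "
              "/etc/grid-security/daemon/hostkey.pem\n"
              "xrd.tlsca certdir /etc/grid-security/certificates/\n")
    print(missing_tls_files(sample))
```

On a node without a host certificate this lists the missing files instead of leaving the failure to the daemon start-up.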

Thanks,
Elvin

Hi Elvin,

from your recommendation (the line in /etc/xrd.cf.fst):
{{{
xrd.tls /etc/grid-security/daemon/hostcert.pem /etc/grid-security/daemon/hostkey.pem
}}}
I realized that all FSTs must have a host certificate.
But in my case, only the 2 MGMs have certificates.
Do I understand correctly that the certificate on the FSTs is required only for HTTP/WebDAV?
xrdcp (without TPC) works without it:
{{{
dvl-ui01:~ > date ; gfal-copy -f file:///etc/group root://dvl-eos.jinr.ru//eos/tests/cms/test-03
Wed Aug 3 13:58:03 MSK 2022
Copying file:///etc/group [DONE] after 0s
}}}
EOS has already been updated to 5.0.29, together with the corresponding XRootD version:
{{{
dvl-eos-m01:~ # rpm -qa *eos* *xroot* | sort
eos-client-5.0.29-1.el7.cern.x86_64
eos-folly-2019.11.11.00-1.el7.cern.x86_64
eos-folly-deps-2019.11.11.00-1.el7.cern.x86_64
eos-fusex-5.0.29-1.el7.cern.x86_64
eos-fusex-core-5.0.29-1.el7.cern.x86_64
eos-fusex-selinux-5.0.29-1.el7.cern.x86_64
eos-grpc-1.41.0-1.el7.x86_64
eos-grpc-devel-1.41.0-1.el7.x86_64
eos-libmicrohttpd-0.9.38-eos.el7.cern.x86_64
eos-librichacl-1.12-14.el7.cern.x86_64
eos-ns-inspect-5.0.29-1.el7.cern.x86_64
eos-protobuf3-3.17.3-1.el7.cern.eos.x86_64
eos-quarkdb-5.0.29-1.el7.cern.x86_64
eos-richacl-1.12-14.el7.cern.x86_64
eos-rocksdb-6.2.4-1.el7.cern.x86_64
eos-server-5.0.29-1.el7.cern.x86_64
eos-xrootd-5.4.7-1.el7.cern.x86_64
xrootd-client-libs-5.4.3-1.el7.x86_64
xrootd-libs-5.4.3-1.el7.x86_64
xrootd-scitokens-5.4.3-1.el7.x86_64
xrootd-server-5.4.3-1.el7.x86_64
xrootd-server-libs-5.4.3-1.el7.x86_64
xrootd-voms-5.4.3-1.el7.x86_64
}}}
I will request certificates for FSTs but I’m afraid it’s not a fast process.

Hi Valeri,

Yes, for HTTPS one also needs certificates on the FSTs. Certificates are also a requirement for any token-based access; otherwise the tokens are sent in clear text over the wire. TLS support is also available for the XRootD protocol and is again a requirement if you plan to use tokens.

Let me know how it goes once you install the certificates also on the FSTs.

Thanks,
Elvin

Hi Elvin,

I turned off the FST nodes without certificates.
We also have 5 FSTs on each of the 2 nodes with an MGM.
Now gfal-copy from a local file to davs works.
I tested with EOS_FST_ASYNC_CLOSE=1;
I can’t say for sure, but it seems that in this case the error occurs again.

We also need davs to work with TPC.
I hope this will work too; I will test it soon.

Hi Elvin,

I checked again, and I am now sure that EOS_FST_ASYNC_CLOSE=1 leads to an error, at least for the replica layout.

Hi Valeri,

Thank you for the confirmation. This makes perfect sense, since the HTTP protocol is less versatile than the XRootD one and the async functionality indeed cannot work properly over HTTP. I will fix this for the next release by disabling async close by default for HTTP.

Thanks,
Elvin

Hi Elvin,

I checked with EOS 5.0.30; unfortunately, it still doesn’t work with EOS_FST_ASYNC_CLOSE=1.
Apparently the commit was lost during the merge.

Hi Valeri,

Thanks for the notification. Strange, the commit is there. I will double check and get back to you. I must have missed something …

Cheers,
Elvin

Hi Valeri,

Indeed, there was a problem. The HTTP layer was not properly populating a field on which I was relying to detect that this was an http access. This is fixed now. I will tag 5.0.31. Thanks for the notification!
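The fix described above amounts to a guard: asynchronous close stays on only when it is requested and the request is known not to come over HTTP. The 5.0.30 regression suggests such a guard should fail safe, so that an unpopulated field never makes an HTTP request look async-capable. A purely hypothetical Python sketch of that idea; the real EOS FST code and field names are not shown in this thread, and `async_close_allowed` and the protocol labels are invented for illustration:

```python
def async_close_allowed(env: dict, protocol: str) -> bool:
    """Hypothetical guard: async close only if enabled AND the request is known
    to be XRootD. An empty or unrecognised protocol falls back to a normal
    (synchronous) close instead of being treated as XRootD."""
    enabled = env.get("EOS_FST_ASYNC_CLOSE") == "1"
    known_xrootd = protocol in {"root", "xroot"}  # hypothetical labels
    return enabled and known_xrootd


if __name__ == "__main__":
    env = {"EOS_FST_ASYNC_CLOSE": "1"}
    for proto in ("xroot", "http", ""):
        print(repr(proto), async_close_allowed(env, proto))
```

Using an allow-list rather than checking for "http" means a missing value costs only performance, not a failed transfer.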

Cheers,
Elvin