HTTPS/WebDAV problem on EOS 5

Hi Valeri,

Definitely do not set the env variables to 0. Can you send me your configuration files?
(/etc/xrd.cf.mgm, /etc/xrd.cf.fst and /etc/sysconfig/eos_env) and paste any systemd customizations that you are using. Also restart one FST and send me the logs, so that I can see exactly which port the HTTP plugin binds to.

Then issue the HTTP transfer once more and send me the trace of that transfer from both the MGM and the FST to which it gets redirected. It looks like your FST crashes when it receives such a request.
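A minimal sketch for collecting part of what is requested above, assuming bash and GNU grep/sed: `show_cfg` prints only the active directives of a config file, and `xrdhttp_port` extracts the port declared on the `xrd.protocol XrdHttp:<port>` line. Both function names are hypothetical. The declared port is not necessarily the one the daemon ends up using (a systemd override can also influence it), so the listening sockets should still be confirmed on the FST itself, for example with `ss -ltnp`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of EOS or XRootD).

# Print only active directives: drop comment lines and blank/whitespace-only lines.
show_cfg() {
  grep -Ev '^[[:space:]]*#|^[[:space:]]*$' "$1"
}

# Print the port declared for the XrdHttp protocol, e.g.
#   xrd.protocol XrdHttp:8444 /usr/lib64/libXrdHttp.so  ->  8444
xrdhttp_port() {
  show_cfg "$1" |
    sed -nE 's/^[[:space:]]*xrd\.protocol[[:space:]]+XrdHttp:([0-9]+).*/\1/p'
}
```

For example, `xrdhttp_port /etc/xrd.cf.fst` should print `8444` for the FST configuration quoted in this thread.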

Thanks,
Elvin

Hi Elvin,

Some possibly important remarks:

  • dvl-eos.jinr.ru is an IP alias that switches to the current master;
  • dvl-eos-m01 and dvl-eos-m02 are the 2 MGMs;
  • QuarkDB runs on dvl-eos-m01, dvl-eos-m02 and dvl-eos-f06;
  • dvl-eos-f[00-13] are FSTs.

In eos_env, XRD_ROLES is set to "fst" on the FSTs and to "fst quarkdb" on dvl-eos-f06.
{{{
grep -Ev '^#|^[[:space:]]*$' /etc/xrd.cf.mgm
xrootd.fslib libXrdEosMgm.so
xrootd.seclib libXrdSec.so
xrootd.async off nosf
xrootd.chksum adler32
xrd.sched mint 8 maxt 256 idle 64
all.export / nolock
all.role manager
oss.fdlimit 16384 32768
sec.protocol unix
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
sec.protocol krb5 host/dvl-eos-m02.jinr.ru@JINR.RU
sec.protocol gsi -gridmap:/etc/grid-security/grid-mapfile -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/daemon/hostcert.pem -key:/etc/grid-security/daemon/hostkey.pem -vomsfun:default -vomsat:extract -vomsfunparms:dbg -crl:0 -d:2 -gmapopt:11 -gmapto:60 -moninfo:1
sec.protbind localhost.localdomain unix sss
sec.protbind localhost unix sss
sec.protbind * only krb5 gsi sss unix
mgmofs.fs /
mgmofs.targetport 1095
mgmofs.broker root://dvl-eos.jinr.ru:1097//eos/
mgmofs.instance eosjinrdvl
mgmofs.metalog /var/eos/md
mgmofs.txdir /var/eos/tx
mgmofs.authdir /var/eos/auth
mgmofs.archivedir /var/eos/archive
mgmofs.qosdir /var/eos/qos
mgmofs.reportstorepath /var/eos/report
mgmofs.autoloadconfig default
mgmofs.qoscfg /var/eos/qos/qos.conf
mgmofs.cfgtype quarkdb
mgmofs.alias dvl-eos.jinr.ru
mgmofs.fstgw dvl-eos-gw-fst.jinr.ru:3001
mgmofs.nslib /usr/lib64/libEosNsQuarkdb.so
mgmofs.qdbcluster dvl-eos-db1.jinr.ru:7777 dvl-eos-db2.jinr.ru:7777 dvl-eos-db3.jinr.ru:7777
mgmofs.qdbpassword_file /etc/eos.keytab-qdb
mgmofs.centraldrain true
xrd.protocol XrdHttp:8443 /usr/lib64/libXrdHttp.so
http.cadir /etc/grid-security/certificates/
http.cert /etc/grid-security/daemon/hostcert.pem
http.key /etc/grid-security/daemon/hostkey.pem
http.gridmap /etc/grid-security/grid-mapfile
http.secxtractor libXrdVoms.so
http.exthandler xrdtpc /usr/lib64/libXrdHttpTPC.so
http.exthandler EosMgmHttp /usr/lib64/libEosMgmHttp.so eos::mgm::http::redirect-to-https=0
mgmofs.macaroonslib /usr/lib64/libXrdMacaroons.so /usr/lib64/libXrdAccSciTokens.so
macaroons.secretkey /etc/eos.macaroon.secret
all.sitename eosjinrdvl
}}}
{{{
grep -Ev '^#|^[[:space:]]*$' /etc/xrd.cf.fst
###########################################################
set MGM=$EOS_MGM_ALIAS
xrootd.fslib -2 libXrdEosFst.so
xrootd.async off nosf
xrd.network keepalive
xrootd.redirect $(MGM):1094 chksum
xrootd.seclib libXrdSec.so
sec.protocol unix
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
sec.protbind * only unix sss
all.export / nolock
all.trace none
all.manager localhost 2131
xrd.port 1095
ofs.persist off
ofs.osslib libEosFstOss.so
ofs.tpc pgm /opt/eos/xrootd/bin/xrdcp
fstofs.broker root://localhost:1097//eos/
fstofs.autoboot true
fstofs.quotainterval 10
fstofs.metalog /var/eos/md/
fstofs.qdbcluster dvl-eos-db1.jinr.ru:7777 dvl-eos-db2.jinr.ru:7777 dvl-eos-db3.jinr.ru:7777
fstofs.qdbpassword_file /etc/eos.keytab-qdb
xrd.protocol XrdHttp:8444 /usr/lib64/libXrdHttp.so
http.exthandler EosFstHttp /usr/lib64/libEosFstHttp.so none
http.exthandler xrdtpc /usr/lib64/libXrdHttpTPC.so
}}}
{{{
grep -Ev '^#|^[[:space:]]*$' /etc/sysconfig/eos_env
DAEMON_COREFILE_LIMIT=unlimited
LD_PRELOAD=/usr/lib64/libjemalloc.so.1
KRB5RCACHETYPE=none
XRD_ROLES="mq mgm sync fst quarkdb"
EOS_MGM_HOST=dvl-eos-m02.jinr.ru
EOS_MGM_HOST_TARGET=dvl-eos-m01.jinr.ru
EOS_INSTANCE_NAME=eosjinrdvl
EOS_AUTOLOAD_CONFIG=default
EOS_BROKER_URL=root://dvl-eos.jinr.ru:1097//eos/
EOS_GEOTAG="RU::JINR::LITDVL"
EOS_MGM_MASTER1=dvl-eos-m02.jinr.ru
EOS_MGM_MASTER2=dvl-eos-m01.jinr.ru
EOS_MGM_ALIAS=dvl-eos.jinr.ru
EOS_HA_REDIRECT_READS=1
EOS_MAIL_CC="vvm@jinr.ru"
EOS_NOTIFY="mail -s `date +%s`-`hostname`-eos-notify $EOS_MAIL_CC"
EOS_ENABLE_QOS=""
EOS_CONVERTER_DRIVER=1
EOS_SECONDARY_GROUPS=1
EOS_NS_ACCOUNTING=1
EOS_FST_NO_SSS_ENFORCEMENT=1
EOS_FST_ASYNC_CLOSE=1
EOS_FST_CACHE_LEVELDB=1
EOS_FST_REPLICA_ASYNC_WRITE=1
EOS_HTTP_THREADPOOL="epoll"
EOS_HTTP_THREADPOOL_SIZE=16
EOS_HTTP_CONNECTION_MEMORY_LIMIT=4194304
EOS_FED_MANAGER=eos.cern.ch:1094
EOS_PSS_PORT=1098
EOS_PSS_MGM=$EOS_MGM_ALIAS:1094
EOS_PSS_PATH=/
EOS_TTY_BROADCAST_LISTEN_LOGFILE="/var/log/eos/mgm/xrdlog.mgm"
EOS_TTY_BROACAST_EGREP="\"CRIT|ALERT|EMERG|PROGRESS\""
EOS_MGM_STATVFS_DEFAULT_SPACE="default"
}}}
I will collect the logs and put them on AFS.

Oh, and also:
{{{
cat /usr/lib/systemd/system/eos@fst.service.d/custom.conf
[Service]
Environment=EOS_FST_HTTP_PORT=8444
}}}
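After adding or editing a drop-in like this, systemd only picks it up after `systemctl daemon-reload`; `systemctl show eos@fst.service --property=Environment` then prints the environment the unit will start with. A small hypothetical helper, assuming bash, can pull a single variable out of that output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "systemctl show --property=Environment" output on stdin,
# e.g. "Environment=EOS_FST_HTTP_PORT=8444 OTHER=x", and print the value of VAR ($1).
# Limitation: values containing spaces (which systemd quotes) are not handled.
unit_env_value() {
  tr ' ' '\n' | sed -n "s/^\(Environment=\)\{0,1\}$1=//p" | tail -n 1
}
```

Usage sketch: `systemctl daemon-reload && systemctl show eos@fst.service --property=Environment | unit_env_value EOS_FST_HTTP_PORT`, which should print `8444` if the drop-in above is active.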

Hi Elvin,

I put the required logs in:
/afs/cern.ch/user/v/vmitsyn/public/dvl-eos.jinr.ru/

Hi Valeri,

Sorry for the late reply; I was on holiday for the past few days. I will check the logs you provided and get back to you today.

Thanks,
Elvin

Hi Valeri,

I see from the logs that you have the EOS_FST_ASYNC_CLOSE functionality enabled. Unfortunately, this only works correctly in the latest EOS version, 5.0.29, which comes with a new XRootD version that fixes a related bug. Could you please update and retry your transfer? By the way, does a simple xrdcp work correctly against your instance?
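Before retrying, it may help to check both conditions mentioned here on each node: whether async close is enabled in `/etc/sysconfig/eos_env`, and whether the installed EOS is at least the required release. The sketch below assumes bash and GNU `sort -V` for version ordering; `version_ge` and `async_close_value` are hypothetical names, and taking the last assignment assumes later lines override earlier ones.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative.

# Succeed if dotted version $1 >= $2, using GNU sort's version ordering (-V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the value of EOS_FST_ASYNC_CLOSE from an eos_env-style file
# (empty if unset); quotes and whitespace are stripped.
async_close_value() {
  grep -E '^[[:space:]]*EOS_FST_ASYNC_CLOSE=' "$1" | tail -n 1 |
    cut -d= -f2- | sed -E 's/["[:space:]]//g'
}
```

On an RPM-based node, `version_ge "$(rpm -q --qf '%{VERSION}' eos-server)" 5.0.29` succeeds only if eos-server is at least 5.0.29, and `async_close_value /etc/sysconfig/eos_env` prints `1` for the eos_env quoted earlier.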

I also saw a warning in the FST logs when the service starts: "Config warning: HTTPS functionality was not configured."
I don't think this is critical; it is more likely a consequence of using old configuration options. Nevertheless, please have a look at the post below, where you can find a sample configuration, and consider replacing the http.cadir/http.cert/http.key directives with the new ones supported in XRootD 5 (xrd.tls and xrd.tlsca):
https://eos-community.web.cern.ch/t/scitokens-authorization-done-but-no-username-found/783/8?u=esindril

You can find a more distilled FST configuration below:

{{{
xrd.tls /etc/grid-security/daemon/hostcert.pem /etc/grid-security/daemon/hostkey.pem
xrd.tlsca certdir /etc/grid-security/certificates/
xrd.protocol XrdHttp:9001 libXrdHttp.so
http.exthandler EosFstHttp /usr/lib64/libEosFstHttp.so none
http.exthandler xrdtpc libXrdHttpTPC.so
http.trace all
}}}
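Since `xrd.tls` points at a host certificate and key, a quick pre-flight check on each FST can avoid a failed start. A sketch assuming bash and the `openssl` command-line tool: `check_host_cert` (a hypothetical name) verifies that both files are readable, prints the certificate subject and expiry date, fails if the certificate has already expired, and compares the certificate's public key with the one derived from the key file. The default paths are the ones used in the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before enabling xrd.tls on an FST.
# Note: -r tests readability for the user running this script,
# which may differ from the user the daemon runs as.
check_host_cert() {
  local cert=${1:-/etc/grid-security/daemon/hostcert.pem}
  local key=${2:-/etc/grid-security/daemon/hostkey.pem}
  local f
  for f in "$cert" "$key"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
  # Show whom the certificate was issued to and when it expires.
  openssl x509 -in "$cert" -noout -subject -enddate || return 1
  # -checkend 0 exits non-zero if the certificate has already expired.
  if ! openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
    echo "certificate has expired: $cert" >&2
    return 1
  fi
  # The key must belong to the certificate: compare the two public keys.
  if [ "$(openssl x509 -in "$cert" -noout -pubkey)" != "$(openssl pkey -in "$key" -pubout)" ]; then
    echo "key does not match certificate: $key" >&2
    return 1
  fi
  echo "OK: $cert"
}
```

Running `check_host_cert` without arguments checks the default host certificate and key paths.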

Thanks,
Elvin

Hi Elvin,

from your recommendation (the line in /etc/xrd.cf.fst):
{{{
xrd.tls /etc/grid-security/daemon/hostcert.pem /etc/grid-security/daemon/hostkey.pem
}}}
I realized that all FSTs must have a host certificate,
but in our case only the 2 MGMs have certificates.
Do I understand correctly that the certificate on the FSTs is required only for HTTP/WebDAV?
Copying over XRootD (without TPC) works without it:
{{{
dvl-ui01:~ > date ; gfal-copy -f file:///etc/group root://dvl-eos.jinr.ru//eos/tests/cms/test-03
Wed Aug 3 13:58:03 MSK 2022
Copying file:///etc/group [DONE] after 0s
}}}
EOS has already been updated to 5.0.29, together with the corresponding XRootD version:
{{{
dvl-eos-m01:~ # rpm -qa *eos* *xroot* | sort
eos-client-5.0.29-1.el7.cern.x86_64
eos-folly-2019.11.11.00-1.el7.cern.x86_64
eos-folly-deps-2019.11.11.00-1.el7.cern.x86_64
eos-fusex-5.0.29-1.el7.cern.x86_64
eos-fusex-core-5.0.29-1.el7.cern.x86_64
eos-fusex-selinux-5.0.29-1.el7.cern.x86_64
eos-grpc-1.41.0-1.el7.x86_64
eos-grpc-devel-1.41.0-1.el7.x86_64
eos-libmicrohttpd-0.9.38-eos.el7.cern.x86_64
eos-librichacl-1.12-14.el7.cern.x86_64
eos-ns-inspect-5.0.29-1.el7.cern.x86_64
eos-protobuf3-3.17.3-1.el7.cern.eos.x86_64
eos-quarkdb-5.0.29-1.el7.cern.x86_64
eos-richacl-1.12-14.el7.cern.x86_64
eos-rocksdb-6.2.4-1.el7.cern.x86_64
eos-server-5.0.29-1.el7.cern.x86_64
eos-xrootd-5.4.7-1.el7.cern.x86_64
xrootd-client-libs-5.4.3-1.el7.x86_64
xrootd-libs-5.4.3-1.el7.x86_64
xrootd-scitokens-5.4.3-1.el7.x86_64
xrootd-server-5.4.3-1.el7.x86_64
xrootd-server-libs-5.4.3-1.el7.x86_64
xrootd-voms-5.4.3-1.el7.x86_64
}}}
I will request certificates for the FSTs, but I'm afraid it is not a fast process.

Hi Valeri,

Yes, for HTTPS one needs certificates on the FSTs as well. Certificates are also required for any token-based access; otherwise the tokens are sent in clear text over the wire. TLS support is also available for the XRootD protocol, and again it is a requirement if you plan to use tokens.
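To make the clear-text risk concrete: a plain `http://` URL is unencrypted and a `root://` URL is not guaranteed to use TLS, whereas `https://`, `davs://` and `roots://` require it. A hypothetical guard, assuming bash, that a client-side script could run before attaching a bearer token:

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only for URL schemes that require TLS,
# so that a token is never attached to an unencrypted request.
requires_tls() {
  case "$1" in
    https://*|davs://*|roots://*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage sketch: `requires_tls "$url" && curl -H "Authorization: Bearer $TOKEN" "$url"`, where `url` and `TOKEN` are hypothetical variables holding the target and the token.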

Let me know how it goes once you install the certificates also on the FSTs.

Thanks,
Elvin

Hi Elvin,

I turned off the FST nodes that have no certificates.
We also have 5 FSTs on each of the 2 MGM nodes.
gfal-copy from a local file to davs:// now works.
I tested with EOS_FST_ASYNC_CLOSE=1; I can't say for sure, but it seems that in this case the error occurs again.

We also need davs:// with TPC to work.
I hope this will work too; I will test it soon.

Hi Elvin,

I checked again and I am now sure that EOS_FST_ASYNC_CLOSE=1 leads to an error, at least for the replica layout.

Hi Valeri,

Thank you for the confirmation. This makes perfect sense: the HTTP protocol is less versatile than the XRootD one, and the async close functionality cannot work properly over HTTP. I will fix this for the next release by disabling async close by default for HTTP.

Thanks,
Elvin

Hi Elvin,

I checked with EOS 5.0.30; unfortunately, it still does not work with EOS_FST_ASYNC_CLOSE=1.
Apparently the commit was lost during the merge.

Hi Valeri,

Thanks for the notification. Strange, the commit is there. I will double-check and get back to you; I must have missed something.

Cheers,
Elvin

Hi Valeri,

Indeed, there was a problem. The HTTP layer was not properly populating a field that I relied on to detect that this was an HTTP access. This is fixed now, and I will tag 5.0.31. Thanks for the notification!

Cheers,
Elvin

Hi Elvin,

I updated to version 5.0.31 and tested it; it now works without errors.
Thank you!

Hi Elvin,

It turns out that there is still one problem with HTTPS.
Using:
{{{
gfal-copy --copy-mode push …
}}}
copying does not work, even from our local EOS to the same local EOS.
I have now tested with the following versions:
{{{
dvl-eos-m02:~ # rpm -qa eos-* xrootd-* | sort
eos-client-5.1.1-1.el7.cern.x86_64
eos-folly-2019.11.11.00-1.el7.cern.x86_64
eos-folly-deps-2019.11.11.00-1.el7.cern.x86_64
eos-fusex-5.1.1-1.el7.cern.x86_64
eos-fusex-core-5.1.1-1.el7.cern.x86_64
eos-fusex-selinux-5.1.1-1.el7.cern.x86_64
eos-grpc-1.41.0-1.el7.x86_64
eos-grpc-devel-1.41.0-1.el7.x86_64
eos-libmicrohttpd-0.9.38-eos.el7.cern.x86_64
eos-librichacl-1.12-14.el7.cern.x86_64
eos-ns-inspect-5.1.1-1.el7.cern.x86_64
eos-protobuf3-3.17.3-1.el7.cern.eos.x86_64
eos-quarkdb-5.1.1-1.el7.cern.x86_64
eos-richacl-1.12-14.el7.cern.x86_64
eos-rocksdb-6.2.4-1.el7.cern.x86_64
eos-server-5.1.1-1.el7.cern.x86_64
eos-xrootd-5.5.1-1.el7.cern.x86_64
xrootd-client-libs-5.5.0-1.el7.x86_64
xrootd-libs-5.5.0-1.el7.x86_64
xrootd-scitokens-5.5.0-1.el7.x86_64
xrootd-server-5.5.0-1.el7.x86_64
xrootd-server-libs-5.5.0-1.el7.x86_64
xrootd-voms-5.5.0-1.el7.x86_64
}}}
We want to work without /etc/grid-security/grid-mapfile, relying only on VOMS. But copying in "push" mode does not work in that setup;
I get an error:
{{{
Copy failed (3rd push). Last attempt: [gfal_http_third_party_copy] Transfer failure: Remote side failed with status code 403
}}}
With "pull" everything works.
The XRootD protocol works with both "push" and "pull".
When /etc/grid-security/grid-mapfile is used, HTTPS works with "push" as well.

Is some setting perhaps missing?

Hi Valeri,

Could you please show me the contents of your certificate (proxy), including the VOMS group information, that you are using for the transfer?
Also could you please paste the eos vid rules that you are using to enforce the VOMS mapping?

Thanks,
Elvin

Hi Elvin,

here is the proxy I’m using:
{{{
dvl-ui01:~ > voms-proxy-info --all
subject : /C=RU/O=RDIG/OU=users/OU=jinr.ru/CN=Valery Mitsyn/CN=2047955353
issuer : /C=RU/O=RDIG/OU=users/OU=jinr.ru/CN=Valery Mitsyn
identity : /C=RU/O=RDIG/OU=users/OU=jinr.ru/CN=Valery Mitsyn
type : RFC3820 compliant impersonation proxy
strength : 2048
path : /tmp/x509up_u8142
timeleft : 99:59:56
key usage : Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment, Key Agreement
=== VO cms extension information ===
VO : cms
subject : /C=RU/O=RDIG/OU=users/OU=jinr.ru/CN=Valery Mitsyn
issuer : /DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch
attribute : /cms/Role=NULL/Capability=NULL
timeleft : 99:59:56
uri : lcg-voms2.cern.ch:15002
}}}
and the related vid definitions:
{{{
dvl-eos-m02:~ # eos vid ls | grep cms
voms:"/cms:":gid => lcms
voms:"/cms:":uid => cms001
}}}

Hi Valeri,

I tried configuring our pre-production instance to skip the grid-map file for HTTPS requests and this works as expected for me in the sense that the vid mapping is respected.

The only modification I made in the /etc/xrd.cf.mgm configuration is to remove/comment out the following line:

{{{
#http.gridmap /etc/grid-security/grid-mapfile
}}}
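When several MGMs share the same configuration, this edit can be scripted. A sketch assuming bash and GNU or BSD `sed` (for `-i.bak`): `disable_http_gridmap`, a hypothetical helper, prefixes any active `http.gridmap` line with `#` and keeps a `.bak` copy. Already commented lines are left untouched, so running it twice is harmless. The MGM still has to be restarted afterwards for XRootD to reread its configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out active http.gridmap directives in an XRootD
# config file, leaving a backup copy next to it (<file>.bak).
disable_http_gridmap() {
  sed -i.bak -E 's/^([[:space:]]*http\.gridmap[[:space:]])/#\1/' "$1"
}
```

Usage sketch, assuming the MGM runs as the `eos@mgm` unit (analogous to `eos@fst` earlier in this thread): `disable_http_gridmap /etc/xrd.cf.mgm && systemctl restart eos@mgm`.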

Then, using the command below:

{{{
curl -L -v --capath /etc/grid-security/certificates --cert /tmp/x509up_u$(id -u) --cacert /tmp/x509up_u$(id -u) --key /tmp/x509up_u$(id -u) https://eospps.cern.ch/eos/pps/opstest/egi/file1.dat --upload-file /etc/passwd
}}}

The upload was successful for me. Can you send me the full /etc/xrd.cf.mgm and the MGM logs that you have once you issue such a request?

Thanks,
Elvin

Hi Elvin,

Just uploading files works without problems.
TPC in pull mode also works without problems.
Only TPC in push mode does not work without authorization via the grid-mapfile;
this mode works only when my certificate is in the grid-mapfile.
{{{
dvl-ui01:~ > gfal-copy --copy-mode pull davs://se-wbdv.jinr-t1.ru:2880//pnfs/jinr-t1.ru/data/cms/vvm-test-01/5GB-000 davs://dvl-eos.jinr.ru:8443//eos/tests/cms/5GB-020
Copying davs://se-wbdv.jinr-t1.ru:2880//pnfs/jinr-t1.ru/data/cms/vvm-test-01/5GB-000 [DONE] after 57s
dvl-ui01:~ >
dvl-ui01:~ > gfal-copy --copy-mode push davs://se-wbdv.jinr-t1.ru:2880//pnfs/jinr-t1.ru/data/cms/vvm-test-01/5GB-000 davs://dvl-eos.jinr.ru:8443//eos/tests/cms/5GB-021
Copying davs://se-wbdv.jinr-t1.ru:2880//pnfs/jinr-t1.ru/data/cms/vvm-test-01/5GB-000 [FAILED] after 0s
gfal-copy error: 5 (Input/output error) - TRANSFER ERROR: Copy failed (3rd push). Last attempt: Transfer failure: rejected PUT: 403 FORBIDDEN\n
}}}
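A hypothetical wrapper, assuming bash and gfal2-util, that repeats the comparison above in one go: it runs the same source through `gfal-copy -f --copy-mode pull` and `--copy-mode push` into two distinct destination names and reports each result. In push mode the PUT that EOS rejects is issued by the source endpoint rather than by the client, which is why authorization can behave differently from pull mode. The `GFAL_COPY` override exists only so the logic can be exercised without a real transfer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the same HTTP TPC in pull and push mode and report both.
# $1 = source URL, $2 = destination base URL ("-pull"/"-push" is appended).
# GFAL_COPY may name a replacement command for dry runs; defaults to gfal-copy.
tpc_both_modes() {
  local src=$1 dst_base=$2 mode rc=0
  for mode in pull push; do
    if "${GFAL_COPY:-gfal-copy}" -f --copy-mode "$mode" "$src" "${dst_base}-${mode}"; then
      echo "$mode: OK"
    else
      echo "$mode: FAILED"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage sketch, with a hypothetical destination name: `tpc_both_modes davs://se-wbdv.jinr-t1.ru:2880//pnfs/jinr-t1.ru/data/cms/vvm-test-01/5GB-000 davs://dvl-eos.jinr.ru:8443//eos/tests/cms/5GB-030` would write `5GB-030-pull` and `5GB-030-push`.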