Issue decoding VOMS extension

We recently got a GGUS ticket from the CMS experiment reporting that the HammerCloud (HC) jobs running on our HPC system fail to access the HC datasets stored on our EOS storage (see full log) and fall back to a remote SE:

== CMSSW: 11-Dec-2020 13:11:52 CET  Initiating request to open file root://eos.grid.vbc.ac.at:1094//eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root
== CMSSW: %MSG-w XrdAdaptorInternal:  file_open 11-Dec-2020 13:11:54 CET pre-events
== CMSSW: Failed to open file at URL root://eos.grid.vbc.ac.at:1094//eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root.
== CMSSW: %MSG
== CMSSW: %MSG-w XrdAdaptorInternal:  file_open 11-Dec-2020 13:11:54 CET pre-events
== CMSSW: Failed to open file at URL root://eos.grid.vbc.ac.at:1094//eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root?tried=.
== CMSSW: %MSG
== CMSSW: 11-Dec-2020 13:11:54 CET  Fallback request to file root://xrootd-cms.infn.it//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root
== CMSSW: %MSG-w XrdAdaptor:  file_open 11-Dec-2020 13:11:57 CET pre-events
== CMSSW: Data is served from acrc.bris.ac.uk instead of original site T2_IT_Pisa
== CMSSW: %MSG
== CMSSW: 11-Dec-2020 13:11:58 CET  Successfully opened file root://xrootd-cms.infn.it//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root

The HC jobs run with a grid proxy of Andrea Sciaba that carries a CMS VO extension (no role).
Checking xrdlog.mgm, we can see that access to the HC dataset fails with a permission denied error for that user:

201214 03:32:42 time=1607913162.850107 func=Emsg level=ERROR logid=a4842e24-3db4-11eb-9204-3868dd28d0c0 unit=mgm@mgm-1.eos.grid.vbc.ac.at:1094 tid=00007ff7d8dfc700 source=XrdMgmOfsFile:3175 tident=grid.cms.301:583@[::ffff:172.24.77.204] sec=gsi uid=99 gid=99 name=CN=Andrea Sciaba geo="vbc" Unable to access - public access level restriction /eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/E249D296-0B76-E711-9128-FA163EB8F562.root; Permission denied

Looking further up in the logs, we can see that EOS seems to fail to parse the VOMS extension of the grid certificate and therefore maps the user to nobody (uid/gid 99):

201214 03:27:33 60328 secgsi_ServerDoCert: no signed DH parameters from client:grid.cms.307:555@[::ffff:172.24.77.37] : will not delegate x509 proxy to it
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 2 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 2 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 5 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 11 extensions
201214 03:27:33 60328 secgsi_XrdOucGMap::dn2user: no valid match found for DN '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba'
201214 03:27:33 60328 secgsi_Authenticate: WARNING: user mapping lookup failed - use DN or DN-hash as name
201214 03:27:33 60328 secgsi_ExtractVOMS: No VOMS attributes in proxy chain
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.vorg:         <none>
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.grps:         <none>
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.role:         <none>
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.endorsements: <none>
201214 03:27:33 60328 XrootdXeq: grid.cms.307:555@[::ffff:172.24.77.37] pvt IPv4 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba 

Interestingly, there are also times when EOS properly parses the VOMS extension of that specific user and maps it to a local user (we guess this happens when the proxy carries the role “production”):

201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 11 extensions
201214 03:29:45 60177 secgsi_XrdOucGMap::dn2user: no valid match found for DN '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba'
201214 03:29:45 60177 secgsi_Authenticate: WARNING: user mapping lookup failed - use DN or DN-hash as name
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.vorg:         cms
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.grps:         /cms/GGUSExpert
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.role:         production
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.endorsements: /cms/Role=production/Capability=NULL,/cms/ALARM/Role=NULL/Capability=NULL,/cms/GGUSExpert/Role=NULL/Capability=NULL,/cms/Role=NULL/Capability=NULL,/cms/TEAM/Role=NULL/Capability=NULL
201214 03:29:45 60177 XrootdXeq: etf.1322408:294@etf-01.cern.ch pub IPv4 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba

201214 03:29:45 time=1607912985.146474 func=IdMap                    level=INFO  logid=static.............................. unit=mgm@mgm-1.eos.grid.vbc.ac.at:1094 tid=00007ff81f9fc700 source=Mapping:993                    tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=gsi sec.name="/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba" sec.host="etf-01.cern.ch" sec.vorg="cms" sec.grps="/cms/GGUSExpert" sec.role="production" sec.info="/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba" sec.app="" sec.tident="etf.1322408:294@etf-01.cern.ch" vid.uid=43349 vid.gid=43350

We have this vid mapping configured:

voms:"/belle:":gid => role.grid.belle.pool
voms:"/belle:":uid => grid.belle.pool001
voms:"/belle:lcgadmin":uid => grid.belle.prod
voms:"/belle:production":uid => grid.belle.prod
voms:"/cms/GGUSExpert:":uid => grid.cms.pool001
voms:"/cms/GGUSExpert:lcgadmin":uid => grid.cms.prod
voms:"/cms/GGUSExpert:production":uid => grid.cms.prod
voms:"/cms:":gid => role.grid.cms.pool
voms:"/cms:":uid => grid.cms.pool001
voms:"/cms:lcgadmin":uid => grid.cms.prod
voms:"/cms:production":uid => grid.cms.prod
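To reason about which of these entries should fire, it may help to model the lookup. The sketch below assumes, for illustration only, that the MGM builds a key of the form <group>:<role> (the form given in the eos vid help) from sec.grps and sec.role and looks it up in the voms: table. `map_voms` and `NOBODY` are hypothetical names; this is not EOS code.

```python
# Hypothetical model of the voms: vid lookup, NOT the actual EOS implementation.
# Assumption: EOS builds the key "<group>:<role>" from sec.grps and sec.role.
NOBODY = "nobody (uid/gid 99)"

# uid entries copied from the vid mapping above (gid entries omitted)
VOMS_UID_MAP = {
    "/cms/GGUSExpert:": "grid.cms.pool001",
    "/cms/GGUSExpert:lcgadmin": "grid.cms.prod",
    "/cms/GGUSExpert:production": "grid.cms.prod",
    "/cms:": "grid.cms.pool001",
    "/cms:lcgadmin": "grid.cms.prod",
    "/cms:production": "grid.cms.prod",
}


def map_voms(grps, role):
    """Return the mapped account name, or NOBODY when nothing matches."""
    if not grps:  # e.g. "No VOMS attributes in proxy chain"
        return NOBODY
    return VOMS_UID_MAP.get(f"{grps}:{role or ''}", NOBODY)


# First log excerpt: no VOMS attributes decoded
print(map_voms(None, None))                       # nobody (uid/gid 99)
# Second log excerpt: group /cms/GGUSExpert, role production
print(map_voms("/cms/GGUSExpert", "production"))  # grid.cms.prod
```

Under this model, the first failure happens before the vid table is even consulted: with no VOMS attributes there is no key to look up.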

Is this an issue with our vid mapping?
To us it looks more as if EOS fails to parse the VOMS extension of that specific grid certificate.
Any hints or pointers would be greatly appreciated.

Hi all,
To add to this, our MGM xrd config uses the following GSI configuration:

# GSI authentication
# for params see https://xrootd.slac.stanford.edu/doc/dev49/sec_config.htm#_Toc517294098
# gsi protocol params used
# crl:1 ...  use CRL if available; if the CRL certificate is missing for a given CA, the related CRL is assumed to be empty; there are better options than this, but it will work
# d:1 ... debug level 1; anything higher will explode the logs by hexdumping cert data. This setting might get lost when changing "eos debug <level>" at runtime
# gmapopt:11 ... use the gridmap-file (if a mapping is available), otherwise log in with the cert DN as user id
# gmapto:60 ... cache gridmap-file data for 60 sec, i.e. you can change the file without a service restart
# vomsat:1 ... parse certificate for VO attributes
# moninfo:1 ... unknown
# exppxy ... Specifies the exported location of the delegated proxy certificate, <uid> is template for numeric uid
sec.protocol gsi -cert:/etc/grid-security/daemon/mgm-2.eos.grid.vbc.ac.at.crt -key:/etc/grid-security/daemon/mgm-2.eos.grid.vbc.ac.at.key -gridmap:/etc/grid-security/grid-mapfile -crl:1 -d:1 -gmapopt:11 -gmapto:60 -vomsat:1 -moninfo:1 -exppxy:/var/eos/auth/gsi#<uid>

i.e. we’re not using the plugin from the xrootd-voms package, at least not explicitly.
We have also observed in the past that this config only extracts one tuple of attributes (i.e. vorg+group+role+endorsement), whereas the xrootd-voms man page seems to indicate that all groups are extracted. Would this improve or fix our setup with the vid mapping (e.g. by not having to map GGUSExpert explicitly)?

Best,
Erich

Hi Erich,
that is not EOS code; it is the XRootD GSI plug-in that does the VOMS decoding.

I have no idea what the problem with the GSI plug-in is. You could make a grid-map entry for ‘sciaba’ and map him to your pool account as a fallback, but this is not a real solution…
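For the grid-map fallback: the config above points -gridmap at /etc/grid-security/grid-mapfile, and the dn2user log line shows that no entry matched the DN. A grid-mapfile line is a quoted DN followed by a local account name. The sketch below parses such lines and looks up the DN; the entry itself, and the choice of grid.cms.pool001 as target account, are hypothetical.

```python
# Sketch of the grid-map fallback: a grid-mapfile line is a quoted DN followed
# by a local account name. The entry and the target account are hypothetical.
import shlex
from typing import Dict

GRIDMAP_TEXT = """
# hypothetical fallback entry for the HC proxy owner
"/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba" grid.cms.pool001
"""


def parse_gridmap(text: str) -> Dict[str, str]:
    """Map each DN to its local account, skipping blank and comment lines."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # honours the quotes around the DN
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


dn = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba"
print(parse_gridmap(GRIDMAP_TEXT).get(dn))  # grid.cms.pool001
```

As noted, this only papers over the VOMS decoding problem for one DN; every other affected user would need an entry too.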

Hi Andreas,

OK, we’ll continue tracking this in a Github issue for XRootD: https://github.com/xrootd/xrootd/issues/1369

Best,
Erich

Hi @apeters

To continue investigating, we’ve enabled debugging for the vomsfun by adding the following to the sec.protocol gsi line:

-vomsfun:libXrdVoms.so -vomsfunparms:dbg

Without this we would not get the VOMS parsing debug output in the logs. Does this in any way change the VOMS parsing as far as EOS is concerned?
Since this change, it looks like our vid mapping fails. Specifically, see the IdMap log line:

210114 17:46:21 time=1610642781.808426 func=IdMap                    level=INFO
logid=static.............................. unit=mgm@mgm-1.eos.grid.vbc.ac.at:1094 tid=00007f8b955f7700 source=Mapping:993 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=gsi 
sec.name="/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba" sec.host="etf-28.cern.ch" 
sec.vorg="cms cms cms cms cms" 
sec.grps="/cms /cms/ALARM /cms/GGUSExpert /cms /cms/TEAM" 
sec.role="production NULL NULL NULL NULL" 
sec.info="/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba" sec.app="" sec.tident="etf.200920:1024@etf-28.cern.ch" 
vid.uid=99 vid.gid=99

So the multiple attribute values do end up on the EOS side, but the user is not mapped correctly (vid.uid=99)?
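The sec.vorg, sec.grps and sec.role fields in that IdMap line appear to be space-separated lists, one entry per VOMS attribute, in the same order as the endorsements list seen earlier. The sketch below regroups them into tuples. That a lookup key built from the whole strings cannot match any entry is an assumption about how EOS forms the key, though it is consistent with the vid.uid=99 result.

```python
# Sketch: split the multi-valued sec.* fields from the IdMap line above into
# per-attribute tuples. Assumption: the n-th entries of vorg, grps and role
# belong together (they follow the order of the endorsements list).
vorg = "cms cms cms cms cms"
grps = "/cms /cms/ALARM /cms/GGUSExpert /cms /cms/TEAM"
role = "production NULL NULL NULL NULL"

tuples = list(zip(vorg.split(), grps.split(), role.split()))
for vo, grp, rl in tuples:
    print(f"{vo:4} {grp:16} {rl}")

# A single "<group>:<role>" key built from the whole strings (assumed format)
# contains spaces and therefore matches no vid entry.
VOMS_UID_MAP = {"/cms:": "grid.cms.pool001", "/cms:production": "grid.cms.prod"}
naive_key = f"{grps}:{role}"
print(VOMS_UID_MAP.get(naive_key))  # None
```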

Best,
Erich

Hi Erich,
I have found this here:

One can actually force it to return only one group, but I don’t see the same option for the role.

I can make a patch to EOS to use the first group and role that can be mapped. Is that OK?
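The idea of using the first group and role that can be mapped can be sketched as a loop over the (group, role) pairs. This is an illustration under the assumption that the vid table is keyed by <group>:<role>; `first_mappable` and the example entries are hypothetical, and this is not the actual EOS patch.

```python
# Hypothetical sketch of the proposal above: walk the (group, role) pairs in
# order and use the first one with a vid mapping entry.
# This illustrates the idea only; it is not the actual EOS patch.
from typing import Optional

# Example entries (assumed), in the "<group>:<role>" form used by the vid table
VOMS_UID_MAP = {
    "/cms:production": "grid.cms.prod",
    "/cms/GGUSExpert:production": "grid.cms.prod",
}


def first_mappable(grps: str, roles: str) -> Optional[str]:
    for grp, role in zip(grps.split(), roles.split()):
        uid = VOMS_UID_MAP.get(f"{grp}:{role}")
        if uid is not None:
            return uid
    return None  # caller would fall back to nobody (uid/gid 99)


print(first_mappable("/cms /cms/ALARM /cms/GGUSExpert /cms /cms/TEAM",
                     "production NULL NULL NULL NULL"))  # grid.cms.prod
print(first_mappable("/cms /cms/TEAM", "NULL NULL"))       # None
```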

Cheers Andreas.

Hi Andreas,

We’re continuing our debugging efforts. Following that man page, we’re testing with grpopt=0 to return only the first attribute (which, in our observations, usually carries the production role if it is present).
The single tuple appears to be mapped correctly. We give it some time now to observe further.

This way the tuples seem to be consistent. Without the vomsfun parameters, we would not have gotten /cms:production but the /cms/GGUSExpert group instead, which bloated our vid mapping.

All in all, we might get away with this, but I’m not sure this is correct behavior, strictly speaking. I would expect that the role could also “hide” in one of the tuples further down the list.
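The concern about a role hiding further down the list can be made concrete. The sketch contrasts taking only the first tuple (what grpopt=0 appears to yield, per the observation above) with a search that prefers a privileged role wherever it appears. `PRIVILEGED_ROLES` and the attribute lists are made up for illustration.

```python
# Hypothetical sketch contrasting "first tuple only" (what grpopt=0 appears to
# yield, per the observation above) with a search that prefers a privileged
# role wherever it appears. PRIVILEGED_ROLES is an assumption.
from typing import List, Optional, Tuple

PRIVILEGED_ROLES = ("production", "lcgadmin")


def pairs(grps: str, roles: str) -> List[Tuple[str, str]]:
    return list(zip(grps.split(), roles.split()))


def first_tuple(grps: str, roles: str) -> Optional[Tuple[str, str]]:
    p = pairs(grps, roles)
    return p[0] if p else None


def prefer_role(grps: str, roles: str) -> Optional[Tuple[str, str]]:
    p = pairs(grps, roles)
    for wanted in PRIVILEGED_ROLES:
        for grp, role in p:
            if role == wanted:
                return grp, role
    return p[0] if p else None


# Made-up attribute lists where the production role is not in the first slot
grps, roles = "/cms /cms/ALARM /cms", "NULL NULL production"
print(first_tuple(grps, roles))  # ('/cms', 'NULL')
print(prefer_role(grps, roles))  # ('/cms', 'production')
```

Whether a privileged role should win over attribute order is a policy decision; the sketch only shows that the two approaches diverge in this case.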

Best,
Erich

Andreas,

Another observation: the eos vid help states:

       vid set map -krb5|-gsi|-https|-sss|-unix|-tident|-voms|-grpc|-oauth2 <pattern> [vuid:<uid>] [vgid:<gid>] 
           -voms <pattern>  : <pattern> is <group>:<role> e.g. to map VOMS attribute /dteam/cern/Role=NULL/Capability=NULL one should define <pattern>=/dteam/cern: 
           -sss key:<key>  : <key> has to be defined on client side via 'export XrdSecsssENDORSEMENT=<key>'
           -grpc key:<key> : <key> has to be added to the relevant GRPC request in the field 'authkey'

           -oauth2 key:<oauth-resource> : <oauth-resource> describes the OAUTH resource endpoint to translate OAUTH tokens to user identities

Following this, we had set up our vid mappings like this, e.g. for regular CMS users without special roles:

voms:"/cms:":uid => grid.cms.pool001

Note the trailing ‘:’ on the /cms: group spec, which follows the example given in the help text.
With this entry, CMS users were mapped to uid/gid 99.

We’ve achieved the expected behavior using these vid map entries:

voms:"/cms:NULL":gid => role.grid.cms.pool
voms:"/cms:NULL":uid => grid.cms.pool001

i.e. specifying the literal "NULL" string that we also see in the logs, like this:

210115 15:45:01 time=1610721901.706113 func=IdMap                    level=INFO  logid=static.............................. unit=mgm@mgm-1.eos.grid.vbc.ac.at:1094 tid=00007fb5d3efd700 
source=Mapping:993                    tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=gsi 
sec.name="erich.birngruber" sec.host="[::ffff:172.24.76.74]" 
sec.vorg="cms" sec.grps="/cms" sec.role="NULL" 
sec.info="/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ebirngru/CN=845559/CN=Erich Birngruber" sec.app="" 
sec.tident="erich.bi.17226:1405@[::ffff:172.24.76.74]" 
vid.uid=43000 vid.gid=43000

(already mapped correctly).
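The difference between the two entries can be shown in a few lines, assuming the lookup key is <group>:<role> with the role taken verbatim from sec.role, so that an unset role arrives as the literal string "NULL". `vid_key` is a hypothetical helper, not EOS code.

```python
# Sketch (assumption): the lookup key is "<group>:<role>" with the role string
# taken verbatim from sec.role, so an unset role arrives as the literal "NULL".
def vid_key(grps: str, role: str) -> str:
    return f"{grps}:{role}"


old_map = {"/cms:": "grid.cms.pool001"}      # pattern from the help text
new_map = {"/cms:NULL": "grid.cms.pool001"}  # entry that works

key = vid_key("/cms", "NULL")  # values from the IdMap line above
print(key)               # /cms:NULL
print(old_map.get(key))  # None -> uid/gid 99
print(new_map.get(key))  # grid.cms.pool001
```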

The corresponding output from the VOMS parsing is:

210115 15:45:01 227292  XrdVomsFun: proxy: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ebirngru/CN=845559/CN=Erich Birngruber/CN=415903503
210115 15:45:01 227292  XrdVomsFun: adding cert: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ebirngru/CN=845559/CN=Erich Birngruber
210115 15:45:01 227292  XrdVomsFun: retrieval successful
210115 15:45:01 227292  XrdVomsFun: found VO: cms
210115 15:45:01 227292  XrdVomsFun:  ---> group: '/cms', role: 'NULL', cap: 'NULL'
210115 15:45:01 227292  XrdVomsFun:  ---> fqan: '/cms/Role=NULL/Capability=NULL'
210115 15:45:01 227292 secgsi_Authenticate: VOMS: Entity.vorg:         cms
210115 15:45:01 227292 secgsi_Authenticate: VOMS: Entity.grps:         /cms
210115 15:45:01 227292 secgsi_Authenticate: VOMS: Entity.role:         NULL
210115 15:45:01 227292 secgsi_Authenticate: VOMS: Entity.endorsements: /cms/Role=NULL/Capability=NULL
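The fqan lines and endorsements above have the form /<group>/Role=<role>/Capability=<cap>. The help text suggests mapping /dteam/cern/Role=NULL/Capability=NULL with the pattern /dteam/cern:, whereas the observation above indicates /dteam/cern:NULL is what matches. The sketch converts FQANs into that <group>:<role> form. Treating a missing Role as "NULL" is an assumption, and `parse_fqan`/`vid_pattern` are hypothetical helpers.

```python
# Sketch: turn a VOMS FQAN such as "/cms/Role=NULL/Capability=NULL" into the
# "<group>:<role>" pattern that, per the observation above, the vid table
# matches against. Treating a missing Role as "NULL" is an assumption.
import re
from typing import NamedTuple, Optional

FQAN_RE = re.compile(
    r"^(?P<group>/[^=]*?)"               # group path, e.g. /cms/GGUSExpert
    r"(?:/Role=(?P<role>[^/]*))?"        # optional Role=...
    r"(?:/Capability=(?P<cap>[^/]*))?$"  # optional Capability=...
)


class Fqan(NamedTuple):
    group: str
    role: str
    capability: str


def parse_fqan(fqan: str) -> Optional[Fqan]:
    m = FQAN_RE.match(fqan)
    if m is None:
        return None
    return Fqan(m["group"], m["role"] or "NULL", m["cap"] or "NULL")


def vid_pattern(f: Fqan) -> str:
    return f"{f.group}:{f.role}"


# Endorsements from the earlier log excerpt (comma-separated FQANs)
endorsements = (
    "/cms/Role=production/Capability=NULL,/cms/ALARM/Role=NULL/Capability=NULL,"
    "/cms/GGUSExpert/Role=NULL/Capability=NULL,/cms/Role=NULL/Capability=NULL,"
    "/cms/TEAM/Role=NULL/Capability=NULL"
)
for fqan in endorsements.split(","):
    parsed = parse_fqan(fqan)
    if parsed is not None:
        print(f"{fqan:45} -> {vid_pattern(parsed)}")
```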

Best,
Erich

Ah yes,
that makes sense given the way it is implemented… You know, I think you are the first and only person who has ever used that :)

This needs some consolidation, also with respect to accepting a list of groups etc…


https://its.cern.ch/jira/browse/EOS-4553

Hi Andreas,
Thanks for adding this as an EOS feature.
For now, I think everything is explained and we can continue with the settings discussed above (we are also closing the GitHub issue in XRootD).
Best,
Erich