We recently got a GGUS ticket fromt the CMS experiment that the HammerCloud jobs that run on our HPC system fail to access (see full log) the HC datasets that are stored on our EOS storage and fallback to a remote SE:
== CMSSW: 11-Dec-2020 13:11:52 CET Initiating request to open file root://eos.grid.vbc.ac.at:1094//eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root
== CMSSW: %MSG-w XrdAdaptorInternal: file_open 11-Dec-2020 13:11:54 CET pre-events
== CMSSW: Failed to open file at URL root://eos.grid.vbc.ac.at:1094//eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root.
== CMSSW: %MSG
== CMSSW: %MSG-w XrdAdaptorInternal: file_open 11-Dec-2020 13:11:54 CET pre-events
== CMSSW: Failed to open file at URL root://eos.grid.vbc.ac.at:1094//eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root?tried=.
== CMSSW: %MSG
== CMSSW: 11-Dec-2020 13:11:54 CET Fallback request to file root://xrootd-cms.infn.it//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root
== CMSSW: %MSG-w XrdAdaptor: file_open 11-Dec-2020 13:11:57 CET pre-events
== CMSSW: Data is served from acrc.bris.ac.uk instead of original site T2_IT_Pisa
== CMSSW: %MSG
== CMSSW: 11-Dec-2020 13:11:58 CET Successfully opened file root://xrootd-cms.infn.it//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/DC54F73E-1676-E711-B100-FA163ED9E684.root
The HC jobs are run with a grid proxy of Andrea Sciaba with a CMS VO extension (no role).
After checking the xrdlog.mgm we can see that the access to the HC dataset results in a permission denied error for that user:
201214 03:32:42 time=1607913162.850107 func=Emsg level=ERROR logid=a4842e24-3db4-11eb-9204-3868dd28d0c0 unit=mgm@mgm-1.eos.grid.vbc.ac.at:1094 tid=00007ff7d8dfc700 source=XrdMgmOfsFile:3175 tident=grid.cms.301:583@[::ffff:172.24.77.204] sec=gsi uid=99 gid=99 name=CN=Andrea Sciaba geo="vbc" Unable to access - public access level restriction /eos/vbc/experiments/cms/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/E249D296-0B76-E711-9128-FA163EB8F562.root; Permission denied
Looking further up in the logs we can see that EOS seems to fail to parse the VOMS extension of the grid certificate and thus maps the user to nobody (uid/gid 99)
201214 03:27:33 60328 secgsi_ServerDoCert: no signed DH parameters from client:grid.cms.307:555@[::ffff:172.24.77.37] : will not delegate x509 proxy to it
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 2 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 2 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 5 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:27:33 60328 cryptossl_X509::CertType: certificate has 11 extensions
201214 03:27:33 60328 secgsi_XrdOucGMap::dn2user: no valid match found for DN '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba'
201214 03:27:33 60328 secgsi_Authenticate: WARNING: user mapping lookup failed - use DN or DN-hash as name
201214 03:27:33 60328 secgsi_ExtractVOMS: No VOMS attributes in proxy chain
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.vorg: <none>
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.grps: <none>
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.role: <none>
201214 03:27:33 60328 secgsi_Authenticate: VOMS: Entity.endorsements: <none>
201214 03:27:33 60328 XrootdXeq: grid.cms.307:555@[::ffff:172.24.77.37] pvt IPv4 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba
Interestingly there are also times when EOS can properly parse the VOMS extension of that specific user and properly map it to a local user (we guess when it has the role “production”).
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 3 extensions
201214 03:29:45 60177 cryptossl_X509::CertType: certificate has 11 extensions
201214 03:29:45 60177 secgsi_XrdOucGMap::dn2user: no valid match found for DN '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba'
201214 03:29:45 60177 secgsi_Authenticate: WARNING: user mapping lookup failed - use DN or DN-hash as name
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.vorg: cms
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.grps: /cms/GGUSExpert
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.role: production
201214 03:29:45 60177 secgsi_Authenticate: VOMS: Entity.endorsements: /cms/Role=production/Capability=NULL,/cms/ALARM/Role=NULL/Capability=NULL,/cms/GGUSExpert/Role=NULL/Capability=NULL,/cms/Role=NULL/Capability=NULL,/cms/TEAM/Role=NULL/Capability=NULL
201214 03:29:45 60177 XrootdXeq: etf.1322408:294@etf-01.cern.ch pub IPv4 login as /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba
201214 03:29:45 time=1607912985.146474 func=IdMap level=INFO logid=static.............................. unit=mgm@mgm-1.eos.grid.vbc.ac.at:1094 tid=00007ff81f9fc700 source=Mapping:993 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=gsi sec.name="/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba" sec.host="etf-01.cern.ch" sec.vorg="cms" sec.grps="/cms/GGUSExpert" sec.role="production" sec.info="/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sciaba/CN=430796/CN=Andrea Sciaba" sec.app="" sec.tident="etf.1322408:294@etf-01.cern.ch" vid.uid=43349 vid.gid=43350
We have this vid mapping configured:
voms:"/belle:":gid => role.grid.belle.pool
voms:"/belle:":uid => grid.belle.pool001
voms:"/belle:lcgadmin":uid => grid.belle.prod
voms:"/belle:production":uid => grid.belle.prod
voms:"/cms/GGUSExpert:":uid => grid.cms.pool001
voms:"/cms/GGUSExpert:lcgadmin":uid => grid.cms.prod
voms:"/cms/GGUSExpert:production":uid => grid.cms.prod
voms:"/cms:":gid => role.grid.cms.pool
voms:"/cms:":uid => grid.cms.pool001
voms:"/cms:lcgadmin":uid => grid.cms.prod
voms:"/cms:production":uid => grid.cms.prod
Is this an issue of our vid mapping ?
For us it looks more like as if EOS fails to parse the VOMS extension of that specific grid certificate.
Any hints or pointers would be greatly appreciated.