VOMS vid map only works for GID, not UID

I have set up the vomsdir *.lsc files and the VID mapping like this:

$ eos vid set map -voms /dteam: vuid:8000 vgid:8000
$ eos vid set map -voms /ops: vuid:9000 vgid:9000
$ eos vid set map -voms /atlas: vuid:10000 vgid:10000
$ eos vid set map -voms /atlas:production vuid:10100 vgid:10000
$ eos vid set map -voms /atlas/ca: vuid:10200 vgid:10000
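(For reference, the *.lsc files follow the usual two-line layout, i.e. the VOMS server's certificate DN followed by its CA's DN; the hostname and DNs here are only illustrative:)

$ cat /etc/grid-security/vomsdir/atlas/lcg-voms2.cern.ch.lsc
/DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch
/DC=ch/DC=cern/CN=CERN Grid Certification Authority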

$ eos vid ls
publicaccesslevel: => 1024
sss:"<pwd>":gid => root
sss:"<pwd>":uid => root
sudoer                 => uids(daemon)
tokensudo              => always
voms:"/atlas/ca:":gid => 10000
voms:"/atlas/ca:":uid => 10200
voms:"/atlas:":gid => 10000
voms:"/atlas:":uid => 10000
voms:"/atlas:production":gid => 10000
voms:"/atlas:production":uid => 10100
voms:"/dteam:":gid => 8000
voms:"/dteam:":uid => 8000
voms:"/ops:":gid => 9000
voms:"/ops:":uid => 9000

The VOMS FQANs in my proxy are:

$ voms-proxy-info --fqan
/atlas/ca/Role=NULL/Capability=NULL
/atlas/Role=NULL/Capability=NULL
/atlas/lcg1/Role=NULL/Capability=NULL

However, no matter what I try, the GID is mapped as expected but the UID always remains 99 (I guess that means nobody?).

$ eos whoami
Virtual Identity: uid=99 (99) gid=10000 (10000) [authz:gsi] host=<node name> domain=<node domain>

Why would the GID mapping work but not the UID mapping?

I tried to clarify the format in the docs in DOC: Add missing slashes in VOMS FQANs (!219) · Merge requests · dss / eos · GitLab. I'm not sure whether the leading slashes, trailing colons, or lack thereof make a difference, but I tried every combination.

I found an error message in the MGM logs: “voms-mapping: cannot translate uid=10200 to user name with the password db”

What is the EOS password DB? I don't really need user or group names; numeric IDs would be fine.
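Presumably it is just whatever the MGM's NSS stack can resolve; the failing lookup can be reproduced with getent (10200 is the vuid from my mapping, and exit code 2 means the key was not found):

$ getent passwd 10200
$ echo $?
2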
I tried

[root@eos-mgm-0 mgm]# groupadd -g 10000 eos10000
[root@eos-mgm-0 mgm]# useradd -M -s /sbin/nologin -u 10200 -g 10000 eos10200

This made the error message disappear from the logs, but did not change the result of eos whoami.
Full log output:

240905 22:28:36 011 cryptossl_X509::CertType: certificate has 3 extensions
240905 22:28:36 011 cryptossl_X509::CertType: certificate has 10 extensions
240905 22:28:36 011 secgsi_Authenticate: VOMS: Entity.vorg:         atlas atlas atlas
240905 22:28:36 011 secgsi_Authenticate: VOMS: Entity.grps:         /atlas/ca /atlas /atlas/lcg1
240905 22:28:36 011 secgsi_Authenticate: VOMS: Entity.role:         NULL NULL NULL
240905 22:28:36 011 secgsi_Authenticate: VOMS: Entity.endorsements: /atlas/ca/Role=NULL/Capability=NULL,/atlas/Role=NULL/Capability=NULL,/atlas/lcg1/Role=NULL/Capability=NULL
240905 22:28:36 011 XrootdXeq: localuse.5086:438@10-5-7-79.kube-prometheus-kubelet.kube-system.svc.kermes-dev.local pvt IPv4 login as a0d02efb.0
240905 22:28:36 time=1725575316.462041 func=IdMap                    level=INFO  logid=static.............................. unit=mgm@eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local:1094 tid=00007f7639bfb700 source=Mapping:1001                   tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=gsi sec.name="a0d02efb.0" sec.host="10-5-7-79.kube-prometheus-kubelet.kube-system.svc.kermes-dev.local" sec.vorg="atlas atlas atlas" sec.grps="/atlas/ca /atlas /atlas/lcg1" sec.role="NULL NULL NULL" sec.info="/C=CA/O=Grid/OU=westgrid.ca/CN=Ryan Taylor btj-681" sec.app="" sec.tident="localuse.5086:438@10-5-7-79.kube-prometheus-kubelet.kube-system.svc.kermes-dev.local" vid.uid=99 vid.gid=10000 sudo=0 gateway=0
240905 22:28:36 time=1725575316.462202 func=open                     level=INFO  logid=3136bc6c-6bd6-11ef-9963-f296c46510d7 unit=mgm@eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local:1094 tid=00007f7639bfb700 source=XrdMgmOfsFile:548              tident=localuse.5086:438@10-5-7-79.kube-prometheus-kubelet.kube-system.svc.kermes-dev.local sec=gsi   uid=99 gid=10000 name=a0d02efb.0 geo="" op=read path=/proc/user/ info=mgm.cmd=whoami
240905 22:28:36 time=1725575316.462445 func=open                     level=INFO  logid=static.............................. unit=mgm@eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local:1094 tid=00007f7639bfb700 source=XrdMgmOfsFile:796              tident= sec=(null) uid=99 gid=99 name=- geo="" proccmd=whoami
240905 22:28:36 011 XrootdXeq: localuse.5086:438@10-5-7-79.kube-prometheus-kubelet.kube-system.svc.kermes-dev.local disc 0:00:00

Is there a way to centrally manage the user accounts in the EOS interface instead of managing and synchronizing local accounts on the EOS nodes?

Based on common/Mapping.cc · master · dss / eos · GitLab, it looks like only the local password file (or NIS/LDAP) can be used. This is an unfortunate complication for a container-based deployment, where containers should be stateless and local user accounts don't have much significance.
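In other words, the lookup seems to go through the standard getpwuid()/getgrgid() calls, so the backends consulted are whatever /etc/nsswitch.conf on the MGM lists, e.g.:

# /etc/nsswitch.conf on the MGM (illustrative)
passwd: files sss
group:  files sss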

I found that if I create the local accounts first and then run eos vid set map, the VID mappings show user names:

# eos vid ls
publicaccesslevel: => 1024
sss:"<pwd>":gid => root
sss:"<pwd>":uid => root
sudoer                 => uids(daemon)
tokensudo              => always
voms:"/atlas/ca:":gid => atcan
voms:"/atlas/ca:":uid => atcan
voms:"/atlas:":gid => atlas
voms:"/atlas:":uid => atlas

and eos whoami works:

bash-5.1$   eos whoami
Virtual Identity: uid=6000 (6000) gid=6000 (6000) [authz:gsi] host=node domain=domain

but the local user name is still not used in the final output, and it is not relevant in the context of grid computing anyway. So it would be nice if local user account management and synchronization could be avoided altogether, leaving only numeric UIDs and GIDs.

Hi Ryan,
I understand your idea, but at CERN it is a security risk if people can authenticate with non-existent accounts. We can add a flag to not enforce existing user accounts. Still, from a practical point of view, do you really want to see arbitrary numbers as the owners of directories and files? The only requirement is that there is a uid=>name entry on the MGMs, that's all. This is very convenient in general.
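For illustration, plain entries like these on the MGM are enough, and the names can be anything; e.g. in /etc/passwd:

atlas:x:10000:10000::/:/sbin/nologin
atcan:x:10200:10000::/:/sbin/nologin

and in /etc/group:

atlas:x:10000: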
If you still want this, it is not a big deal to add it.

Certainly; as I understand it, EOS is used in different contexts, e.g. on a private network where local clients/accounts are trusted and unix/kerberos authn is used, or on a public network where remote clients are not trusted and certificates or tokens are needed.

@apeters if you don't mind, a flag to skip the check for a local account would be a big help! It would avoid some unpleasant hacks, since there is no good way to configure local accounts at run time in pre-built container images.
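(The kind of hack I mean: baking the accounts into the image at build time with a Dockerfile layer like the one below, which then has to be rebuilt and redeployed whenever a mapping changes; the base image name is hypothetical.)

# hypothetical Dockerfile layer on top of the MGM image
FROM gitlab-registry.cern.ch/dss/eos:latest
RUN groupadd -g 10000 eos10000 && \
    useradd -M -s /sbin/nologin -u 10200 -g 10000 eos10200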

I'm already used to dealing with just UIDs/GIDs, because dCache can do its own internal UID/name mapping (the storage-authzdb file) and we wanted to phase out the LDAP dependency it had. :slight_smile: There are only ~3-5 identities anyway.
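(From memory, a storage-authzdb is just a few lines like these; the format is authorize <user> <read-only|read-write> <uid> <gid> <home> <root> <fsroot>:)

version 2.1

authorize atlas read-write 10000 10000 / / /
authorize atcan read-write 10200 10000 / / /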

For an HA MGM scenario this option could avoid the need to use LDAP or do local account management/synchronization across the MGM servers.

Just to make sure I understand the security implications: the potential risk is only related to unix authentication, right?

If so, I would especially want to figure out a sec.protbind config for EOS that prevents remote clients from using unix authn. I am not sure how to proceed on that; the only remaining possibility I can think of is an xrootd-level bug (or possibly misleading/misinterpreted documentation).
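What I have in mind is something along these lines in the MGM's xrootd config, though I have not verified it and the host patterns are only illustrative:

# allow sss/unix only from localhost, require gsi everywhere else
sec.protbind localhost sss unix
sec.protbind * only gsi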

@apeters I am now looking into scitokens configuration as well. This is required for ATLAS T2 sites: Token-based authorization - Token-based AuthN/Z for WLCG

If I understand correctly, we will have to use name_mapfile and/or default_user in the /etc/xrootd/scitokens.cfg config file to map tokens to user names, not just UIDs. If so, I guess we won't be able to get around local user account management in the container images, and I will have to come up with a different workaround. :confused:
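(Something like the sketch below is what I expect /etc/xrootd/scitokens.cfg would need; the issuer URL, audience, and mapfile path are illustrative:)

[Global]
audience = https://wlcg.cern.ch/jwt/v1/any

[Issuer ATLAS]
issuer = https://atlas-auth.web.cern.ch/
base_path = /atlas
default_user = atlas
name_mapfile = /etc/xrootd/scitokens-mapfile.json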