The direct cephfs access via xrootd redirection discussed and implemented there works nicely and fits the use case for HEP data access. Now we are also investigating EOS deployments on k8s for the SKAO astronomy project, which is employing a kubernetes-centric approach for deploying SRC sites (aka grid sites). In this context, end users are required to have POSIX access to their data files. Moreover, the potential benefit of direct access is even higher, because the prohibitively large size of the astro data files prevents staging them in to local disk. Instead, the users would be forced to stage in their data from EOS (cephfs) to their home directories, which are on a different cephfs filesystem. Copying data from one place in cephfs to another is obviously redundant and inefficient.
One nice solution would be to implement direct access via a simple symlink, e.g. eos cp --link /eos/data.root ~/ would be a “copy” operation that would instantly complete by creating a symlink pointing to the underlying data file. What do you think @apeters ?
Another option could be the FUSE interfaces but it seems there would be challenges getting this to work in a k8s pod.
Hi Ryan, can you explain which of these two cases you are talking about, or whether you mean both:
/cephfs/user/f/foo should be registered under /eos/user/f/foo
e.g. eos cp --link /cephfs/user/f/foo /eos/user/f/foo
or /eos/user/f/foo should be visible in /cephfs/user/f/foo
e.g. eos cp --link /eos/user/f/foo /cephfs/user/f/foo
In any case, it must be a hardlink, because the deletion should not lead to ‘holes’ …
Hi @apeters ,
The FST servers see their data directories like this for example:
bash-5.1$ ls -l /eos-storage/eos-data/eos-fst-0/00000000/
total 10207594
-rw-r-----. 1 daemon daemon 788899369 Dec 17 22:45 00000122
-rw-r-----. 1 daemon daemon 1073741824 Mar 22 07:12 00000162
From the user perspective, the goal is to do something like
bash-5.1$ eos cp --link /eos/wlcg-dev.uvic.ca/data/atlas/test.root ~/MyData/
bash-5.1$ ls -l MyData/
total 0
lrwxrwxrwx. 1 localuser localuser 49 Mar 31 12:52 test.root -> /eos-storage/eos-data/eos-fst-0/00000000/00000162
where 00000162 is the underlying data file of test.root.
The destination could be anywhere, on any filesystem, just as you can use ‘eos cp’ to copy data to some user home dir on NFS, local scratch disk on a compute node, etc.
The end result would be the same as if the user did ‘ln -s source dest’; it’s just that the user doesn’t know ‘source’, the underlying data file in the FST storage. Only EOS can know this.
I take your point that a hard link would avoid the problem of a link becoming broken if a file is deleted or migrated to a different FST. However, I think we must ensure users mount the FST storage read-only. A hard link has to live on the same filesystem as its target, so with a read-only mount users cannot create hard links at all: there is no writable location for them on that filesystem.
Do you think a symbolic (soft) link would be okay?
An alternative approach could be some new command like ‘eos locate’ which only prints out the path of the data file on the FST, e.g.
$ eos locate /eos/wlcg-dev.uvic.ca/data/atlas/test.root
/eos-storage/eos-data/eos-fst-0/00000000/00000162
Then it would be the user’s responsibility and choice to do a hard/soft link, to keep the link up to date, or to just ‘cp’ the file, etc.
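Keeping a soft link up to date mostly means noticing when its target has gone away, e.g. after the file is deleted or migrated to a different FST. A minimal sketch of such a check in plain shell follows; stale_links is a hypothetical helper name, not an EOS command, and it only looks at the top level of one directory:

```shell
# Hypothetical helper: print every symlink directly under directory $1
# whose target can no longer be resolved (a dangling link).
stale_links() {
    for f in "$1"/*; do
        # -L is true for a symlink itself; -e follows the link and
        # is false when the target does not exist.
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like stale_links ~/MyData could then be run before a job starts, and any link it prints would need to be re-created from the current FST path.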
I had looked at the eos fileinfo command which seemed close to what we need. Then thanks to @esindril I saw there is a --fullpath option which provides the full FST path.
So for a given source file named $src we can do something like:
Of course it isn’t very robust to rely on grep/awk commands to scrape text output, so it would be nice if the proposed eos command could effectively do the same thing as above.
The only other issue is that the eos fileinfo command seems to expose the details of files to any unauthenticated user, instead of only to users who are authorized to read the file. Maybe there is some configuration setting on our EOS instance that we could change to ensure this?
Probably using the json format for the fileinfo command and extracting the relevant bits with jq is more practical than grep/awk & friends for programmatic access. As long as you can stat the file, you can also issue a fileinfo command. It’s been like this forever and I don’t think changing this behavior is something that existing users would easily accept.
Ah, thanks again to info from Elvin, I found this slightly nicer command to get the underlying FST path: eos -j fileinfo /eos/wlcg-dev.uvic.ca/data/atlas/atlasscratchdisk/rptaylor/testdownloads/DAOD_EXOT27.23317046._000092.pool.root.1 | jq '.locations[0].fstpath'
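For what it's worth, the jq one-liner above could be wrapped in a small shell function to approximate the proposed link-style copy. This is only a sketch under several assumptions: eos_fst_path and eos_link are hypothetical helper names, not EOS commands; jq is installed; the JSON output has the .locations[0].fstpath field used above; and the first replica is readable at that same path on the local host.

```shell
# Hypothetical helper: print the FST path of the first replica of EOS file $1.
# jq -r strips the JSON quotes; -e makes jq exit non-zero if the field is null.
eos_fst_path() {
    eos -j fileinfo "$1" | jq -e -r '.locations[0].fstpath'
}

# Hypothetical helper: create a symlink in directory $2 that points at the
# underlying FST data file of EOS file $1, keeping the EOS file name.
eos_link() {
    src=$1
    destdir=$2
    fstpath=$(eos_fst_path "$src") || return 1
    # Refuse to create a link that would already be dangling on this host.
    if [ ! -r "$fstpath" ]; then
        echo "FST file not readable here: $fstpath" >&2
        return 1
    fi
    ln -s "$fstpath" "$destdir/$(basename "$src")"
}
```

With these functions sourced, eos_link /eos/wlcg-dev.uvic.ca/data/atlas/test.root ~/MyData would leave a test.root symlink in ~/MyData, provided the FST storage is mounted at the same path on the client.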
However I still suggest that an eos copy command to do more or less the equivalent of