For ATLAS DDM, storage dumps (a list of the files in each RSE/spacetoken) must be written periodically, as described in the DDM documentation (link behind CERN sign-in).
My first thought is that something basically like eos find -f $datadir | sed "s|^$datadir||"
run from a cron job should work. However, I am curious whether anyone has a robust, field-tested script, and whether this approach is efficient for ~millions of files.
I also wonder whether there is a way (perhaps eos cat could be extended?) to pipe the stdout of such a command into the stdin of an eos command that writes a file, without FUSE. That way the dump could be streamed directly into EOS storage.
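To make the idea concrete, here is a minimal sketch of the cron approach. The $datadir value and output paths are illustrative, and the streaming variant assumes xrdcp's documented support for "-" as the source (reading from stdin), so no FUSE mount is needed:

```shell
#!/bin/bash
# Hypothetical dump script -- $datadir and the destination are examples only.
datadir=/eos/atlas/atlasdatadisk

# The prefix-stripping itself, demonstrated on a sample path (no EOS needed):
# paths become relative to the RSE root, as the DDM dump format expects.
printf '%s/rucio/data18/ab/cd/file.root\n' "$datadir" \
  | sed "s|^$datadir||"

# The real pipeline would list every file under $datadir the same way:
#   eos find -f "$datadir" | sed "s|^$datadir||" > /tmp/dump_$(date +%F).txt
#
# Streaming straight back into EOS without FUSE, via xrdcp reading stdin:
#   eos find -f "$datadir" | sed "s|^$datadir||" \
#     | xrdcp - "root://eoshost/$datadir/dumps/$(date +%F)"
```

Whether eos find scales to millions of entries through the MGM is exactly the open question here.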
If there is a recommended approach based on production experience (e.g. what is used at the ATLAS T0?), I could fill in the currently empty section of the EOS documentation (link behind CERN sign-in).
Hi Ryan, you should use:
eos-ns-inspect
It connects directly to QuarkDB and scans the whole namespace, or a subpath, without involving the MGM at all.
eos-ns-inspect -h
Tool to inspect contents of the QuarkDB-based EOS namespace.
Usage: eos-ns-inspect [OPTIONS] SUBCOMMAND
Options:
-h,--help Print this help message and exit
Subcommands:
dump [DEPRECATED] Recursively dump entire namespace contents under a specific path
scan Recursively scan and print entire namespace contents under a specific path
print Print everything known about a given file, or container
stripediff Find files which have non-nominal number of stripes (replicas)
one-replica-layout Find all files whose layout asks for a single replica
scan-dirs Dump the full list of container metadata across the entire namespace
scan-files Dump the full list of file metadata across the entire namespace
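For reference, an invocation would look roughly like the following. The QuarkDB endpoint, password file location, and exact option names are assumptions from memory here, so check eos-ns-inspect scan -h on your instance before relying on them:

```shell
# Hypothetical example: scan a subtree directly from QuarkDB, bypassing the MGM.
# --members is the QuarkDB cluster endpoint; --password-file holds its password.
# Both values below are placeholders for your own deployment.
eos-ns-inspect scan \
  --members localhost:7777 \
  --password-file /etc/eos.keytab \
  --path /eos/atlas/atlasdatadisk
```

Since this reads the namespace metadata straight out of QuarkDB, it avoids loading the MGM, which is why it is preferred over eos find for full-RSE dumps.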