
Find command and extracting other keys

Hello,

We’re looking at migrating several hundred TB between two clusters, and there’s a fair bit of metadata and extended attributes we’d like to preserve.

The find command via HTTP is incredibly fast, allowing us to pull records on about 150k files per second. I can extract the mtime and ctime easily enough, but we’d like to be able to pull some xattr objects too.

I’m using the command:

curl -H 'Remote-User:user' -sL 'http://mgmhost:8000/proc/user/?mgm.cmd=find&mgm.option=dfMC&mgm.path=/eos/path/path/path/'
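For anyone scripting this rather than calling curl by hand, the same request could be sketched in Python roughly as follows. The host, port, query parameters and `Remote-User` header are taken from the curl command above; the function names `build_find_url` and `iter_find_lines` are hypothetical helpers, and the assumption that the response is line-oriented text is mine, not something confirmed here.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_find_url(mgm_host, path, options, port=8000):
    """Build the /proc/user/ find URL used in the curl command above."""
    query = urlencode(
        {"mgm.cmd": "find", "mgm.option": options, "mgm.path": path},
        safe="/",  # leave path separators unescaped, as in the curl example
    )
    return f"http://{mgm_host}:{port}/proc/user/?{query}"


def iter_find_lines(url, remote_user):
    """Yield response lines one at a time instead of buffering the whole body.

    Assumption: the MGM answers with newline-separated text records.
    """
    request = Request(url, headers={"Remote-User": remote_user})
    # urllib follows HTTP redirects by default, similar to curl -L.
    with urlopen(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8", errors="replace").rstrip("\n")
```

Calling `iter_find_lines(build_find_url("mgmhost", "/eos/path/path/path/", "dfMC"), "user")` would then stream the records, which matters at 150k files per second because the full listing never has to sit in memory at once.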

We have minio_contenttype and minio_etag as xattrs on all objects, and I’d love it if find could return these in its output. Is this possible? I can’t work out from the source whether it can.

David, you can use fileinfo to get the extra attributes of files in a directory.

The catch is that fileinfo only reports on a directory’s direct children, so you will need to call it once per directory and sub-directory.

Michael,

Thanks, we’re aware of this! fileinfo is the method we currently use to get xattrs from files/dirs. The limitations you mentioned are the reasons why we’re using find instead of fileinfo to pull the records.

David is asking if it’s possible to use the find command to extract xattrs in addition to mtime/ctime etc. There doesn’t seem to be a flag that allows this, but if it’s at all possible, this feature would be extremely useful! :slight_smile:

David, you want to run this, which does what you want; the output just needs some parsing:

curl -H 'Remote-User:user' -sL 'http://mgmhost:8000/proc/user/?mgm.cmd=find&mgm.option=I&mgm.path=/eos/path/path/path/'
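The parsing step could look something like the sketch below. It assumes each record arrives as one line of space-separated `key=value` tokens, with every extended attribute reported as an `xattrn=<name>` token followed by an `xattrv=<value>` token. That layout is an assumption on my part and should be checked against real output from your MGM; `parse_find_record` is a hypothetical helper name.

```python
def parse_find_record(line):
    """Split one record into (plain fields, extended attributes).

    Assumed layout (not confirmed in this thread): whitespace-separated
    key=value tokens, where xattrn=<name> is followed by xattrv=<value>.
    """
    fields = {}
    xattrs = {}
    pending_name = None
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # ignore tokens that are not key=value pairs
        if key == "xattrn":
            pending_name = value
        elif key == "xattrv":
            if pending_name is not None:
                # Drop surrounding double quotes if the server adds them.
                xattrs[pending_name] = value.strip('"')
                pending_name = None
        else:
            fields[key] = value
    return fields, xattrs
```

Splitting on whitespace is the weak point: a path or attribute value containing spaces would be cut apart, so a production parser would need whatever length or quoting information the real output provides.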

Thanks @apeters, that solves part of the problem for me.

I didn’t realise I (capital eye - I) was an option. I’ll explore that and open a pull request to update the CLI help if I’m not beaten to it.

I see the xattrn fields for sys.attr.link, and sys.mtime.propagation which are set on directories, but I’d like to be able to see xattrs on files if possible.

As an example, I have:
EOS Console [root://localhost] |/> attr ls /eos/path…/nfcapd.201811262355
minio_contenttype="application/octet-stream"
minio_etag="1c3a64a5d1e6bed6b410a792141bb4f7"

Is it possible to expose these two attrs on a file via find? I tried the -p and -x options without success.

I just pushed a small fix to print them for files as well.

To use an Australian informal colloquialism, @apeters, you little ripper!

I can confirm that, having upgraded a slave mgm to the d5ccba1 commit, the ‘I’ option on an HTTP fetch is exposing all the attribute metadata. Thank you, you’ve saved me a further few tens of millions of HTTP queries!