Streaming output of MGM query results

Hello all,

As many are aware, we’ve been working on sharding our storage into Kubernetes-orchestrated EOS clusters, and we’ve been using Minio with a custom EOS gateway interface to talk natively to EOS.

For those who want a closer look, our gateway code is available at https://github.com/AARNet/minio/tree/shard/cmd/gateway/eos

By modifying the various xroot client timeouts (thanks @esindril, @luca.mascetti, @gbitzes, @apeters), our infrastructure now delivers S3 bucket listings at about 250k objects per minute, which is about 14 times faster than Amazon S3 itself.

However, S3 clients default to a 2-minute idle timeout: if the Minio server doesn’t deliver any data to the client for 2 minutes, the client assumes failure and disconnects. We’ve worked around this by encouraging affected users to increase their client timeout, but not every S3 tool supports this.

I’ve been told newfind is faster, but I can’t see this exposed by the HTTP or xrootd interfaces.

What I’m wondering is whether there is a way to have the MGM start streaming results instead of spooling the find result into /tmp/eos.mgm? From what I can tell, nothing in the data format prevents it from being streamed, and streaming would go a long way to improving the perception of performance for S3 clients.

Hi David,
you can do what ‘newfind’ does on the client side: it walks down the hierarchy from the client, so you don’t have to wait long. You can look at the console/command/com_find code …

Otherwise, I will probably merge the new GRPC interface tomorrow; it streams results as they come.
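A minimal sketch of that client-side idea, not the actual com_find code: list one directory at a time and hand each entry to the caller as soon as it is known, so the first results arrive after a single listing instead of after the whole tree has been scanned. The `list_dir` callback is hypothetical (a real gateway would query the MGM there), the breadth-first order is just one possible choice, and the local filesystem stands in for the remote namespace so the example can run anywhere.

```python
import os
import tempfile
from collections import deque
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical listing callback: given a directory path, return
# (child_path, is_dir) pairs. A real gateway would ask the MGM here.
ListDir = Callable[[str], Iterable[Tuple[str, bool]]]


def walk_streaming(root: str, list_dir: ListDir) -> Iterator[str]:
    """Yield paths breadth-first, one directory listing at a time."""
    pending = deque([root])
    while pending:
        current = pending.popleft()
        for child, is_dir in list_dir(current):
            yield child  # the caller can forward this entry immediately
            if is_dir:
                pending.append(child)  # descend into it later


def local_list_dir(path: str) -> Iterable[Tuple[str, bool]]:
    """Stand-in for a remote listing, backed by the local filesystem."""
    with os.scandir(path) as entries:
        return sorted((e.path, e.is_dir(follow_symlinks=False)) for e in entries)


def demo() -> List[str]:
    """Build a tiny tree and return the order in which paths are yielded."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        open(os.path.join(tmp, "a", "f1"), "w").close()
        open(os.path.join(tmp, "top"), "w").close()
        return [os.path.relpath(p, tmp) for p in walk_streaming(tmp, local_list_dir)]
```

Because `walk_streaming` is a generator, a gateway could start writing the S3 listing response while deeper directories are still being listed, which keeps the client connection from sitting idle.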

Thank you @apeters, unfortunately newfind doesn’t support the -I switch, which provides the information I’m after.

The GRPC interface sounds like a great approach. I’ll definitely give it a good test when it’s in a release version!

Just to mention,
EOS 4.5.6 now has full support for namespace operations via GRPC.

The NS GRPC functions allow ‘on-behalf’ operation (sudo like), if the GRPC account is a SUDOer.

The streaming ‘find’ operation has a complete set of filters, which allow you to select on more or less all available metadata fields.

There is a proof-of-concept C++ CLI for ns, find and stat operations. Check out eos/client/

and the CLI:

eos-grpc-find [--key <ssl-key-file> --cert <ssl-cert-file> --ca <ca-cert-file>] [--endpoint <host:port>] [--token <auth-token>] [--depth <depth>] [--select <filter-string>] [-f | -d] <path>
 <filter-string> is set up as "key1:val1,key2:val2,key3:val3 ..." where keyN:valN is one of
            owner-root:1|0
            group-root:1|0
            owner:<uid>
            group:<gid>
            regex-filename:<regex>
            regex-dirname:<regex>
            zero-size:1|0
            min-size:<min>
            max-size:<max>
            min-children:<min>
            max-children:<max>
            zero-children:1|0
            min-locations:<min>
            max-locations:<max>
            zero-locations:1|0
            min-unlinked_locations:<min>
            max-unlinked_locations:<max>
            zero-unlinked_locations:1|0
            min-treesize:<min>
            max-treesize:<max>
            zero-treesize:1|0
            min-ctime:<unixtst>
            max-ctime:<unixtst>
            zero-ctime:1|0
            min-mtime:<unixtst>
            max-mtime:<unixtst>
            zero-mtime:1|0
            min-stime:<unixtst>
            max-stime:<unixtst>
            zero-stime:1|0
            layoutid:<layoutid>
            flags:<flags>
            symlink:1|0
            checksum-type:<cksname>
            checksum-value:<cksvalue>
            xattr:<key>=<val>
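
As an illustration of the "key1:val1,key2:val2" format described above, here is a small sketch that assembles a filter string from a mapping. The helper name `build_select` is made up for this example, and the usage text says nothing about escaping, so the helper assumes there is none and simply refuses values containing a comma rather than guessing.

```python
from typing import Mapping, Union


def build_select(filters: Mapping[str, Union[str, int, bool]]) -> str:
    """Join key/value pairs into "key1:val1,key2:val2" form.

    Booleans are written as 1 or 0, matching the "1|0" entries in the list.
    """
    parts = []
    for key, value in filters.items():
        if isinstance(value, bool):
            value = int(value)
        text = str(value)
        # Assumption: the format has no escaping, so a comma inside a value
        # would be read as a separator. Reject it instead of guessing.
        if "," in text:
            raise ValueError(f"value for {key!r} contains a comma: {text!r}")
        parts.append(f"{key}:{text}")
    return ",".join(parts)
```

For example, `build_select({"owner": 1000, "zero-size": False})` gives `owner:1000,zero-size:0`, which would be passed as the argument to `--select`.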

    eos-grpc-ns  -h 
    usage: eos-grpc-ns [--key <ssl-key-file> --cert <ssl-cert-file> --ca <ca-cert-file>] [--endpoint <host:port>] [--token <auth-token>] [--uid] [--gid] [--norecycle] [-r] [--target <target>] -p <path> <command>

    <command> is one of: mkdir,rmdir,touch,unlink,rm,rename, symlink,setxattr,chown,chmod

eos-grpc-md .... <path>
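
For a gateway that wants to forward results while the search is still running, one option is to wrap the CLI and read its output as it is produced. The sketch below only uses options that appear in the usage line above (`--endpoint`, `--depth`, `--select`); it assumes, without having checked, that eos-grpc-find writes one result per line to stdout. The function names and the endpoint value in the usage note are placeholders.

```python
import subprocess
import sys
from typing import Iterator, List, Optional


def find_argv(path: str, endpoint: Optional[str] = None,
              select: Optional[str] = None,
              depth: Optional[int] = None) -> List[str]:
    """Build an eos-grpc-find command line from the documented options."""
    argv = ["eos-grpc-find"]
    if endpoint:
        argv += ["--endpoint", endpoint]
    if depth is not None:
        argv += ["--depth", str(depth)]
    if select:
        argv += ["--select", select]
    argv.append(path)
    return argv


def stream_lines(argv: List[str]) -> Iterator[str]:
    """Yield stdout lines of a command as they arrive, without buffering all output."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the "with" block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
```

A caller would iterate over `stream_lines(find_argv("/eos/some/path", endpoint="mgm.example.org:50051", select="zero-size:0"))` and write each entry to the S3 response as it comes, so the client sees data well before the 2-minute idle timeout.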

That’s brilliant @apeters!