As many are aware, we’ve been working on sharding our storage into Kubernetes-orchestrated EOS clusters, using Minio with a custom EOS gateway interface to talk natively to EOS.
For anyone who wants to take a closer look, our gateway code is available at https://github.com/AARNet/minio/tree/shard/cmd/gateway/eos
By modifying the various xroot client timeouts (thanks @esindril, @luca.mascetti, @gbitzes, @apeters), our infrastructure now delivers S3 bucket listings at about 250k objects per minute, which is about 14 times faster than Amazon S3 itself.
However, S3 clients default to a 2-minute idle timeout: if the Minio server doesn’t deliver any data to the client for 2 minutes, the client assumes failure and disconnects. We’ve worked around this by encouraging the affected users to increase their client timeout, but not every S3 tool supports this.
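To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures above. It assumes (and this is my assumption, not something measured) that no bytes reach the client until the whole listing has been produced, so the idle timer runs for the full duration of the find. The function names are made up for illustration.

```python
# Figures quoted in this post.
LISTING_RATE_PER_MIN = 250_000   # objects listed per minute
CLIENT_IDLE_TIMEOUT_MIN = 2      # default S3 client idle timeout, in minutes
SPEEDUP_VS_S3 = 14               # "about 14 times faster than Amazon S3"


def max_objects_before_timeout(rate_per_min, timeout_min):
    """Largest listing that finishes before the idle timeout fires.

    Assumption: nothing is sent to the client until the listing is complete.
    """
    return rate_per_min * timeout_min


def implied_s3_rate(rate_per_min, speedup):
    """Listing rate for Amazon S3 implied by the quoted speedup."""
    return rate_per_min / speedup


if __name__ == "__main__":
    limit = max_objects_before_timeout(LISTING_RATE_PER_MIN, CLIENT_IDLE_TIMEOUT_MIN)
    s3_rate = implied_s3_rate(LISTING_RATE_PER_MIN, SPEEDUP_VS_S3)
    print(f"Listings above ~{limit:,} objects risk a client disconnect")
    print(f"Implied Amazon S3 rate: ~{s3_rate:,.0f} objects per minute")
```

Under that assumption, any listing that takes longer than the idle timeout to produce is at risk, no matter how fast the backend is overall.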
I’ve been told newfind is faster, but I can’t see this exposed by the HTTP or xrootd interfaces.
What I’m wondering is: is there a way to have the MGM start streaming results instead of spooling the find result into /tmp/eos.mgm? From what I can tell, nothing in the data format prevents it from being streamed, and streaming would go a long way towards improving the perceived performance for S3 clients.
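To illustrate what I mean by spooling versus streaming, here is a toy sketch. It is not how the MGM is implemented. `slow_walk` is a hypothetical stand-in for the namespace traversal, and the `/eos/demo/...` paths are invented. The point is only that the time until the client sees the first result differs, even though the total work is the same.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple


def slow_walk(count: int, delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a namespace walk: one path per `delay` seconds."""
    for i in range(count):
        time.sleep(delay)
        yield f"/eos/demo/file{i}"


def spooled_find(entries: Iterable[str]) -> List[str]:
    """Spool-then-send: collect every entry before returning anything."""
    return list(entries)


def streamed_find(entries: Iterable[str]) -> Iterator[str]:
    """Streaming: hand each entry on as soon as it is available."""
    for entry in entries:
        yield entry


def time_to_first(make_results: Callable[[], Iterable[str]]) -> Tuple[str, float]:
    """Return the first result and how long the caller waited for it."""
    start = time.monotonic()
    first = next(iter(make_results()))
    return first, time.monotonic() - start


if __name__ == "__main__":
    first_a, spooled_wait = time_to_first(lambda: spooled_find(slow_walk(50, 0.01)))
    first_b, streamed_wait = time_to_first(lambda: streamed_find(slow_walk(50, 0.01)))
    print(f"spooled:  first result after {spooled_wait:.3f}s")
    print(f"streamed: first result after {streamed_wait:.3f}s")
```

With streaming, the client's idle timer would be reset by the first entry rather than only once the whole find has finished, which is the behaviour I'm hoping for.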