As many are aware, we’ve been working on sharding our storage into Kubernetes-orchestrated EOS clusters, and we’ve been using Minio with a custom EOS gateway interface to talk natively to EOS.
After we modified the various xroot client timeouts (thanks @esindril, @luca.mascetti, @gbitzes, @apeters), our infrastructure delivers S3 bucket listings at about 250k objects per minute, roughly 14 times faster than Amazon S3 itself.
However, S3 clients default to a 2-minute idle timeout: if the Minio server doesn’t deliver data to the client for 2 minutes, the client assumes failure and disconnects. We’ve worked around this by encouraging the affected users to increase their client timeout, but not every S3 tool supports this.
I’ve been told newfind is faster, but I can’t see this exposed by the HTTP or xrootd interfaces.
What I’m wondering is whether there is a way to have the MGM start streaming results instead of spooling the find result into /tmp/eos.mgm. From what I can tell, the data format can be streamed, and streaming would go a long way toward improving the perceived performance for S3 clients.
Hi David,
You can do what ‘newfind’ does on the client side: it walks down the hierarchy from the client, so you don’t have to wait long. You can look at the console/command/com_find code …
Otherwise, I will probably merge the new GRPC interface tomorrow; it streams results as they come.
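
For illustration, the client-side walk mentioned above could look roughly like the sketch below. It is not the com_find code and it does not use any EOS API: `streaming_find`, `list_dir`, `TREE`, and `fake_list_dir` are hypothetical names, and the in-memory tree only stands in for per-directory listing calls. The point is that each directory is listed separately and its entries are yielded immediately, so a gateway could start sending data to the S3 client before the whole hierarchy has been visited.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical listing callback: given a directory path, return (name, is_dir)
# pairs for its immediate children. In a real gateway this would wrap a
# single-directory listing request; it is an assumption, not an EOS API.
ListDir = Callable[[str], Iterable[Tuple[str, bool]]]


def streaming_find(root: str, list_dir: ListDir) -> Iterator[str]:
    """Yield paths under root as soon as each directory has been listed."""
    pending = deque([root.rstrip("/") or "/"])
    while pending:
        directory = pending.popleft()
        for name, is_dir in list_dir(directory):
            path = f"{directory.rstrip('/')}/{name}"
            if is_dir:
                pending.append(path)  # list this directory later (breadth-first)
                yield path + "/"
            else:
                yield path


# Toy in-memory namespace standing in for the real storage.
TREE = {
    "/bucket": [("a.txt", False), ("sub", True)],
    "/bucket/sub": [("b.txt", False)],
}


def fake_list_dir(path: str):
    return TREE.get(path, [])


results = list(streaming_find("/bucket", fake_list_dir))
```

Because `streaming_find` is a generator, the caller receives the first entry after a single directory listing instead of waiting for the full result set, which is what keeps an idle-timeout clock from running out on large trees.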