Directory listing for large entries folder

Hello,

Are there any recommendations on how to handle folders with a very large number of entries?

It appears that after the QuarkDB migration, listing a folder is much slower than before, even when all the files are in the entry cache. Is there room for improvement here, or should we just try to limit the number of files per folder (i.e. tell our users to avoid creating too many of them)? We found some folders with 2M+ entries… :roll_eyes:

In addition, on fusex mounts, listing a folder with many files (the limit seems to be 32768) fails with an “Argument list too long” error. Is there a way to increase this limit? It does not exist in the old fuse client, which never gave us any problem with large folders (apart from an understandably long wait when listing).
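(For other readers hitting the same error: “Argument list too long” is errno E2BIG, so a client-side script can at least detect the condition cleanly instead of crashing on a raw OSError. A minimal sketch, assuming a fusex mount that enforces the listing limit; the helper name and the suggested fallback are illustrative, not part of eosxd:)

```python
import errno
import os

def safe_listdir(path):
    """List a directory, handling the eosxd children limit explicitly.

    On a fusex mount, readdir on a directory with more entries than the
    server-side limit (reported as ~32768 in this thread) fails with
    E2BIG ("Argument list too long"); this turns that into a clearer
    error message instead of an unexplained OSError.
    """
    try:
        return sorted(os.listdir(path))
    except OSError as exc:
        if exc.errno == errno.E2BIG:
            raise RuntimeError(
                f"{path}: more entries than the eosxd listing limit; "
                "try enumerating it with 'eos ls' on the MGM instead"
            ) from exc
        raise
```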

Hi Franck,

Do you know roughly how large the difference is when listing with the in-memory vs the QDB namespace? How much slower is it?

In general, directories with so many files should be avoided. I’ll try to optimize this use case further, but for the moment I’m afraid there’s no easy fix. Which eos version are you running on the MGM, by the way?

Cheers,
Georgios

If you are talking about comparing QuarkDB and InMemory… do you mean “eos ls” or via FUSE?

We have put a limit of 32k children on listings via eosxd. In principle it still works well up to 128k; however, directories that large create problems when people access them from batch jobs. That said, you can put an unlimited number of children in a directory, you just cannot use opendir/readdir/releasedir on it.

I can easily make the limit (which is enforced server side) configurable, if you wish. However, I think 2M+ is a nightmare, and you won’t like the performance penalty you get for directories that large.
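(For sites that cannot stop users from producing millions of files, a common workaround is to shard them into hashed subdirectories so that no single directory ever approaches the limit. A minimal sketch, not EOS-specific; the function name, bucket depth, and width are illustrative choices:)

```python
import hashlib
import os

def sharded_path(base, filename, levels=2, width=2):
    """Map a filename into hashed bucket subdirectories, e.g.
    base/ab/cd/filename, so no single directory grows unbounded.

    With levels=2 and width=2 there are 16**4 = 65536 leaf buckets,
    so even 2M files average only ~30 entries per leaf directory.
    The mapping is deterministic: the same filename always lands in
    the same bucket, so lookups need no index.
    """
    digest = hashlib.md5(filename.encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base, *parts, filename)
```

A producer script would call `sharded_path("/eos/myproject/data", name)` (path hypothetical) before each write instead of dumping everything into one folder.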

Cheers Andreas.

Thank you for your replies.

I was talking about plain eos ls, not fuse, which adds overhead.

I’d say the difference is about 5 to 10 times. Before, we expected around a 1-second delay when listing (ls without -l) 100K to 200K files; now it is more like 10 s.

E.g., for a folder with 137K files:

  • eos ls takes 29 s on the first run, before being cached
  • eos ls takes 7 s when cached by the MGM
  • eos ls -l takes 28 s when cached by the MGM
  • ls using the eosd client takes 46 s on the first run
  • ls using the eosd client takes 1.5 s when cached locally
  • ls -l using the eosd client takes 4 minutes

(I included the -l results just for information; we had already noticed it is always better to avoid it when working with many files.)
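(For anyone reproducing these numbers, the cold/warm pattern above is easy to script. A minimal sketch; the command you pass is a placeholder, and the second run only measures whatever caching the command itself benefits from:)

```python
import subprocess
import time

def time_listing(cmd):
    """Run a listing command twice and report cold vs warm wall-clock
    timings, mirroring the first-run / cached measurements above.

    Note: wall-clock timing includes process startup overhead, which
    is negligible for multi-second listings but not for fast ones.
    """
    timings = []
    for run in ("cold", "warm"):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append((run, time.perf_counter() - start))
    return timings

# e.g. time_listing(["eos", "ls", "/eos/myproject/bigdir"])  # path hypothetical
```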

The above-mentioned 2.5M-entry folder takes 3 minutes to list (eos ls) once cached; I would have expected around 30 seconds with InMemory.

We certainly don’t want to generalize the use of such large folders, and we have always discouraged users from creating them, but I wanted to report this difference.

About the eosxd limit: if it could be made configurable, we could work out the right value for us (32K as a default seems quite reasonable) so that a listing from one user doesn’t disturb the others too much.

The MGM is running version 4.4.23, the latest stable version. Do you think some newer testing version might already improve this?

Another observation that might be linked: the periodic fsck run took around 1 minute with InMemory, and now lasts at least 3 minutes for the same number of reported files.

Starting from 4.4.27, the MGM uses multiple connections towards QDB for metadata retrieval, and shards the cache to avoid lock contention. It fixed a bottleneck we were seeing in a different use case - it might help in your case too, could be worth a try.

If you’re wondering which version to pick: we’ve been running 4.4.38 in production for a long time, and it’s been stable.

I’ll resume my experiments with listing huge directories; maybe there are some further optimizations I can find to speed it up.

OK, thank you for this information, Giorgos, we will give it a try.

Thanks all for the clarifications. Unfortunately we have quite a few users who do not have much clue about reasonable data management and just dump files as if they were working on a local SSD… That’s why we end up with users creating directories with 2M files, or scripts creating file system structures with 20M directories/subdirectories.

We certainly need to try harder to “educate” users to follow some reasonable data-management guidelines; we’re not there yet… In addition, as Franck already mentioned, we’ll try newer client versions as Georgios suggested.

Dear all,

I confirm that, after upgrading to v4.4.39, read responses are much improved: only 12 s to list 137K files from QDB, 4 s from cache, 20 s with the -l option, and 20 s from eosd, so around 2 to 3 times faster than before.

With this speed it is now much more usable, and it becomes even more interesting to have the possibility to increase the 32K limit in the eosxd client!