We have a serious issue: newly created folders can’t be seen as folders from any fusex client. They appear as zero-size files, always with the same date and uid/gid pair.
We might have hit a threshold in the number of folders created… Current container id is 269146625.
When running eos file info on these new folders, we see that their id looks like fxid:10081bf3, i.e. above hex 10000000. Could this be linked? Is this a known bug? If so, would a version upgrade solve it? Presumably an upgrade of the MGM (4.4.39), since even newer clients (4.5.9) have the issue.
Unfortunately, you hit a problem that is independent of the type of fuse client, namely the maximum number of container ids that the fuse daemon can see at the moment. We encode it in 28 bits, which matches your container id perfectly.
28 bits can hold at most 268435456 container ids.
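The arithmetic can be checked with a few lines of Python. This is only a sketch of the limit itself, using the two ids quoted in this thread; the helper name `fits_in_field` is made up for the example and is not part of EOS.

```python
# Width of the container-id field mentioned above.
CONTAINER_ID_BITS = 28
LIMIT = 1 << CONTAINER_ID_BITS  # number of distinct values a 28-bit field can hold


def fits_in_field(container_id: int, bits: int = CONTAINER_ID_BITS) -> bool:
    """Return True if the id can be represented in `bits` bits (hypothetical helper)."""
    return 0 <= container_id < (1 << bits)


current_id = 269146625               # current container id reported in the first post
new_folder_id = int("10081bf3", 16)  # fxid:10081bf3, converted from hex to decimal

print("limit:", LIMIT)
print("new folder id (decimal):", new_folder_id)
print("current id fits in 28 bits:", fits_in_field(current_id))
print("new folder id fits in 28 bits:", fits_in_field(new_folder_id))
print("ids past the limit:", current_id - LIMIT)
```

Both ids are slightly above the 28-bit limit, which is consistent with only the most recently created folders being affected.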
Fortunately, Georgios has done some preparation work for this, but it’s not in any release yet. So we’ll prepare a hot fix release, and you can enable the new functionality with an env variable. Can you please let me know which version of eos you run in your cluster?
Thank you for your quick answer.
We are running MGM 4.4.39. Fusex clients are in the range of 4.4.23 to 4.5.6.
Which component would need an upgrade to enable this hot fix?
We can confirm, however, that old fuse/eosd clients can correctly access the folders. Or is there some hidden drawback that we haven’t detected yet?
At the moment no fuse client can properly deal with this, either old or new.
We will do a hot fix for this on the 4.5.y branch so you will need to upgrade everything to this new release (4.5.15).
You need to update the MGM, the FSTs and potentially the fuse clients.
The old clients might work actually but we never tested this thoroughly.
Georgios will push a commit to enable the switch and I will tag 4.5.15 asap.
You need to update the MGM, the FSTs and potentially the fuse clients.
Would the FSTs also need to be updated? At the same time as the MGM (i.e. a full shutdown of the instance), or could the FSTs be done later? Or before?
Potentially the fuse clients: under which conditions?
The old clients might work actually but we never tested this thoroughly.
In fact, it really seems that the old eosd clients behave correctly, folders and files can be created from there, and can be read back.
The plan for the inode encoding switchover has been:
1. The administrator flips the flag in the MGM and all FSTs, all at the same time, and restarts all server processes.
2. All clients detect the inode change, and crash on purpose. Users can just restart them.
3. Upon restart, the clients will now begin using the new inode encoding scheme.
Parts 2 & 3 have already been implemented; the inode scheme detection logic has been there for many past versions. There’s a good possibility the clients will continue working as-is; they’ll “only” all crash and restart once when the inode encoding scheme is switched.
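To make the client-side behaviour in the plan more concrete, here is a toy model of "detect the change, stop on purpose, pick up the new scheme after restart". It is not the EOS implementation: the marker bit, the class names and the exception are all assumptions invented for this illustration, and the real detection logic may look quite different.

```python
from enum import Enum
from typing import Optional


class Scheme(Enum):
    LEGACY = "legacy"
    NEW = "new"


# Assumption for illustration only: inodes in the new scheme carry a marker
# in the top bit of a 64-bit value. This is NOT taken from the EOS source.
NEW_SCHEME_MARKER = 1 << 63


class InodeSchemeChanged(RuntimeError):
    """Raised when the server starts handing out inodes in another scheme."""


def classify(inode: int) -> Scheme:
    """Guess the encoding scheme of a single inode (toy rule)."""
    return Scheme.NEW if inode & NEW_SCHEME_MARKER else Scheme.LEGACY


class ToyClient:
    """Hypothetical client that locks onto the first scheme it observes."""

    def __init__(self) -> None:
        self.scheme: Optional[Scheme] = None

    def observe(self, inode: int) -> None:
        seen = classify(inode)
        if self.scheme is None:
            # First inode after start-up decides which scheme this process uses.
            self.scheme = seen
        elif seen is not self.scheme:
            # Cached inodes would be inconsistent, so stop rather than mix schemes.
            raise InodeSchemeChanged(f"{self.scheme.value} -> {seen.value}")


client = ToyClient()
client.observe(12345)                          # legacy-style inode
try:
    client.observe(NEW_SCHEME_MARKER | 12345)  # server was switched over
except InodeSchemeChanged as err:
    print("client stops on purpose:", err)

restarted = ToyClient()                        # the user restarts the client
restarted.observe(NEW_SCHEME_MARKER | 12345)
print("after restart:", restarted.scheme.value)
```

The point of stopping instead of adapting in place is that a running client may hold cached inodes in the old scheme; a fresh process starts with no such state.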
I’m implementing part 1 for you to try out immediately, hang tight.
As a quick workaround to make EOS usable again quickly, would it help if we emptied the recycle bin? That is, would this reduce the number of container ids in use, so that we have more time to inform users before we apply the patches?
This is in the client part, but only if the version is new enough, right? Can you find out the minimum client version in which this is already available?
Unfortunately not - the changeover was not supposed to happen until some time in the future (months), and I wanted to test the implementation a lot more. I believe it’ll work, but I can’t guarantee it.
If we had known you were close to the inode limit, we’d have rushed it… Having this happen so suddenly is quite unfortunate.
Yes, the env variable needs to be put in /etc/sysconfig/eos_env or /etc/sysconfig/eos, depending on which one you use. I assume it’s the first one, since it’s used by systemd, and there you don’t need the export.
Yes, you also need to upgrade xrootd and eos-xrootd if you have it installed on your machine.
There is nothing to be done on the QuarkDB side.
There is no problem with the file ids. In any case, this change will considerably increase the max id values for both.