"eos fusex ls" show locked:opendir and eosxd on the client consumes cpu

Dear EOS users and developers,

EOS client & server 4.8.102

Sometimes, some of our EOS fusex clients consume a lot of CPU and RAM.

On the MGM, the command eos -b fusex ls shows locked:opendir, like this:

client : eosxd                     lyoui2.in2p3.fr 4.8.102  online   Tue, 02 May 2023 14:41:03 GMT 2.05 0.25 5ceafa66-e8f7-11ed-a839-008cfafbfc72 p=43736 caps=0 fds=0 static [locked:opendir] >5m mount=/eos

The eos -b fusex ls -l command shows the following for this client:

client : eosxd                     lyoui2.in2p3.fr 4.8.102  online   Tue, 02 May 2023 14:41:03 GMT 6.03 0.31 5ceafa66-e8f7-11ed-a839-008cfafbfc72 p=43736 caps=0 fds=0 static [locked:opendir] >5m mount=/eos 
......   ino          : 104242
......   ino-to-del   : 0
......   ino-backlog  : 0
......   ino-ever     : 154141
......   ino-ever-del : 1
......   threads      : 62
......   total-ram    : 148.276 GB
......   free-ram     : 7.295 GB
......   vsize        : 135.550 GB
......   rsize        : 67.719 GB
......   wr-buf-mb    : 0 MB
......   ra-buf-mb     :0 MB
......   load1        : 26.00
......   leasetime    : 300 s
......   open-files   : 0
......   logfile-size : 4074278251
......   rbytes       : 0
......   wbytes       : 1048576158
......   n-op         : 302798
......   rd60         : 0.00 MB/s
......   wr60         : 0.00 MB/s
......   iops60       : 0.00 
......   xoff         : 0
......   ra-xoff      : 0
......   ra-nobuf     : 0
......   wr-nobuf     : 0
......   idle         : 1126
......   blockedms    : 74596504.00 [opendir]
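
For scale, the blockedms value above can be converted to hours. This is a minimal Python sketch, assuming the "key : value" layout of the detail lines shown above; the helper names parse_fusex_detail and blocked_hours are hypothetical and not part of EOS:

```python
import re

def parse_fusex_detail(text):
    """Parse '......   key : value' lines into a dict of raw strings."""
    stats = {}
    for line in text.splitlines():
        # Drop the leading dots and surrounding whitespace.
        body = line.lstrip(". \t").rstrip()
        if ":" not in body:
            continue
        key, value = body.split(":", 1)
        stats[key.strip()] = value.strip()
    return stats

def blocked_hours(stats):
    """Return (hours, operation) from the 'blockedms' entry, or None."""
    raw = stats.get("blockedms")
    if raw is None:
        return None
    match = re.match(r"([0-9.]+)\s*(?:\[(\w+)\])?", raw)
    if match is None:
        return None
    # The value is in milliseconds; 3,600,000 ms make one hour.
    return float(match.group(1)) / 3_600_000.0, match.group(2)

# Three lines copied from the output above.
sample = """\
......   leasetime    : 300 s
......   ra-buf-mb     :0 MB
......   blockedms    : 74596504.00 [opendir]
"""
stats = parse_fusex_detail(sample)
hours, op = blocked_hours(stats)
print(f"{op} blocked for about {hours:.1f} hours")
```

With the values above this prints "opendir blocked for about 20.7 hours", so the operation has been pending far longer than the ">5m" shown in the summary line.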

How can we unlock this client without having to unmount and remount /eos on the client side?
How can we avoid or recover from this kind of situation?
Thanks
Denis

Normally a blocked operation does not lock up a client. It just indicates that a single operation has not finished within a timeout. In the case of opendir, this might be the listing of an extremely large directory, which would also fit the memory increase.

In any case, you should find something in the eosxd logfile /var/log/eos/fusex/… about this operation.

It should also print the inode. You can then check on the MGM what is the matter with that directory: whether it has some weird content, or is just large, etc.

Cheers Andreas.

Thanks Andreas for your suggestion,

Looking for lock and opendir keywords in /var/log/eos/fusex logs on the client side, I saw something like this without being able to track the problem:

230502 18:15:04 t=1683044104.797905 f=Monitor          l=DEBUG tid=00007f66bdbff700 s=Track:194                trylock caller=opendir self=44238 in=8863 exclusive=0
230502 18:15:04 t=1683044104.797912 f=Monitor          l=DEBUG tid=00007f66bdbff700 s=Track:203                locked  caller=opendir self=44238 in=8863 exclusive=0 obj=7f66aa638390
230502 18:15:04 t=1683044104.855868 f=~Monitor         l=DEBUG tid=00007f66bdbff700 s=Track:216                unlock  caller=opendir self=44238 in=8863 exclusive=0
230502 18:15:04 t=1683044104.855879 f=~Monitor         l=DEBUG tid=00007f66bdbff700 s=Track:228                unlocked  caller=opendir self=44238 in=8863 exclusive=0
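
Rather than checking by eye, the pairing of these Track lines can be verified with a script. The following is a minimal Python sketch, assuming the line layout of the excerpt above and assuming that in= identifies the inode; the function name scan_track_events is hypothetical. It reports any trylock that was never followed by locked (a thread still waiting), any locked never followed by unlocked (a lock still held), and the longest hold time seen:

```python
import re

# Pattern derived from the log excerpt above (an assumption, not an EOS spec).
TRACK_RE = re.compile(
    r"t=(?P<t>\d+\.\d+).*?tid=(?P<tid>[0-9a-f]+).*?"
    r"\b(?P<event>trylock|locked|unlock|unlocked)\s+"
    r"caller=(?P<caller>\w+)\s+self=\d+\s+in=(?P<ino>\w+)"
)

def scan_track_events(lines, caller="opendir"):
    """Pair lock events per (thread id, in=) for one caller."""
    waiting = {}  # key -> time of a trylock not yet followed by locked
    held = {}     # key -> time of a locked not yet followed by unlocked
    longest = (0.0, None)  # (seconds held, key) for the longest completed hold
    for line in lines:
        m = TRACK_RE.search(line)
        if m is None or m.group("caller") != caller:
            continue
        key = (m.group("tid"), m.group("ino"))
        t = float(m.group("t"))
        event = m.group("event")
        if event == "trylock":
            waiting[key] = t
        elif event == "locked":
            waiting.pop(key, None)
            held[key] = t
        elif event == "unlocked":
            start = held.pop(key, None)
            if start is not None and t - start > longest[0]:
                longest = (t - start, key)
        # "unlock" lines only announce the release and are ignored here.
    return waiting, held, longest

# The four lines from the excerpt above.
sample = [
    "230502 18:15:04 t=1683044104.797905 f=Monitor          l=DEBUG tid=00007f66bdbff700 s=Track:194                trylock caller=opendir self=44238 in=8863 exclusive=0",
    "230502 18:15:04 t=1683044104.797912 f=Monitor          l=DEBUG tid=00007f66bdbff700 s=Track:203                locked  caller=opendir self=44238 in=8863 exclusive=0 obj=7f66aa638390",
    "230502 18:15:04 t=1683044104.855868 f=~Monitor         l=DEBUG tid=00007f66bdbff700 s=Track:216                unlock  caller=opendir self=44238 in=8863 exclusive=0",
    "230502 18:15:04 t=1683044104.855879 f=~Monitor         l=DEBUG tid=00007f66bdbff700 s=Track:228                unlocked  caller=opendir self=44238 in=8863 exclusive=0",
]
waiting, held, longest = scan_track_events(sample)
print("still waiting:", waiting)
print("still held:", held)
print("longest hold (s):", longest)
```

On the four lines above, both dictionaries come back empty, which matches what can be seen by eye. Run over a whole logfile (for example by passing an open file object as lines), any entry left in waiting or held would point at the thread and in= value to look at more closely.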

Every call seems to have its matching unlocked entry, so maybe I’m searching for the wrong expression.
Cheers,
Denis