Monitoring balancing, draining, and LRU activity

What is the best way to monitor balancing, draining, and LRU activity? I am draining a number of FSTs and, as a result, also seeing quite a bit of balancing, but the logs are empty. LRU activity is usually logged when I have the LRU enabled, but there is not much information in those logs.

It would be useful to have a view into what is happening during these activities. I am currently running into issues where I believe large, single-replica files are causing draining to be extremely slow: the transfers time out before the files are completely copied off, and another copy to a different FST is then started. After a while, these build up and cause the drain to perform very slowly.

It is hard to confirm this, however, because I don’t know of any way to get, for example, a list of the files being moved by active threads.

Any advice anyone might have on getting a clearer window into these processes would be much appreciated. I’m looking for something simpler than going in with lsof and trying to correlate all that output to something meaningful.


Dan Szkola
FNAL