What is the best way to monitor balancing, draining, and LRU activity? I am draining a number of FSTs and, as a result, also seeing quite a bit of balancing, but the logs are empty. LRU activity is usually logged when the LRU is enabled, but there isn’t much information there.
It would be useful to have a view into what is happening in these activities. I am currently running into issues where I believe large, single-replica files are making draining extremely slow: their transfers time out before the file is completely copied off, and another copy to a different FST is started. Over time these retries build up and drag down the overall drain rate.
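To make the suspected failure mode concrete, here is a minimal back-of-the-envelope sketch. It assumes, hypothetically, that each copy attempt runs at a steady throughput, is aborted after a fixed timeout, and that a new attempt restarts from the beginning. Under those assumptions a file whose copy time exceeds the timeout can never finish, no matter how many retries are made. The function names, the 100 MB/s rate, and the 1-hour timeout are illustrative assumptions, not EOS settings or APIs.

```python
# Hypothetical model of the suspected drain behaviour: every copy attempt is
# aborted after a fixed timeout, and an aborted copy restarts from scratch.
# The rate and timeout values used below are illustrative assumptions only.


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time one uninterrupted copy would need at a steady throughput."""
    return size_bytes / rate_bytes_per_s


def max_copyable_bytes(rate_bytes_per_s: float, timeout_s: float) -> float:
    """Largest file that can finish within a single attempt."""
    return rate_bytes_per_s * timeout_s


def will_ever_finish(size_bytes: float, rate_bytes_per_s: float, timeout_s: float) -> bool:
    """A restart-from-scratch copy only succeeds if one attempt fits in the timeout."""
    return transfer_seconds(size_bytes, rate_bytes_per_s) <= timeout_s


if __name__ == "__main__":
    rate = 100e6      # assumed 100 MB/s per transfer
    timeout = 3600.0  # assumed 1 hour per attempt
    for size in (50e9, 300e9, 500e9):
        print(f"{size / 1e9:.0f} GB: {transfer_seconds(size, rate):.0f} s, "
              f"finishes={will_ever_finish(size, rate, timeout)}")
    print(f"limit: {max_copyable_bytes(rate, timeout) / 1e9:.0f} GB")
```

With these made-up numbers, anything above about 360 GB would be retried indefinitely, and each failed attempt also occupies a transfer slot for the full timeout, which is one way such retries could pile up.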
It is hard to confirm this, however, because I don’t know of any way to get, for example, a list of which files are being moved by the active threads.
Any advice anyone might have on getting a clearer window into these processes would be much appreciated. I’m looking for something simpler than going in with lsof and trying to correlate all that output to something meaningful.
–
Dan Szkola
FNAL