How to temporarily stop a file server (FST) for maintenance

For those interested: today I had to apply firmware updates to an EOS file server (FST role), and a reboot was required. Similar cases may come up for other kinds of maintenance operations. I asked Andreas how to do it properly, and here are the steps:

  1. Put the node in read-only mode: EOS Console [root://localhost] |/> node config mynode.fqdn:1095 configstatus=ro
  2. Wait until all write operations have stopped, monitoring them with: node ls --io mynode.fqdn
  3. Then stop the EOS FST service: service eos stop (or the equivalent systemctl command)
  4. Perform the maintenance operations, including a reboot if required
  5. After the server has restarted and is ready, start the EOS services
  6. Check that the node's EOS filesystems reach the booted status: EOS Console [root://localhost] |/> fs ls mynode.fqdn
  7. When all filesystems have booted, put the node back in rw mode: EOS Console [root://localhost] |/> node config mynode.fqdn:1095 configstatus=rw
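The console steps above can be wrapped in a few POSIX shell helpers so they are run the same way every time. This is a minimal sketch, not an official EOS tool: the function names are hypothetical, it assumes the eos CLI is run as root and reaches the MGM, and the boot check assumes that eos fs ls -m <host> prints one key=value line per filesystem containing a stat.boot=booted field, which you should verify on your EOS version. Stopping and restarting the service (steps 3 to 5) stays a manual action on the FST host.

```shell
#!/bin/sh
# Hypothetical helpers for the FST maintenance steps (sketch only).
# Argument $1 is host:port as in the thread, e.g. mynode.fqdn:1095.

# Step 1: put the node in read-only mode so no new writes are scheduled.
fst_set_ro() {
    eos node config "$1" configstatus=ro
}

# Step 2: show the node's I/O so the operator can check writes have stopped.
# "node ls --io" takes the host without the port, so strip ":port".
fst_show_io() {
    eos node ls --io "${1%:*}"
}

# Step 6: succeed only if every filesystem line reports stat.boot=booted.
# ASSUMPTION: "eos fs ls -m <host>" prints key=value lines with stat.boot.
fst_all_booted() {
    out=$(eos fs ls -m "${1%:*}") || return 1
    [ -n "$out" ] || return 1
    # Fail if any non-empty line does not contain stat.boot=booted.
    ! printf '%s\n' "$out" | grep -v 'stat.boot=booted' | grep -q .
}

# Step 6: poll until all filesystems are booted ($2 = seconds between checks).
# No timeout: interrupt it if a filesystem never boots.
fst_wait_booted() {
    until fst_all_booted "$1"; do
        sleep "${2:-10}"
    done
}

# Step 7: put the node back in read-write mode.
fst_set_rw() {
    eos node config "$1" configstatus=rw
}
```

A typical sequence would be fst_set_ro mynode.fqdn:1095, repeated fst_show_io until writes are gone, the manual stop/maintenance/start on the FST, then fst_wait_booted mynode.fqdn:1095 followed by fst_set_rw mynode.fqdn:1095.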

Hi @barbet,

To make the procedure completely transparent for reading clients as well (avoiding a read error and retry),
you also need to avoid shutting down an FST while reads are ongoing.

In addition to the previous steps, to avoid scheduling new transfers to the node
you first need to set the node off and wait until all the load (both read and write) has gone away:
eos node set mynode.fqdn:1095 off

Once the node has been updated, set it back to on (and read-write):
eos node set mynode.fqdn:1095 on

We use this in EOSUSER/CERNBox to make the procedure completely
transparent for our clients.
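The off/on variant can be sketched the same way. These are hypothetical helpers, not part of EOS: the drain sets configstatus=ro and then switches the node off, the restore does the reverse, and the idle check assumes that eos fs ls -m <host> reports per-filesystem stat.ropen and stat.wopen counters (the ropen/wopen values shown by node ls --io); check the field names on your version. A timeout is added because open files do not always go away on their own.

```shell
#!/bin/sh
# Hypothetical drain/restore helpers (sketch only). $1 = host:port.

# Stop new writes (ro), then stop scheduling any new transfers (off).
fst_drain() {
    eos node config "$1" configstatus=ro &&
    eos node set "$1" off
}

# Bring the node back: on first, then read-write.
fst_restore() {
    eos node set "$1" on &&
    eos node config "$1" configstatus=rw
}

# Print the total of stat.ropen + stat.wopen over all filesystems of the host.
# ASSUMPTION: "eos fs ls -m <host>" prints space-separated key=value pairs.
# Returns non-zero if the eos command itself fails (never reports a fake 0).
fst_open_count() {
    _out=$(eos fs ls -m "${1%:*}") || return 1
    printf '%s\n' "$_out" | tr ' ' '\n' |
        awk -F= '$1 == "stat.ropen" || $1 == "stat.wopen" { n += $2 } END { print n + 0 }'
}

# Wait until nothing is open. $2 = timeout in seconds, $3 = poll interval.
# Returns 0 when idle, 1 on timeout (operator decides), 2 if eos fails.
fst_wait_idle() {
    _limit=${2:-3600}
    _step=${3:-10}
    _waited=0
    while :; do
        _n=$(fst_open_count "$1") || return 2
        [ "$_n" -eq 0 ] && return 0
        [ "$_waited" -ge "$_limit" ] && return 1
        sleep "$_step"
        _waited=$((_waited + _step))
    done
}
```

For example: fst_drain mynode.fqdn:1095, then fst_wait_idle mynode.fqdn:1095 1800; stop and update the FST, and finish with fst_restore mynode.fqdn:1095 once its filesystems have booted.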

Cheers,
Luca

On our instance, files are sometimes kept open for many hours (long-running processes), and at other times some closes are missing, so the files stay open forever. Waiting for ropen or wopen to drop to 0 can therefore take forever, and we have to cut them off.

Is there a way to list which files are kept open, and by whom, on one FS or FST, so that we can decide whether it is critical to shut down the FST anyway, or notify the users? I can only see the hotfiles information on one FS, but that does not cover all the open files.