Correct way to add new FST nodes

Hello,

I’m reviving this thread because we hit a very similar issue again in the last few days.

While we were adding some disks from a new host, the instance became unavailable, with the familiar message “command not ready, stall the client 5 seconds”. Nothing was possible except eos ns commands.

This happened twice in a row. On the first FST, 5 disks had been added (with a 5-second delay between each eos fs add command) before the problem occurred. We realized that the balancer was active at that time and thought this was the root cause.

The instance was restarted, though with some trouble: as described earlier in this thread, we had to stop many fusex clients to prevent the instance from ending up in a blocked state right after the restart.

But the day after, adding another FST, this time with the balancer disabled, caused the same issue. We also needed to restart the MGM twice, but this time there was no need to switch off the fusex clients.

Reading back through this thread, we see that last year we managed by adding the disks while the FST daemon was shut down. Could this be a safe way to avoid bringing the instance down again? We have several FSTs to add in the next few days.
Over the last year, however, we did add some nodes successfully without needing to shut the FST down.

Or can you see any other explanation for this incident? There seems to be some kind of deadlock, but the logs do not help us at all to understand which components are in conflict. The balancer does not seem to be involved, since we also hit the issue while it was disabled.

Could someone explain the procedure generally used to add nodes and disks to an instance?

Unfortunately, we have not yet been able to plan the upgrade of the instance, so we are still stuck on versions 4.5.15/4.5.17.