This thread is to discuss the correct way to add new nodes to an instance, because we had an issue this morning while adding our first node in a long time (and after many versions).
While registering the disks (using the `eos fs add` command), we enabled the node on the MGM (using `eos node set ... on`) and the MGM became blocked: every command or access hung, except the `eos ns` command.
It took three attempts to restart it correctly, with the new node stopped and the balancer disabled. The first two restarts left the MGM blocked in the same way as the initial incident.
We didn't find anything in the logs, except messages of this kind concerning the newly added node. They appeared at the time of the first block (but not during the two subsequent restarts), and they could just as well be a consequence of the block:
191205 09:55:06 time=1575536106.127250 func=open level=NOTE logid=ebe3b860-173c-11ea-8f95-48df374dec7c unit=mgm@s-jrciprjeop214p.cidsn.jrc.it:1094 tid=00007f7f302e4700 source=IProcCommand:66 tident=root.17287:876@s-jrciprjeos010p sec=sss uid=2 gid=2 name=daemon geo="JRC" command not ready, stall the client 5 seconds
We were then able to add 5 nodes correctly by registering the disks while the EOS FST daemon was stopped, activating the node, and then starting the FST daemon, all with the balancer disabled.
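For reference, the order that worked for us can be written down as a small dry-run script. This is only a sketch of the sequence, not a recommended procedure: it prints the steps instead of running them. The node name `fst-new.example.org:1095` and the space name `default` are hypothetical placeholders, the `eos fs add` arguments are left elided, and using `eos space config ... space.balancer=off` to disable the balancer is our assumption about how to express that step.

```shell
#!/bin/sh
# Dry-run sketch: prints the steps in the order that worked for us.
# Nothing is executed against the MGM.

# Hypothetical placeholders, not values from our instance.
NODE="fst-new.example.org:1095"
SPACE="default"

plan_add_node() {
  node="$1"
  space="$2"
  # Assumption: the balancer is switched off at the space level.
  echo "eos space config ${space} space.balancer=off"
  echo "# on the new node: make sure the FST daemon is stopped"
  # Arguments intentionally elided; one call per disk.
  echo "eos fs add ..."
  echo "eos node set ${node} on"
  echo "# on the new node: start the FST daemon"
}

plan_add_node "$NODE" "$SPACE"
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; the commented lines are the manual steps performed on the FST host itself.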
Our questions are:
- Do you have any idea where this block could come from?
- Could the balancer be overwhelmed when new disks are added, because it sees many files to balance, and block the MGM? If so, what is the gentle way to activate the balancer? We had increased the threshold to 70 before adding the node.
- Is there a recommended moment to issue the `eos node set ... on` command?
The version is 4.5.17 on all FSTs of the instance; the MGM is 4.5.15.