Dear all,
The quick question: what are the potential issues of having a set of active scheduling groups much smaller than the others?
This thread refreshes/extends a discussion started during the last EOS workshop, to gather some more detailed information and share our situation with other EOS community members.
To explain it a bit more: we currently have 48 scheduling groups because our storage nodes have at most 48 disks. We are about to add nodes with 60-disk JBODs, so we would need 12 additional groups. However, only 3 of these servers are available for now, so that would mean only 3 disks per new scheduling group (vs ), which seems to be a bit few in a replica-2 layout. More of these 60-disk nodes will be added later this year, but procurement will take some months.
During the discussion last month, it was suggested to re-shuffle the existing groups by draining disks and moving them to the new groups. But this has some downsides for us:
- the layout of our instance is completely orthogonal: /data01 volumes are all in default.0 groups, and so on. This is very handy to manage, and we would be happy to keep it
- our instance is quite full (85 to 90%, which is why we need to add nodes), so removing disks from existing groups would make them even fuller
- this takes time, both in terms of resources and in terms of delay before we can actually use the disks in the new groups (group balancing might not be performant enough to empty the current groups and fill the new ones enough to be well balanced)
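To make the imbalance concrete, here is a quick back-of-the-envelope sketch (plain Python; the numbers come from the figures above, and the assumption that the new nodes follow the same orthogonal disk-to-group mapping):

```python
# Back-of-the-envelope: disks landing in the 12 brand-new scheduling
# groups if 3 x 60-disk nodes are added with the orthogonal layout
# (disk N of every node goes to group N-1 of the space).

new_nodes = 3          # 60-disk JBOD servers available now
disks_per_node = 60
existing_groups = 48   # groups 0 .. 47 already exist

# Each new node contributes exactly one disk to each of the 60 groups,
# so the 12 new groups are populated only by the new nodes.
new_groups = disks_per_node - existing_groups   # 12
disks_per_new_group = new_nodes                 # one disk per node per group

print(f"{new_groups} new groups with {disks_per_new_group} disks each")
```

With replica-2 placement, each of those 3-disk groups can only spread its two replicas over 3 filesystems, which is the core of the concern above.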
In the past, we already extended from 24-disk scheduling groups to 48-disk ones, but back then we added them with a larger batch of disks per group (between 6 and 10, I'd say).
Have some of you already faced this situation? Would you suggest adding these 12 new groups to the current space immediately, or rather waiting for the next nodes to be added so we can enable them at once with more disks (but we do need the space)? Or should we still insist on moving disks around? Or another strategy?
Another question about disk addition: our procedure adds disks using the eosfstregister script, which allows registering all the disks of a node with a single command, e.g. eosfstregister /data default:60, and places the disks in the correct scheduling groups with a generated uuid.
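For illustration only, the placement convention this gives us could be sketched as follows (hypothetical Python stand-in, not the actual eosfstregister script; it only shows the /data01 -> default.0 mapping described above, with a generated uuid per filesystem):

```python
import uuid

def register_node(prefix="/data", space="default", ndisks=60):
    """Sketch of the orthogonal placement: /data01 -> default.0, and so on.
    Hypothetical stand-in for eosfstregister, for illustration only."""
    registrations = []
    for i in range(1, ndisks + 1):
        mount = f"{prefix}{i:02d}"     # /data01 .. /data60
        group = f"{space}.{i - 1}"     # default.0 .. default.59
        fsid = uuid.uuid4().hex        # generated uuid for the filesystem
        registrations.append((mount, group, fsid))
    return registrations

regs = register_node()
print(regs[0][:2])    # ('/data01', 'default.0')
```

The appeal of this scheme for us is exactly the orthogonality mentioned earlier: the mount point alone tells you the scheduling group.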
It is an old tool which is still shipped with the eos package, but it doesn't seem to be maintained any more.
Is there any newer tool that some of you are using to add disks? Or which procedure is most commonly used?