Adding new scheduling groups, smaller than existing ones

Dear all,

The quick question: what are the potential issues with having a set of active scheduling groups much smaller than the others?

This thread picks up and extends a discussion started during the last EOS workshop, to get more detailed information and to share the situation with other EOS community members.

To explain a bit more: we currently have 48 scheduling groups because our storage nodes have at most 48 disks. We are about to add nodes with 60-disk JBODs, so we would need to add 12 additional groups. However, only 3 of these servers are available now, which means only 3 disks per new scheduling group (far fewer than in the existing groups), and that seems a bit few in a replica-2 layout. More of these 60-disk nodes will be added later this year, but procurement will take some months.
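To make the arithmetic above concrete, here is a minimal Python sketch. It assumes the orthogonal layout described in this thread (disk number i of every node goes to group i). The number of existing 48-disk nodes is not stated here, so the value 10 below is purely a placeholder; only the 3 new 60-disk nodes come from the post.

```python
from collections import Counter


def disks_per_group(node_disk_counts):
    """Count disks per scheduling-group index, assuming disk i of every
    node is placed in group i (the orthogonal layout)."""
    counts = Counter()
    for n_disks in node_disk_counts:
        for index in range(n_disks):
            counts[index] += 1
    return counts


# Placeholder: the real number of existing 48-disk nodes is not given.
existing_nodes = [48] * 10
# From the post: 3 new servers with 60 disks each.
new_nodes = [60] * 3

counts = disks_per_group(existing_nodes + new_nodes)
print("number of groups:", len(counts))
print("disks in default.0:", counts[0])
print("disks in default.48:", counts[48])
```

Whatever the real number of existing nodes is, the 12 new groups (default.48 to default.59) end up with exactly one disk per new node, so 3 disks each for now.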

During the discussion last month, it was suggested to reshuffle the existing groups by draining disks and moving them to the new groups. But this has some downsides for us:

  • the layout of our instance is completely orthogonal: the /data01 volumes are all in the default.0 group, and so on. This is very handy to manage, and we would be happy to keep it that way
  • our instance is quite full (85 to 90%, which is why we need to add nodes), so removing disks from existing groups would make them even fuller
  • this takes time, both in terms of resources and in terms of the delay before we can actually use the disks in the new groups (group balancing might not be performant enough to empty the current groups and fill the new ones enough for them to be better balanced).

In the past, we already extended from 24-disk scheduling groups to 48-disk ones; however, we added them with a larger batch of disks (between 6 and 10, I'd say).

Have any of you already faced this situation? Would you suggest adding these 12 new groups to the current space immediately, or rather waiting for the next nodes so that the groups can be enabled all at once with more disks (but we would need the space in the meantime)? Or would you still insist on moving disks around? Or is there another strategy?

Another question, about adding disks: our procedure uses the eosfstregister script, which adds all the disks of a node with a single command, eosfstregister /data default:60, and places the disks in the correct scheduling groups with a generated uuid.

It is an old tool which is still shipped with the eos package, but it doesn't seem to be maintained any more.
Is there a newer tool that some of you are using to add disks? Or which procedure is the most used?

Hi Franck,

What about adding the first 48 disks from the new machines to the existing groups, and also creating 12 more groups but disabling them so that no placement happens on them? Once you have more nodes to add and more disks in the last 12 groups, you can enable them. This way you relieve the space pressure on the first 48 groups (some balancing will be required here), with the tradeoff that you don't use the full capacity of the newly added nodes until you consider you have enough disks in the last 12 groups.
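The plan above could be sketched roughly as follows in Python. The script only prints `eos group set <group> on|off` command lines as a dry run and does not talk to an MGM. The threshold of 6 disks is an assumption (borrowed from the "between 6 and 10" mentioned earlier in the thread), and the function names are made up for illustration.

```python
# Assumed threshold: minimum number of disks before a new group is enabled.
MIN_DISKS = 6


def new_group_states(new_node_count, first_group=48, last_group=59,
                     min_disks=MIN_DISKS):
    """Return {group_name: "on" | "off"} for the new groups.

    With the orthogonal layout, each 60-disk node contributes exactly one
    disk to each of the groups default.48 .. default.59, so the number of
    disks per new group equals the number of 60-disk nodes.
    """
    state = "on" if new_node_count >= min_disks else "off"
    return {f"default.{i}": state for i in range(first_group, last_group + 1)}


def as_commands(states):
    """Render the desired states as 'eos group set' command lines."""
    return [f"eos group set {name} {state}" for name, state in states.items()]


# Today: 3 new nodes, so the 12 new groups stay disabled.
for command in as_commands(new_group_states(3)):
    print(command)
```

Calling `new_group_states(6)` instead would yield "on" for all 12 groups, which corresponds to enabling them once enough 60-disk nodes have arrived.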

For the moment eosfstregister is still the tool to use - though indeed it might need a refresh. If not, you can always use the eos fs add command directly.
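As an illustration of the direct route, here is a hedged Python sketch that prints one `eos fs add` line per disk as a dry run, following the orthogonal layout from this thread (/data01 goes to default.0, and so on). The host name is hypothetical, port 1095 is assumed to be the FST port, and the argument order (uuid, host:port, mountpoint, scheduling group, status) should be checked against `eos fs add --help` on your version. As far as I understand, eosfstregister also records the uuid on the disk itself, so freshly generated uuids like the ones below would not be enough on their own; please check the script before relying on anything like this.

```python
import uuid


def fs_add_commands(host, n_disks, prefix="/data", space="default",
                    port=1095, status="off"):
    """Build dry-run 'eos fs add' lines: disk N goes to group <space>.(N-1)."""
    commands = []
    for n in range(1, n_disks + 1):
        mountpoint = f"{prefix}{n:02d}"   # /data01, /data02, ...
        group = f"{space}.{n - 1}"        # default.0, default.1, ...
        fs_uuid = uuid.uuid4()            # placeholder uuid, see note above
        commands.append(
            f"eos fs add {fs_uuid} {host}:{port} {mountpoint} {group} {status}"
        )
    return commands


# Hypothetical host name for one of the new 60-disk nodes.
for line in fs_add_commands("fst-new-01.example.org", 60):
    print(line)
```

The filesystems are added with status "off" here so that nothing is written to them until the groups are explicitly enabled.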

Cheers,
Elvin

Hi Elvin,

Thank you for your reply.

Yes, we have been thinking of this. If I remember correctly, when the first disks are added to a new subgroup, the group is disabled by default. So we could indeed just keep them disabled until we have an acceptable number of disks. If this is what you recommend to avoid having groups with only 3 disks, we will go for that.