EOS File Scheduling

apeters · May 18, 2018, 4:35pm

Andrea and I tracked down the origin of the problem. The scheduler advances by two groups per scheduling round, so with an odd number of groups it eventually lands on all of them, while with an even number it skips half of them.
Fix coming!
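
To make the mechanism above concrete, here is a minimal sketch of a cursor that advances by a fixed stride over a ring of groups. It is not the EOS scheduler code: the function name `visited_groups`, its parameters and the starting index 0 are assumptions chosen purely for illustration.

```python
def visited_groups(num_groups, stride, rounds=None):
    """Return the sorted group indices reached by a cursor that starts at 0
    and advances by `stride` (modulo `num_groups`) on every scheduling round.

    Hypothetical model of the behaviour described in the post above,
    not the actual EOS implementation.
    """
    if rounds is None:
        # Two full laps are more than enough to reach every reachable group.
        rounds = 2 * num_groups
    cursor = 0
    seen = set()
    for _ in range(rounds):
        seen.add(cursor)
        cursor = (cursor + stride) % num_groups
    return sorted(seen)


if __name__ == "__main__":
    for n in (7, 8):
        print(n, "groups, stride 2 ->", visited_groups(n, 2))
```

In this model, 7 groups with a stride of 2 reach every index, while 8 groups only ever reach 0, 2, 4 and 6. More generally the cursor reaches `num_groups / gcd(num_groups, stride)` distinct groups, which is why only even group counts are affected by a stride of two.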

franck-jrc · May 21, 2018, 8:18am

Ciao Andreas,

OK, I’m glad to hear that. Now I understand the mechanism that leads to our situation!
I am a bit surprised that you would have only installations with an odd number of groups at CERN. Our number of groups is based on the number of disks per server, and the JBODs we use all have an even number of disks, so our number of groups is naturally 12, 24 or, soon, 48.

Have a nice week.

amanzi · May 22, 2018, 8:47am

Hi Franck,
It turned out the issue was really a stupid one, too stupid to see!
The fix is available in our commit repo (http://storage-ci.web.cern.ch/storage-ci/eos/citrine/commit/).
Could you please run some tests on your test instance if you have time?
You are right: at CERN too we have an uneven number of groups, and I can see that some production installations have unbalanced groups… but no one has reported it so far, AFAIK.
thanks
cheers
Andrea

franck-jrc · May 22, 2018, 3:59pm

Dear Andrea,

Thank you for this fix.
Yes, I can confirm that after upgrading the test instance’s MGM to the latest commit version (4.2.22-20180522100537git0159fb6), and with 8 scheduling groups, the scheduler now correctly spreads new files across all groups!

Cheers !

Franck
