
Meaning of nstripes

Hello everyone,

I am working on a test EOS system with 36 disks across three FSTs (12 disks per FST). I would like a raid5 (N+1) setup, which I expected to be possible with N=2 since I have three disks per scheduling group, so I set nstripes=2. When I try to copy files to the system I get an error, and on the FSTs I see:

210705 15:13:06 time=1625523186.251065 func=Open level=ERROR logid=2c694f58-ddde-11eb-9871-fa163ebb7aaa unit=fst@elephant14.heprc.uvic.ca:1095 tid=00007f5e9d8fc700 source=RainMetaLayout:123 tident=mfens98.233021:84@otter sec=unix uid=0 gid=0 name=mfens98 geo="" msg=“failed open, stripe size must be at least 6” stripe_size=1

This error does not make sense to me, since I have nstripes set to 2, not 1.

Please help me understand what nstripes actually means, whether I am using it correctly, and whether there are other configuration settings I need to change.

Thanks,

Matthew

After trying a few more things, it seems that for any RAIN layout (i.e. not replica or plain) nstripes must be >=6. ‘nstripes’ is the number of filesystems used to store a file, and stripe_size==nstripes-parity.

For example, with a raid5 layout (N+1), setting nstripes=6 gives stripe_size==5 (6-1==5), and the file is split across 5 filesystems plus one parity filesystem. With the archive (N+3) layout, stripe_size==3 (6-3==3), and the file is split across 3 filesystems plus 3 parity filesystems.

The confusing part is the error message. With a raid5 layout and nstripes==5 I get the same error as above, except that it reports stripe_size=4; the error goes away when I change to nstripes==6. So either the message really means that nstripes must be at least 6 (not stripe_size), or the developers want stripe_size to be at least 6 but the check only tests nstripes>=6.

I hope this helps anyone else who comes across this and that the developers can help to make this error message less confusing.

Cheers,

Matthew

Hi Matthew,

In order to have a functioning RAIN setup you need at least 6 file systems in a scheduling group. We have 3 types of RAIN layouts: raiddp (uses simple XOR), which is 4 data + 2 parity stripes; reeds (Reed-Solomon), which has 2 parity stripes; and archive, which has 3 parity stripes. The nstripes parameter controls the total number of stripes (data + parity). So, for example, reeds with 8 stripes means 6 data stripes + 2 parity stripes; an archive layout with 10 stripes has 7 data and 3 parity stripes. The minimum is 6 stripes, as you cannot ensure the minimum level of redundancy with fewer. I hope this clarifies things.

Cheers,
Elvin
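
The stripe accounting Elvin describes can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this thread, not EOS code: the names `PARITY_STRIPES`, `split_stripes` and `fits_in_group` are made up for the example, and the raid5 entry comes from the N+1 description in the earlier posts rather than from Elvin's list.

```python
from typing import Tuple

# Parity stripes per RAIN layout, as stated in this thread.
# Hypothetical lookup table for illustration; not an EOS data structure.
PARITY_STRIPES = {
    "raid5": 1,    # N+1, from the earlier posts
    "raiddp": 2,   # described above as 4 data + 2 parity
    "reeds": 2,
    "archive": 3,
}

# Minimum total number of stripes (data + parity) given above.
MIN_TOTAL_STRIPES = 6


def split_stripes(layout: str, nstripes: int) -> Tuple[int, int]:
    """Return (data_stripes, parity_stripes) for a total of nstripes."""
    parity = PARITY_STRIPES[layout]
    if nstripes < MIN_TOTAL_STRIPES:
        # Report the value the user actually configured (nstripes),
        # not the derived number of data stripes.
        raise ValueError(
            f"nstripes={nstripes} is too small: a RAIN layout needs at "
            f"least {MIN_TOTAL_STRIPES} stripes in total (data + parity)"
        )
    return nstripes - parity, parity


def fits_in_group(nstripes: int, filesystems_in_group: int) -> bool:
    """Each stripe needs its own file system in the scheduling group."""
    return filesystems_in_group >= nstripes


print(split_stripes("reeds", 8))     # (6, 2)
print(split_stripes("archive", 10))  # (7, 3)
print(fits_in_group(6, 3))           # False
```

The last line corresponds to the setup in the original question: with only three file systems per scheduling group, even the minimum of 6 stripes cannot be placed.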

Hi Elvin,

Thank you for your response. This does clarify things. The one thing that could still be confusing is the error message on the FSTs when nstripes is too small. The message says the stripe size must be at least 6, but the value it reports is nstripes minus the number of parity stripes. At least for me, it would make more sense if the error message reported the value of nstripes itself.

I do understand the issue now, though, and how to fix it. Thank you again for your response.

Cheers,
Matthew

Hi Matthew,

Thanks for your observation. This was indeed a mistake, and it’s now fixed by the following commit:
https://gitlab.cern.ch/dss/eos/-/commit/85eca8d042c72144f8289a5b98637a7827eeffb4

Cheers,
Elvin