Hello everyone,
I am currently testing write performance in EOS, especially with the idea of using an EOS space as a CTA disk buffer.
In my setup, writing to a normal (plain-layout) directory reaches around 2.0–2.6 GB/s.
When I write to a directory that forces raid5 with nstripes=6, throughput drops to around 800 MB/s.
The two directories are configured as follows:
# Normal directory
attr ls cta_test
sys.forced.blocksize="4M"
sys.forced.checksum="adler"
sys.forced.group="0"
sys.forced.iotype:w="direct"
sys.forced.space="default"
# EC / striped directory
attr ls cta_test_ec/
sys.forced.blocksize="4M"
sys.forced.checksum="adler"
sys.forced.group="0"
sys.forced.iotype:w="direct"
sys.forced.layout="raid5"
sys.forced.nstripes="6"
sys.forced.space="default"
Both directories use the same EOS space (sys.forced.space="default") and the same group (sys.forced.group="0").
The relevant EOS group layout is:
eos group ls
groupview default.0 on 60 filesystems
groupview default.2 on 8 filesystems
And the nodes are:
eos node ls
dseosfst01.gsi.de:1095 online 30 filesystems
dseosfst02.gsi.de:1095 online 30 filesystems
dseosfst05.gsi.de:1095 online 8 filesystems
So the raid5 test is currently using default.0, which contains 60 HDD-based filesystems across two FST nodes.
What confuses me is this: I expected the striped/EC layout to benefit from parallelism across multiple filesystems, but in practice it reaches only roughly 30–40% of the plain-layout throughput.
I also tested raid6, and the performance was very similar to raid5. I therefore suspect that the bottleneck is not just the additional parity calculation, but something in the EC/RAIN write path, scheduling, client-side writing, network/FST behavior, or the way the stripes are distributed.
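As a rough sanity check on that suspicion, here is a back-of-the-envelope sketch of how much slowdown the extra parity bytes alone would explain. It rests on assumptions that are not confirmed in this thread: that nstripes counts all stripes including parity, that raid5 uses one parity stripe and raid6 two, that the raid6 run also used nstripes=6, and that the plain-layout rate is a fair proxy for the bandwidth available to the EC write. The function names are purely illustrative.

```python
# Back-of-the-envelope check: how much slowdown would parity bytes alone explain?
# Assumption (not confirmed here): nstripes is the total stripe count,
# with 1 parity stripe for raid5 and 2 for raid6.

PARITY_STRIPES = {"raid5": 1, "raid6": 2}


def write_amplification(layout: str, nstripes: int) -> float:
    """Bytes written to disk per byte of user data."""
    parity = PARITY_STRIPES[layout]
    data = nstripes - parity
    if data <= 0:
        raise ValueError("nstripes must exceed the number of parity stripes")
    return nstripes / data


def expected_throughput(plain_gbs: float, layout: str, nstripes: int) -> float:
    """User-visible GB/s if the only cost were the extra parity bytes."""
    return plain_gbs / write_amplification(layout, nstripes)


if __name__ == "__main__":
    measured_ec_gbs = 0.8  # ~800 MB/s observed with raid5, nstripes=6
    for plain_gbs in (2.0, 2.6):  # measured plain-layout range
        for layout in ("raid5", "raid6"):
            est = expected_throughput(plain_gbs, layout, 6)
            print(f"plain={plain_gbs} GB/s {layout}: parity-only estimate "
                  f"{est:.2f} GB/s vs measured {measured_ec_gbs} GB/s")
```

Under these assumptions the parity-only estimates land between about 1.3 and 2.2 GB/s, well above the measured 800 MB/s, which is why I do not think the extra parity data by itself accounts for the drop.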
My main question is:
Given this setup, what would you consider the most likely cause of the performance drop with raid5/raid6, and would you recommend using such an EOS layout on a 60-HDD group as a CTA disk buffer at all?
More generally, is EOS striping/EC intended to improve write throughput in this kind of use case, or should it mainly be seen as a redundancy/capacity-efficiency feature, while a plain layout would be preferable for a high-throughput CTA disk buffer?
Thanks in advance for any advice.