Uniform scheduling of file writes to FSTs

I have an EOS instance used to store the files from a DAQ system.

The instance has 20 FSTs with 2 FSs per FST (2 groups), running eos-server-4.4.10-1 and xrootd-server-4.8.4-1.

The main workflow is writing files (raw data) continuously at the maximum speed.

I’m trying to find a way to distribute the network load across the FSTs more uniformly than what I currently see.

I did tests with 24 to 128 flows (from 4 DAQ hosts) writing 3GB files to my instance (1 replica, no raid6/raiddp).

At the end of the tests, the files (~10k of them) are well distributed across the FSTs. But when I monitor the output of the commands ‘eos fs ls --io’ or ‘eos node ls --io’, I see that the ‘wopen’ column for nodes or FSs ranges from ‘0’ to ‘x’.

Some examples :

Group balancing (well balanced) :

[root@np02eos1 ~]# eos group ls --io
┌────────────────┬──────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────┬──────┬────────────┬────────────┬────────────┬───────────┬──────────┬──────────┐
│name            │  diskload│  diskr-MB/s│  diskw-MB/s│ eth-MiB/s│  ethi-MiB│  etho-MiB│ ropen│ wopen│  used-bytes│   max-bytes│  used-files│  max-files│   bal-shd│ drain-shd│
└────────────────┴──────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────┴──────┴────────────┴────────────┴────────────┴───────────┴──────────┴──────────┘
 default.1              0.00            0            0      22648          0          0      0     32     35.18 TB    438.66 TB      10.67 K     42.84 G          0          0 
 default.2              0.00            0            0      22648          0          0      0     30     35.16 TB    438.66 TB      10.66 K     42.84 G          0          0 

Files balanced across FST nodes (not well balanced : 1 to 9 concurrent writes / node) :

[root@np02eos1 ~]# eos node ls --io
┌────────────────────────────────┬────────────────┬──────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────┬──────┬────────────┬────────────┬────────────┬───────────┬──────────┬──────────┬──────────┬──────┬─────────┐
│hostport                        │          geotag│  diskload│  diskr-MB/s│  diskw-MB/s│ eth-MiB/s│  ethi-MiB│  etho-MiB│ ropen│ wopen│  used-bytes│   max-bytes│  used-files│  max-files│   bal-shd│ drain-shd│  gw-queue│  iops│       bw│
└────────────────────────────────┴────────────────┴──────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────┴──────┴────────────┴────────────┴────────────┴───────────┴──────────┴──────────┴──────────┴──────┴─────────┘
 np02ss00.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      1      3.15 TB     46.18 TB          955      4.51 G          0          0          0    118    647 MB 
 np02ss01.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      1      3.89 TB     46.18 TB       1.18 K      4.51 G          0          0          0     76    696 MB 
 np02ss02.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      2      3.95 TB     46.18 TB       1.20 K      4.51 G          0          0          0     92    668 MB 
 np02ss03.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      4      3.72 TB     46.18 TB       1.13 K      4.51 G          0          0          0     98    314 MB 
 np02ss04.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      3      3.89 TB     46.18 TB       1.18 K      4.51 G          0          0          0    114    458 MB 
 np02ss05.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      2      3.87 TB     46.18 TB       1.17 K      4.51 G          0          0          0    110    394 MB 
 np02ss06.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      5      3.83 TB     46.18 TB       1.16 K      4.51 G          0          0          0    118    382 MB 
 np02ss07.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      3      3.94 TB     46.18 TB       1.20 K      4.51 G          0          0          0    119   1109 MB 
 np02ss08.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      2      3.67 TB     46.18 TB       1.11 K      4.51 G          0          0          0    109    379 MB 
 np02ss09.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      2      3.86 TB     46.18 TB       1.17 K      4.51 G          0          0          0     57    313 MB 
 np02ss10.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      2      3.89 TB     46.18 TB       1.18 K      4.51 G          0          0          0    116    392 MB 
 np02ss11.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      3      3.72 TB     46.18 TB       1.13 K      4.51 G          0          0          0    110    482 MB 
 np02ss12.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      9      4.08 TB     46.18 TB       1.24 K      4.51 G          0          0          0    120    432 MB 
 np02ss13.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      4      3.80 TB     46.18 TB       1.15 K      4.51 G          0          0          0    114    451 MB 
 np02ss14.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      3      3.88 TB     46.18 TB       1.18 K      4.51 G          0          0          0    118    412 MB 
 np02ss15.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      4      3.13 TB     46.18 TB          950      4.51 G          0          0          0    110    377 MB 
 np02ss16.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      4      3.90 TB     46.18 TB       1.18 K      4.51 G          0          0          0    107    390 MB 
 np02ss17.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      5      3.80 TB     46.18 TB       1.15 K      4.51 G          0          0          0    113    370 MB 
 np02ss18.cern.ch:1095                    np02-daq       0.00            0            0       1192          0          0      0      4      3.74 TB     46.18 TB       1.14 K      4.51 G          0          0          0     85    538 MB 

Files balanced across FSs (not well balanced : 0 to 6 concurrent writes / FS) :

[root@np02eos1 ~]# eos fs ls --io
┌────────────────────────────────┬──────┬────────────────┬────────────────┬──────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────┬──────┬────────────┬────────────┬────────────┬───────────┬──────────┬──────────────┬────────────┬──────┬─────────┐
│hostport                        │    id│      schedgroup│          geotag│  diskload│  diskr-MB/s│  diskw-MB/s│ eth-MiB/s│  ethi-MiB│  etho-MiB│ ropen│ wopen│  used-bytes│   max-bytes│  used-files│  max-files│   bal-shd│     drain-shd│   drainpull│  iops│       bw│
└────────────────────────────────┴──────┴────────────────┴────────────────┴──────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────┴──────┴────────────┴────────────┴────────────┴───────────┴──────────┴──────────────┴────────────┴──────┴─────────┘
 np02ss00.cern.ch:1095               152        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.38 TB     23.09 TB          415      2.25 G          0              0          off     60    270 MB 
 np02ss00.cern.ch:1095               153        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.51 TB     23.09 TB          458      2.25 G          0              0          off     58    377 MB 
 np02ss01.cern.ch:1095               154        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.68 TB     23.09 TB          509      2.25 G          0              0          off     19    249 MB 
 np02ss01.cern.ch:1095               155        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.80 TB     23.09 TB          545      2.25 G          0              0          off     57    447 MB 
 np02ss02.cern.ch:1095               156        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      1      1.70 TB     23.09 TB          514      2.25 G          0              0          off     56    419 MB 
 np02ss02.cern.ch:1095               157        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.69 TB     23.09 TB          511      2.25 G          0              0          off     36    249 MB 
 np02ss03.cern.ch:1095               158        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      1      1.61 TB     23.09 TB          486      2.25 G          0              0          off     53    120 MB 
 np02ss03.cern.ch:1095               159        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.67 TB     23.09 TB          506      2.25 G          0              0          off     45    194 MB 
 np02ss04.cern.ch:1095               160        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.69 TB     23.09 TB          510      2.25 G          0              0          off     59    288 MB 
 np02ss04.cern.ch:1095               161        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.70 TB     23.09 TB          514      2.25 G          0              0          off     55    170 MB 
 np02ss05.cern.ch:1095               162        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      4      1.70 TB     23.09 TB          514      2.25 G          0              0          off     58    243 MB 
 np02ss05.cern.ch:1095               163        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.67 TB     23.09 TB          506      2.25 G          0              0          off     52    151 MB 
 np02ss06.cern.ch:1095               164        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.68 TB     23.09 TB          509      2.25 G          0              0          off     57    186 MB 
 np02ss06.cern.ch:1095               165        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.68 TB     23.09 TB          507      2.25 G          0              0          off     61    196 MB 
 np02ss07.cern.ch:1095               166        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.72 TB     23.09 TB          520      2.25 G          0              0          off     65    945 MB 
 np02ss07.cern.ch:1095               167        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      3      1.71 TB     23.09 TB          518      2.25 G          0              0          off     54    164 MB 
 np02ss08.cern.ch:1095               168        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.71 TB     23.09 TB          517      2.25 G          0              0          off     54    178 MB 
 np02ss08.cern.ch:1095               169        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.63 TB     23.09 TB          494      2.25 G          0              0          off     55    201 MB 
 np02ss09.cern.ch:1095               170        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      4      1.78 TB     23.09 TB          540      2.25 G          0              0          off     20    134 MB 
 np02ss09.cern.ch:1095               171        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      3      1.72 TB     23.09 TB          522      2.25 G          0              0          off     37    179 MB 
 np02ss10.cern.ch:1095               172        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      5      1.73 TB     23.09 TB          522      2.25 G          0              0                  57    157 MB 
 np02ss10.cern.ch:1095               173        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.71 TB     23.09 TB          518      2.25 G          0              0                  59    235 MB 
 np02ss11.cern.ch:1095               174        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.70 TB     23.09 TB          514      2.25 G          0              0                  58    326 MB 
 np02ss11.cern.ch:1095               175        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      3      1.65 TB     23.09 TB          500      2.25 G          0              0                  52    156 MB 
 np02ss12.cern.ch:1095               176        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.80 TB     23.09 TB          545      2.25 G          0              0                  59    157 MB 
 np02ss12.cern.ch:1095               177        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      3      1.82 TB     23.09 TB          551      2.25 G          0              0                  61    275 MB 
 np02ss13.cern.ch:1095               178        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      5      1.66 TB     23.09 TB          503      2.25 G          0              0                  55    182 MB 
 np02ss13.cern.ch:1095               179        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      3      1.70 TB     23.09 TB          515      2.25 G          0              0                  59    269 MB 
 np02ss14.cern.ch:1095               180        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      1      1.73 TB     23.09 TB          523      2.25 G          0              0                  60    210 MB 
 np02ss14.cern.ch:1095               181        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      2      1.72 TB     23.09 TB          519      2.25 G          0              0                  58    202 MB 
 np02ss15.cern.ch:1095               182        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      1      1.37 TB     23.09 TB          414      2.25 G          0              0                  57    179 MB 
 np02ss15.cern.ch:1095               183        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      1      1.40 TB     23.09 TB          424      2.25 G          0              0                  53    198 MB 
 np02ss16.cern.ch:1095               184        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      4      1.69 TB     23.09 TB          511      2.25 G          0              0                  56    195 MB 
 np02ss16.cern.ch:1095               185        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.75 TB     23.09 TB          529      2.25 G          0              0                  51    195 MB 
 np02ss17.cern.ch:1095               186        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      3      1.79 TB     23.09 TB          540      2.25 G          0              0                  55    199 MB 
 np02ss17.cern.ch:1095               187        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      6      1.70 TB     23.09 TB          516      2.25 G          0              0                  58    171 MB 
 np02ss18.cern.ch:1095               188        default.1         np02-daq       0.00         0.00         0.00       1192          0          0      0      0      1.73 TB     23.09 TB          523      2.25 G          0              0                  56    293 MB 
 np02ss18.cern.ch:1095               189        default.2         np02-daq       0.00         0.00         0.00       1192          0          0      0      4      1.58 TB     23.09 TB          477      2.25 G          0              0                  29    245 MB 

Here are the geosched parameters :

[root@np02eos1 ~]# eos geosched show param
### GeoTreeEngine parameters :
skipSaturatedPlct = 1
skipSaturatedAccess = 1
skipSaturatedDrnAccess = 1
skipSaturatedBlcAccess = 1
skipSaturatedDrnPlct = 0
skipSaturatedBlcPlct = 0
proxyCloseToFs = 1
penaltyUpdateRate = 1
plctDlScorePenalty = 10(default) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
plctUlScorePenalty = 10(defaUlt) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
accessDlScorePenalty = 10(default) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
accessUlScorePenalty = 10(defaUlt) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
fillRatioLimit = 80
fillRatioCompTol = 100
saturationThres = 10
timeFrameDurationMs = 1000
### GeoTreeEngine list of groups :
default.1 , default.2 , 

How can I reduce the wopen standard deviation as much as possible ?
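To put a number on the spread in question, here is a minimal Python sketch that computes the mean and the population standard deviation of wopen per node. The values are copied by hand from the ‘eos node ls --io’ listing above; the variable names are only illustrative, and a single snapshot like this says nothing about how the spread evolves over time.

```python
from statistics import mean, pstdev

# wopen per FST node, copied by hand from the `eos node ls --io` listing
# above (np02ss00 ... np02ss18, in listing order).
wopen_per_node = [1, 1, 2, 4, 3, 2, 5, 3, 2, 2, 2, 3, 9, 4, 3, 4, 4, 5, 4]

total = sum(wopen_per_node)
avg = mean(wopen_per_node)
spread = pstdev(wopen_per_node)  # population standard deviation of this snapshot

print(f"nodes={len(wopen_per_node)} total_wopen={total}")
print(f"mean={avg:.2f} stdev={spread:.2f} "
      f"min={min(wopen_per_node)} max={max(wopen_per_node)}")
```

Running the same computation on snapshots taken at regular intervals would give a fairer picture than one listing.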

Hi Denis
I think for your use case (given also that you have a 1-replica layout) it’s better to create more scheduling groups, in order to have round-robin scheduling between them and distribute the streams uniformly.
For instance, you have 20 FSTs, so you can create 10 scheduling groups and assign 4 FSs to each of them (the 2 FSs of 2 FSTs). I see that you are using EOS 4.4.10, so you can use the command fs mv --force, which lets you change the scheduling group of a FS without draining it.
Let me know.
Cheers
Andrea
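The grouping suggested above can be sketched as a small dry-run planner in Python. Everything here is an assumption for illustration: the helper `plan_groups` is hypothetical, the FS ids follow the pattern visible in the ‘eos fs ls --io’ listing (only 19 hosts appear there, so the last group ends up with 2 FSs), and the argument order of `fs mv` (source FS id, then destination group) should be checked against `eos fs mv --help` before use. The script only prints commands; it does not run them, and it does not cover whether the target groups must exist beforehand.

```python
def plan_groups(fs_by_host, hosts_per_group=2, space="default", first_index=1):
    """Map each scheduling group name to the FS ids of consecutive hosts."""
    hosts = list(fs_by_host)
    plan = {}
    for start in range(0, len(hosts), hosts_per_group):
        group = f"{space}.{first_index + start // hosts_per_group}"
        plan[group] = [
            fsid
            for host in hosts[start:start + hosts_per_group]
            for fsid in fs_by_host[host]
        ]
    return plan

# FS ids as seen in the listing: host np02ssNN owns ids 152+2*NN and 153+2*NN.
# Only the 19 hosts shown in the listing are included here.
fs_by_host = {
    f"np02ss{n:02d}.cern.ch": [152 + 2 * n, 153 + 2 * n] for n in range(19)
}

plan = plan_groups(fs_by_host)
for group, fsids in plan.items():
    for fsid in fsids:
        # Dry run: print the command for review instead of executing it.
        # The argument order is an assumption, see the note above.
        print(f"eos fs mv --force {fsid} {group}")
```

Setting `hosts_per_group=1` with FSs split per host would give a different layout; the function only handles whole hosts per group.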

Hi Andrea,

Thanks for your suggestion.

I tried some configurations :

  1. my original configuration : see the plot, blue, “2-groups-20-fs”,
  2. the configuration you suggested (10 scheduling groups with 4 FSs each, 2 FSs from each of 2 FSTs) : see the plot, green, “10-groups-2-fs-2fst”,
  3. and another one (20 scheduling groups with 2 FSs each, 1 FS from each of 2 FSTs) : see the plot, red, “20-groups-1-fs-2fst”.

In this test, the files are written from 4 hosts with 6 xrdcp in parallel from each host.

plot-shed-groups

It’s a little better, but I expected a bigger improvement.
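As a point of comparison for what "uniform" could look like, here is a toy Python sketch that places the 24 streams of this test (4 hosts with 6 xrdcp each) on 40 FSs, once at random and once strictly round-robin, and compares the spread of the per-FS counts. This is not a model of the EOS GeoTreeEngine: both placement functions are hypothetical, and the random case only illustrates how much spread independent placement produces by itself.

```python
import random
from statistics import pstdev

def random_placement(n_streams, n_fs, rng):
    """Each stream picks one FS uniformly at random, independently."""
    counts = [0] * n_fs
    for _ in range(n_streams):
        counts[rng.randrange(n_fs)] += 1
    return counts

def round_robin_placement(n_streams, n_fs):
    """Stream i goes to FS i modulo n_fs."""
    counts = [0] * n_fs
    for i in range(n_streams):
        counts[i % n_fs] += 1
    return counts

N_STREAMS = 4 * 6  # 4 hosts x 6 xrdcp in parallel, as in the test above
N_FS = 20 * 2      # 20 FSTs x 2 FSs each

rng = random.Random(42)  # fixed seed so the run is repeatable
rnd = random_placement(N_STREAMS, N_FS, rng)
rr = round_robin_placement(N_STREAMS, N_FS)

print(f"random:      max={max(rnd)} stdev={pstdev(rnd):.2f}")
print(f"round-robin: max={max(rr)} stdev={pstdev(rr):.2f}")
```

With fewer streams than FSs, even a perfect round-robin leaves some FSs at 0 and others at 1, so the standard deviation cannot reach zero in this test.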

Hi Denis,
What is the optimal performance you would expect, based on your hardware and network configuration?
Cheers
Andrea

Hello Andrea, thanks for your message,

Well, hmm … I need 16 GB/s.
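For scale, here is the per-server share of that target under a perfectly even split, as a short Python sketch. The even split is an assumption (it is exactly what the wopen figures show is not happening), and the constant names are only illustrative.

```python
TARGET_GB_S = 16.0  # aggregate write rate needed
N_FST = 20          # FSTs in the instance
FS_PER_FST = 2      # FSs per FST

per_fst_gb_s = TARGET_GB_S / N_FST        # share of each FST
per_fs_gb_s = per_fst_gb_s / FS_PER_FST   # share of each FS
per_fst_gbit_s = per_fst_gb_s * 8         # same share, in Gb/s on the wire

print(f"per FST: {per_fst_gb_s:.2f} GB/s ({per_fst_gbit_s:.1f} Gb/s)")
print(f"per FS:  {per_fs_gb_s:.2f} GB/s")
```

Any FST that receives more than its share of concurrent writes has to absorb proportionally more than this figure.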

The preliminary tests I ran 2 years ago with native Xrootd (4.3.0-1) and EOS (citrine 4.0.12) showed better performance with native Xrootd. That is not the problem here.

But I now see that my problems seem to come not primarily from the file balancing across the FSTs, but from the number of clients writing the files (or from the MD server ?) :

Here are my observations :

Writing files from 1 client to EOS (independent tests with 6, 8, 16, 20 or 32 xrdcp in parallel). Whichever client I use, I can reach and sustain 9 GB/s for a long time (each client has 2 × 40 Gb/s Ethernet cards) :
plot-1-client

Writing files from 2 clients at the same time to EOS (independent tests with 6, 8, 16, 20 or 32 xrdcp in parallel) :
plot-2-clients

Writing files from 3 clients at the same time to EOS (independent tests with 6, 8, 16, 20 or 32 xrdcp in parallel) :
plot-3-clients

I’m currently looking for a buffers/interrupts… limitation at the metadata level. Maybe ?
Note that we can write small (4kB) files at a 350 Hz rate.
Have a nice weekend.
Denis
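The figures quoted in this post can be put side by side with a little arithmetic, sketched here in Python. All inputs come from the thread (2 × 40 Gb/s per client, 9 GB/s sustained from one client, 16 GB/s target, 3 GB files, 350 Hz for 4 kB files); the constant names are illustrative. Comparing the two file rates is only a rough indication, since creating small files and opening large ones may not load the metadata server in the same way.

```python
import math

NIC_GBIT_S = 40            # per Ethernet card
NICS_PER_CLIENT = 2
OBSERVED_CLIENT_GB_S = 9   # sustained from a single client
TARGET_GB_S = 16
FILE_SIZE_GB = 3
SMALL_FILE_RATE_HZ = 350   # measured rate for 4 kB files

# Line rate of one client in GB/s (8 bits per byte, protocol overhead ignored).
client_line_rate_gb_s = NIC_GBIT_S * NICS_PER_CLIENT / 8

# Minimum number of clients if each one sustained its single-client rate.
min_clients = math.ceil(TARGET_GB_S / OBSERVED_CLIENT_GB_S)

# File rate implied by the target throughput with 3 GB files.
files_per_s = TARGET_GB_S / FILE_SIZE_GB

print(f"client line rate: {client_line_rate_gb_s:.1f} GB/s")
print(f"clients needed at {OBSERVED_CLIENT_GB_S} GB/s each: {min_clients}")
print(f"files/s at target: {files_per_s:.2f} (small-file rate: {SMALL_FILE_RATE_HZ} Hz)")
```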