Questions on the drain system

Hello,

I am currently draining servers. We are not using RAIN; the filesystems on the servers are partitions of a large RAID6 volume. I have some questions:

  1. When trying to set the parameters for draining, I observed that I could not set drainer.node.ntx
    higher than 10. Also, I wanted to set drainperiod to 7 days (604800 seconds), but it seems it is
    still 86400. Is the graceperiod taking over? What is the respective impact of these two timeouts?

  2. I put 2 FS in drain but noticed that they were processed one after the other, so there is no
    parallelism here. Is this by design?

  3. When draining starts, it looks like the (auto) balancer is temporarily suspended. Does it restart
    automatically after draining is over, or does one have to restart it? If so, how?

  4. Among the other servers in the cluster that are pulling data as part of the drain procedure, it
    seems that some are pulling data more efficiently than others (at least for some time). Can this
    be explained?

Thank you.

JM

Hi @barbet, regarding your first question: if you look at the FS in question with fs <fsid> status, what does the drain period say?

As an example, for us, if I query a random FS, I get…

# ------------------------------------------------------------------------------------
# FileSystem Variables
# ------------------------------------------------------------------------------------
bootcheck                        := 0
configstatus                     := rw
drainperiod                      := 86400
drainstatus                      := nodrain
graceperiod                      := 86400

If the FS was created when the space had the drainperiod configured, the FS takes the space’s value.
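
To spot filesystems that still carry an old inherited value, a quick check from the shell could look like the sketch below (the fsid range 1 to 22 is only an example, adapt it to your instance):

for fsid in $(seq 1 22); do
    echo -n "fsid $fsid: "
    eos -b fs status $fsid | grep drainperiod
done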

Hi, thank you David,

Yes, the drainperiod of the FS is still the old value that was in the space before I changed it:

eos -b fs status 3 | grep drain
configstatus := drain
drainperiod := 86400
drainstatus := draining

While in the space:

eos -b space status default | grep drain
drainer.node.nfs := 5
drainer.node.ntx := 10
drainer.node.rate := 500
drainperiod := 604800

I suppose that what I have to do is change it at the FS level…

JM

You have to use this syntax to change it for all filesystems:

space config default fs.drainperiod=604800

This will apply it to all filesystems and change the default in ‘space status default’.
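
If only a single filesystem needs the new value, it should also be possible to set it per filesystem with fs config; a sketch (please double-check the exact syntax on your version; fsid 3 is just the one from the example above):

eos -b fs config 3 drainperiod=604800
eos -b fs status 3 | grep drainperiod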

Hello,

I am still draining servers (it is slow because the servers have 2x1 Gbit/s Ethernet adapters). This morning I tried to start all remaining drains at the same time, hoping that they would be processed in parallel, but no: I can already see that only the first one is progressing. So, to ask the question again: is there a way to have drain operations run in parallel?

Thanks

JM

Hi Jean Michel,
I guess you are using the distributed drain, right? (The new central drain is not enabled by default.)
I see that you have a maximum of 10 transfers per node for all filesystems, and you said that you cannot raise this limit.
I just tried and I can raise the limit without problems:

space config default space.drainer.node.ntx=40
success: setting drainer.node.ntx=40
EOS Console [root://localhost] |/eos/dev01/test/andrea/> space status default

# ------------------------------------------------------------------------------------
# Space Variables
# ------------------------------------------------------------------------------------
balancer                         := off
balancer.node.ntx                := 2
balancer.node.rate               := 25
balancer.threshold               := 1
converter                        := off
converter.ntx                    := 2
drainer.node.nfs                 := 5
drainer.node.ntx                 := 40

Can you retry setting this variable to a higher value?
Cheers,
Andrea
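
As a side note, the output above also shows drainer.node.nfs := 5; assuming it limits how many filesystems a single node drains in parallel (which its name suggests, but please verify on your version), it can presumably be raised the same way as drainer.node.ntx above (10 is just an illustrative value):

eos -b space config default space.drainer.node.nfs=10
eos -b space status default | grep drainer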

Thank you Andrea,

I successfully changed drainer.node.ntx to 40 (I do not understand why I could not in the first place).
But it does not change the way the filesystems are drained, that is, one after the other. The only thing that changed is the number of processes performing eoscp on the target nodes. The network traffic on our EOS cluster can be seen here: http://alimonitor.cern.ch?2692 (a grid certificate accepted by ALICE is needed).
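
For what it is worth, a rough way to count those eoscp transfer processes on a pulling node is simply (run on the FST host itself):

pgrep -c eoscp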

Currently 10 filesystems are draining, but only the first one is progressing:

eos -b fs ls -d
┌────────────────────────┬────┬──────┬────────────────────────────────┬────────────┬────────────┬────────────┬────────────┬───────────┬──────┬──────┐
│host                    │port│    id│                            path│ drainstatus│    progress│       files│  bytes-left│   timeleft│ retry│ wopen│
└────────────────────────┴────┴──────┴────────────────────────────────┴────────────┴────────────┴────────────┴────────────┴───────────┴──────┴──────┘
 nanxrd26.in2p3.fr        1095     13                           /data1     draining           45     131.48 K     10.42 TB      519206      0      0 
 nanxrd26.in2p3.fr        1095     14                           /data2     draining            0     239.42 K     18.68 TB      519209      0      0 
 nanxrd27.in2p3.fr        1095     15                           /data1     draining            0     240.79 K     18.61 TB      519212      0      0 
 nanxrd27.in2p3.fr        1095     16                           /data2     draining            0     239.61 K     18.57 TB      519215      0      0 
 nanxrd28.in2p3.fr        1095     17                           /data1     draining            0     238.90 K     18.57 TB      519219      0      0 
 nanxrd28.in2p3.fr        1095     18                           /data2     draining            0     241.09 K     18.51 TB      519222      0      0 
 nanxrd29.in2p3.fr        1095     19                           /data1     draining            0     239.81 K     18.56 TB      519225      0      0 
 nanxrd29.in2p3.fr        1095     20                           /data2     draining            0     239.73 K     18.58 TB      519229      0      0 
 nanxrd30.in2p3.fr        1095     21                           /data1     draining            0     241.00 K     18.59 TB      519232      0      0 
 nanxrd30.in2p3.fr        1095     22                           /data2     draining            0     237.24 K     18.61 TB      519234      0      0 

JM

What about the scheduling groups of the FS under drain? Can you paste the fs ls output?

They are all in the same scheduling group, default.0.

JM

Do you mean that you have only one scheduling group in your system, or that the FS that are draining belong to the same scheduling group?

In the second case, i.e. if all your FS under drain belong to the same scheduling group, only FS in that scheduling group can pull data from them… maybe there are not many FS in that scheduling group which are not under drain?
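
To check how the filesystems are spread over the scheduling groups, something along these lines should do (a sketch; the schedgroup column of fs ls shows the group of each FS):

eos -b fs ls       # the schedgroup column shows each filesystem's group
eos -b group ls    # lists the scheduling groups, e.g. default.0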

Andrea,

Yes, I have only one scheduling group in the EOS cluster.

JM

It’s quite an unusual configuration then… we don’t have any instance with only one scheduling group. I will try to reproduce your issue in my testbed.

I have tried to reproduce this behaviour on my testbed, but I could not. I moved all FS to the same scheduling group (default.0) and tried to drain 3 FS, and I could see them being drained in parallel:

EOS Console [root://localhost] |/eos/dev01/test/andrea/> fs ls -d
┌────────────────────────┬────┬──────┬────────────────────────────────┬────────────┬────────────┬────────────┬────────────┬───────────┬──────┬────────────┐
│host                    │port│    id│                            path│ drainstatus│    progress│       files│  bytes-left│   timeleft│ retry│      failed│
└────────────────────────┴────┴──────┴────────────────────────────────┴────────────┴────────────┴────────────┴────────────┴───────────┴──────┴────────────┘
 eos-dev02.cern.ch        1095      2                         /data/02     draining           77           19      6.53 GB 99999999999      0           0 
 eos-dev03.cern.ch        1095      6                         /data/03     draining           29          102      2.75 GB 99999999999      0           0 
 eos-dev01.cern.ch        1095      9                         /data/03     draining           41           44     14.07 GB 99999999999      0           0 

EOS Console [root://localhost] |/eos/dev01/test/andrea/> fs ls -d
┌────────────────────────┬────┬──────┬────────────────────────────────┬────────────┬────────────┬────────────┬────────────┬───────────┬──────┬────────────┐
│host                    │port│    id│                            path│ drainstatus│    progress│       files│  bytes-left│   timeleft│ retry│      failed│
└────────────────────────┴────┴──────┴────────────────────────────────┴────────────┴────────────┴────────────┴────────────┴───────────┴──────┴────────────┘
 eos-dev02.cern.ch        1095      2                         /data/02     draining           89            9      6.53 GB 99999999999      0           0 
 eos-dev03.cern.ch        1095      6                         /data/03     draining           35           93      2.75 GB 99999999999      0           0 
 eos-dev01.cern.ch        1095      9                         /data/03     draining           46           40     14.07 GB 99999999999      0           0 

EOS Console [root://localhost] |/eos/dev01/test/andrea/> fs ls -d
┌────────────────────────┬────┬──────┬────────────────────────────────┬────────────┬────────────┬────────────┬────────────┬───────────┬──────┬────────────┐
│host                    │port│    id│                            path│ drainstatus│    progress│       files│  bytes-left│   timeleft│ retry│      failed│
└────────────────────────┴────┴──────┴────────────────────────────────┴────────────┴────────────┴────────────┴────────────┴───────────┴──────┴────────────┘
 eos-dev02.cern.ch        1095      2                         /data/02      drained          100            0          0 B           0      0           0 
 eos-dev03.cern.ch        1095      6                         /data/03      drained          100            0          0 B           0      0           0 
 eos-dev01.cern.ch        1095      9                         /data/03      drained          100            0          0 B           0      0           0 

I’m using the latest EOS version here… which one are you using?

Hi Andrea,

Thank you very much for looking at this. We are running Citrine on the managers and the servers:

 eos -b version
EOS_INSTANCE=eossubatech
EOS_SERVER_VERSION=4.2.25 EOS_SERVER_RELEASE=1
EOS_CLIENT_VERSION=4.2.25 EOS_CLIENT_RELEASE=1

JM

In fact, looking more closely, it is unfair to say that there is no parallelism. Today I can see a few percent of progress on the other filesystems:

[root@naneosmgr01(EOSMASTER) ~]#eos -b  fs ls -d
┌────────────────────────┬────┬──────┬────────────────────────────────┬────────────┬────────────┬────────────┬────────────┬───────────┬──────┬──────┐
│host                    │port│    id│                            path│ drainstatus│    progress│       files│  bytes-left│   timeleft│ retry│ wopen│
└────────────────────────┴────┴──────┴────────────────────────────────┴────────────┴────────────┴────────────┴────────────┴───────────┴──────┴──────┘
 nanxrd26.in2p3.fr        1095     13                           /data1      drained          100            0          0 B           0      0      0 
 nanxrd26.in2p3.fr        1095     14                           /data2     draining           83      40.65 K      3.48 TB      236647      0      0 
 nanxrd27.in2p3.fr        1095     15                           /data1     draining            5     228.31 K     17.83 TB      236650      0      0 
 nanxrd27.in2p3.fr        1095     16                           /data2     draining            6     227.19 K     17.79 TB      236653      0      0 
 nanxrd28.in2p3.fr        1095     17                           /data1     draining            6     226.48 K     17.77 TB      236657      0      0 
 nanxrd28.in2p3.fr        1095     18                           /data2     draining            5     228.68 K     17.71 TB      236660      0      0 
 nanxrd29.in2p3.fr        1095     19                           /data1     draining            6     227.37 K     17.78 TB      236663      0      0 
 nanxrd29.in2p3.fr        1095     20                           /data2     draining            6     227.29 K     17.80 TB      236667      0      0 
 nanxrd30.in2p3.fr        1095     21                           /data1     draining            6     228.49 K     17.79 TB      236670      0      0 
 nanxrd30.in2p3.fr        1095     22                           /data2     draining            6     224.68 K     17.81 TB      236672      0      0 

It was at 0% for a long time; it may be that it started when the first filesystem on node nanxrd26 (FS:13) got 100% drained, but I can’t confirm.

This drainer.node.ntx value is per destination node, right?
Not for the whole cluster?

JM

OK, good that it’s working now… but the behaviour you had at the beginning was quite weird, as the variable drainer.node.ntx is per node.
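
To spell out what per node means in practice, a purely illustrative calculation (the number of pulling nodes is made up):

drainer.node.ntx = 40          limit of concurrent drain transfers per pulling node
pulling nodes    = 4           (example)
cluster-wide     = 4 x 40 = up to 160 concurrent drain transfers, not 40 in total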