All 16 filesystems of one FST (eos05.tier2-kol.res.in, out of 7 FSTs) keep going into RO mode (Kolkata::EOS2 instance)

Hi Experts,

We are seeing the error “Server responded with an error: [3009] Unable to get free physical space /eos/alicekolkata/grid/07/00488/aa6fb000-db59-11ea-ab1b-0242ec98ab37; No space left on device (destination)” in the MonALISA web interface. On investigation we found that one of our 7 FSTs, eos05, is continuously going into “ro” mode.
Here is a snapshot:

==================

[root@eos-mgm ~]# eos -b fs ls| grep ro
│host │port│ id│ path│ schedgroup│ geotag│ boot│ configstatus│ drain│ active│ health│
eos05.tier2-kol.res.in 1095 2 /xdata0 default.0 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 10 /xdata1 default.1 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 17 /xdata10 default.2 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 24 /xdata11 default.3 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 31 /xdata12 default.4 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 38 /xdata13 default.5 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 45 /xdata14 default.6 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 52 /xdata15 default.7 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 59 /xdata2 default.8 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 66 /xdata3 default.9 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 73 /xdata4 default.10 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 80 /xdata5 default.11 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 87 /xdata6 default.12 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 94 /xdata7 default.13 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 101 /xdata8 default.14 Kolkata::EOS2 booted ro nodrain online N/A
eos05.tier2-kol.res.in 1095 108 /xdata9 default.15 Kolkata::EOS2 booted ro nodrain online N/A
[root@eos-mgm ~]#

===============
We tried to change the configstatus from ro to rw manually from the manager:

[root@eos-mgm ~]# eos -b node config eos05.tier2-kol.res.in:1095 configstatus=rw

But after a few hours it automatically reverts to ro mode, and we again get errors like:

xrdcp exited with exit code 54: [ERROR] Server responded with an error: [3009] Unable to get free physical space /eos/alicekolkata/grid/07/00488/aa6fb000-db59-11ea-ab1b-0242ec98ab37; No space left on device (destination)

We compared the config file with the other FSTs and found no difference. We also tried rebooting the FST and then restarting the EOS services on the FST and the MGM; that brings the FST back into “rw” mode, but after a few hours it goes into “ro” mode again.
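As a stopgap while debugging, we flip the filesystems back with a small helper. This is only a sketch: it assumes the plain whitespace-separated column layout shown in the snapshot above (fsid in column 3, configstatus in column 8) and it prints the per-filesystem commands rather than executing them, so the output can be reviewed first.

```shell
# Sketch: print (do not run) the fs-level commands that would flip every
# filesystem currently in "ro" back to "rw". Column positions are an
# assumption based on the "eos -b fs ls" snapshot above:
#   host port id path schedgroup geotag boot configstatus ...
eos -b fs ls 2>/dev/null \
  | awk '$8 == "ro" {printf "eos fs config %s configstatus=rw\n", $3}'
```

Piping the printed lines into a shell would apply them, but reviewing first avoids re-enabling a filesystem that is ro for a good reason.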
In the output of “eos -b node ls --io”, we found that the values of “bw” and “iops” for eos05 are 0, unlike the other FSTs:
[root@eos-slave ~]# eos -b node ls --io
┌────────────────────────────────┬────────────────┬──────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────┬──────┬────────────┬────────────┬────────────┬───────────┬──────────┬──────────┬──────────┬──────┬─────────┐
│hostport │ geotag│ diskload│ diskr-MB/s│ diskw-MB/s│ eth-MiB/s│ ethi-MiB│ etho-MiB│ ropen│ wopen│ used-bytes│ max-bytes│ used-files│ max-files│ bal-shd│ drain-shd│ gw-queue│ iops│ bw│
└────────────────────────────────┴────────────────┴──────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────┴──────┴────────────┴────────────┴────────────┴───────────┴──────────┴──────────┴──────────┴──────┴─────────┘
eos04.tier2-kol.res.in:1095 Kolkata::EOS2 0.00 0 0 1192 283.036 317.334 318 0 29.40 TB 156.71 TB 1.45 M 15.31 G 0 0 0 1180 3765 MB
eos05.tier2-kol.res.in:1095 Kolkata::EOS2 0.00 0 0 1192 309.188 311.925 317 0 29.72 TB 156.71 TB 1.45 M 15.31 G 0 0 0 0 0 MB
eos06.tier2-kol.res.in:1095 Kolkata::EOS2 0.00 0 0 1192 221.185 303.258 299 0 28.28 TB 156.71 TB 1.44 M 15.31 G 0 0 0 1175 3790 MB
eos07.tier2-kol.res.in:1095 Kolkata::EOS2 0.00 2 0 1192 259.264 327.801 317 0 29.69 TB 156.71 TB 1.45 M 15.31 G 0 0 0 1183 3805 MB
eos08.tier2-kol.res.in:1095 Kolkata::EOS2 0.00 0 0 1192 345.311 305.18 314 0 29.58 TB 156.71 TB 1.45 M 15.31 G 0 0 0 1190 3798 MB
eos09.tier2-kol.res.in:1095 Kolkata::EOS2 0.00 0 0 1192 309.044 317.452 319 0 29.47 TB 156.71 TB 1.45 M 15.31 G 0 0 0 1198 3747 MB
eos10.tier2-kol.res.in:1095 Kolkata::EOS2 0.00 0 0 1192 323.402 308.584 313 0 29.21 TB 156.71 TB 1.45 M 15.31 G 0 0 0 1194 3787 MB
[root@eos-slave ~]#

The attributes and layout of the EOS instance are below:
[root@eos-slave ~]# eos -b attr ls /eos/alicekolkata/grid
sys.forced.blockchecksum="crc32c"
sys.forced.blocksize="1M"
sys.forced.checksum="adler"
sys.forced.layout="raid6"
sys.forced.nstripes="7"
sys.forced.space="default"
sys.forced.stripes="7"
sys.lru.expire.empty="12h"
[root@eos-slave ~]#

(Our EOS version: 4.7.7 (2019); EOS instance: Kolkata::EOS2)

Kindly help us to solve this problem.

Regards
Prasun, Kolkata, India

Hi Prasun,

This usually happens when you run out of space on the /var partition of the FST node. There is a thread that monitors this and switches the file systems into ro mode for protection.
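A quick way to check on the FST node is to look at /var usage and the size of the FST log. The log path below is the default location (/var/log/eos/fst/xrdlog.fst); adjust it if your installation logs elsewhere.

```shell
# Check how full /var is on the FST node; a runaway xrdlog.fst is a
# common culprit when filesystems flip to "ro".
usage=$(df --output=pcent /var | tail -n 1 | tr -dc '0-9')
echo "/var usage: ${usage}%"
if [ "$usage" -ge 80 ]; then
    echo "WARNING: /var is filling up; check the FST log size:"
    du -sh /var/log/eos/fst/xrdlog.fst 2>/dev/null || true
fi
```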

Hope it helps,
Elvin

Dear Elvin,

Thanks for your suggestion. The FST is now OK and showing “rw” mode.

The /var partition on the FST eos05 was 67% full, and the log file xrdlog.fst had not been rotated since 13-July-2020 (the last rotated file was xrdlog.fst-20200713.gz), so xrdlog.fst had grown to 97 GB. We removed this file and restarted the eos@fst service on the FST. A rotated log file for 14-August-2020 has now been created (xrdlog.fst-20200814.gz; size 4.9 MB), and the /var partition on eos05 is only 6% full.

One thing I would like to understand: our /var partition was only 67% full, not completely full, yet the filesystems still went into “ro” mode. What was the reason for that?
Also, could you suggest how to prevent this scenario in the future, i.e. how to make sure logrotate runs on a regular basis?
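In the meantime, we are considering reinstating a logrotate policy along these lines (the log path, rotation schedule, and size limit are assumptions to be adjusted; the EOS RPMs normally ship such a policy under /etc/logrotate.d/, so it is worth checking why ours stopped working first):

```
# Sketch of /etc/logrotate.d/eos-fst -- paths and retention are assumptions
/var/log/eos/fst/xrdlog.fst {
    daily
    rotate 14
    compress
    missingok
    notifempty
    copytruncate
    maxsize 5G
}
```

copytruncate rotates the log without requiring a restart of the eos@fst service, and maxsize forces an extra rotation if the file grows unusually fast between daily runs.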
I have kept it under observation.

Regards
Prasun