Hello everyone,
Today (10/07/2021) our EOS instance stopped working. We checked the MGM and the FSTs and found that /var had filled up on all 8 FSTs. Most of the space in /var was taken by the latest log file, xrdlog.fst.
So I removed the latest xrdlog.fst file and restarted the eos@fst daemon on all 8 FSTs. After that, the EOS daemon was running fine on all FSTs.
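For reference, the kind of /var check described above can be sketched roughly as follows. This is only a minimal sketch, not the exact commands we ran: the function names `usage_pct` and `largest_files` and the 90% threshold are illustrative assumptions, and it relies only on the standard df, du, find, sort and awk tools.

```shell
#!/bin/sh
# Print the usage percentage (integer, without the % sign) of the
# filesystem that holds the path given as $1.
usage_pct() {
    # df -P prints one line per filesystem; column 5 is the capacity, e.g. "97%".
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# List the N largest files under a directory, biggest first (sizes in KiB).
# $1 = directory, $2 = number of files to show (default 5).
largest_files() {
    dir=$1
    count=${2:-5}
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}

# Assumed threshold for illustration only; adjust to taste.
THRESHOLD=90
pct=$(usage_pct /var)
if [ "${pct:-0}" -ge "$THRESHOLD" ]; then
    echo "/var is ${pct}% full; largest files under /var/log:"
    largest_files /var/log 5
fi
```

Running this on each FST would show whether /var is close to full again and which log files are taking the space.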
We then restarted the EOS daemon, i.e. eos (eos@mq and eos@mgm), on the master and slave machines, but it does not start. The error shown when starting the EOS daemon is below:
=========================
[root@eos-mgm ~]# systemctl status eos
● eos.service - EOS All Services
Loaded: loaded (/usr/lib/systemd/system/eos.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2021-07-11 01:21:09 IST; 2min 23s ago
Process: 5196 ExecStartPre=/bin/sh -c /usr/sbin/eos_start_pre.sh eos-all (code=exited, status=1/FAILURE)
Jul 11 01:19:39 eos-mgm.tier2-kol.res.in systemd[1]: Starting EOS All Services…
Jul 11 01:19:39 eos-mgm.tier2-kol.res.in sh[5196]: Waiting for 5202 …
Jul 11 01:21:09 eos-mgm.tier2-kol.res.in sh[5196]: Job for eos@mgm.service failed because a timeout was exceeded. See "systemctl status eos@mgm.servi…details.
Jul 11 01:21:09 eos-mgm.tier2-kol.res.in sh[5196]: Job for eos@mq.service failed because a timeout was exceeded. See "systemctl status eos@mq.service…details.
Jul 11 01:21:09 eos-mgm.tier2-kol.res.in sh[5196]: Waiting for 5203 …
Jul 11 01:21:09 eos-mgm.tier2-kol.res.in systemd[1]: eos.service: control process exited, code=exited status=1
Jul 11 01:21:09 eos-mgm.tier2-kol.res.in systemd[1]: Failed to start EOS All Services.
Jul 11 01:21:09 eos-mgm.tier2-kol.res.in systemd[1]: Unit eos.service entered failed state.
Jul 11 01:21:09 eos-mgm.tier2-kol.res.in systemd[1]: eos.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@eos-mgm ~]#
=============
However, no new log entries have been written to /var/log/eos/mgm/xrdlog.mgm or /var/log/eos/mq/xrdlog.mq since 12:06:13 today. The last line of xrdlog.mgm is below:
===================
210710 12:06:14 time=1625898974.257189 func=xrdmgmofs_shutdown level=ALERT logid=static… unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007ff551bc0780 source=Shutdown:59 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="shutdown complete"
====================
The log in /var/log/messages shows:
11 01:57:51 eos-mgm systemd: Stopped EOS mgm.
Jul 11 01:57:51 eos-mgm systemd: Starting EOS mgm…
Jul 11 01:57:51 eos-mgm systemd: Stopped EOS mq.
Jul 11 01:57:51 eos-mgm systemd: Starting EOS mq…
Jul 11 01:59:21 eos-mgm systemd: eos@mq.service start-pre operation timed out. Terminating.
Jul 11 01:59:21 eos-mgm systemd: eos@mgm.service start-pre operation timed out. Terminating.
Jul 11 01:59:21 eos-mgm systemd: Failed to start EOS mgm.
Jul 11 01:59:21 eos-mgm systemd: Unit eos@mgm.service entered failed state.
Jul 11 01:59:21 eos-mgm systemd: eos@mgm.service failed.
Jul 11 01:59:21 eos-mgm systemd: Failed to start EOS mq.
Jul 11 01:59:21 eos-mgm systemd: Unit eos@mq.service entered failed state.
Jul 11 01:59:21 eos-mgm systemd: eos@mq.service failed.
Jul 11 01:59:26 eos-mgm systemd: eos@mq.service holdoff time over, scheduling restart.
Jul 11 01:59:26 eos-mgm systemd: eos@mgm.service holdoff time over, scheduling restart.
Jul 11 01:59:26 eos-mgm systemd: Stopped EOS mgm.
Jul 11 01:59:26 eos-mgm systemd: Starting EOS mgm…
Jul 11 01:59:26 eos-mgm systemd: Stopped EOS mq.
Jul 11 01:59:26 eos-mgm systemd: Starting EOS mq…
Any hints or suggestions on what may be causing this?
Regards
Prasun