FST service is killed because it does not contact the broker

Hi! It seems that my fst service is automatically killed with:

@@@@@@ 00:00:00 op=shutdown msg="shutdown timedout after 0 seconds, signal=1
@@@@@@ 00:00:00 op=shutdown status=forced-complete

I assume this is because of:

200407 13:35:46 time=1586255746.380048 func=RefreshBrokersEndpoints  level=ERROR logid=static.............................. unit=fst@fst01.spacescience.ro:1095 tid=00007f7ee96fe700 source=XrdMqClient:498                tident= sec=(null) uid=99 gid=99 name=- geo="" msg="failed to contact broker" url="root://mgm.spacescience.ro:1094//eos/fst01.spacescience.ro:1095/fst_mq_test?xmqclient.advisory.flushbacklog=0&xmqclient.advisory.query=0&xmqclient.advisory.status=0"
200407 13:35:46 time=1586255746.380098 func=ErrorReport              level=ERROR logid=FstOfsStorage unit=fst@fst01.spacescience.ro:1095 tid=00007f7ee96fe700 source=ErrorReport:92                 tident=<service> sec=      uid=0 gid=0 name= geo="" cannot send errorreport broadcast

200407 13:35:49 time=1586255749.636339 func=RefreshBrokersEndpoints  level=ERROR logid=static.............................. unit=fst@fst01.spacescience.ro:1095 tid=00007f7ee37ff700 source=XrdMqClient:498                tident= sec=(null) uid=99 gid=99 name=- geo="" msg="failed to contact broker" url="root://mgm.spacescience.ro:1094//eos/fst01.spacescience.ro:1095/fst_mq_test?xmqclient.advisory.flushbacklog=0&xmqclient.advisory.query=0&xmqclient.advisory.status=0"

But the network connection is there:

[root@fst01 fst]# nc -vz mgm.spacescience.ro 1094
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 2001:b30:4210:1::36:1094.
Ncat: 0 bytes sent, 0 bytes received in 0.29 seconds.

The relevant file can be viewed in the FST directory here:
https://cernbox.cern.ch/index.php/s/q0JYLqs7Z4T2HOS

Thank you!!
Adrian

Hi Adrian,

Your FST is trying to connect to the MGM rather than to the MQ daemon, which runs on port 1097 and not on 1094. I think it is quite slow and painful to go through all the small steps needed to configure an instance. I suggest you check out the following project and use it to spawn a dummy instance based on a Docker image. Inside you can find the scripts that configure everything you need. This is not customized for the ALICE use case, but I think it is already a better start than what you have now. Just follow these steps and you will have a fully configured EOS instance in Docker containers on your machine:

git clone https://gitlab.cern.ch/eos/eos-docker.git
cd eos-docker
sudo ./scripts/start_services.sh -q -i gitlab-registry.cern.ch/dss/eos:4.7.8

To destroy the setup just use the shutdown_services.sh script.
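
To confirm which port the FST is actually using for the broker, the port can be read from the url field of the "failed to contact broker" log line. This is only a minimal sketch: broker_port is a hypothetical helper (not part of EOS), and the URL is the one from the log above.

```shell
# Hypothetical helper: print the port from a root://host:port//path URL.
broker_port() {
    rest="${1#*://}"        # drop the scheme, e.g. "root://"
    hostport="${rest%%/*}"  # keep everything before the first "/"
    echo "${hostport##*:}"  # keep what follows the last ":"
}

url='root://mgm.spacescience.ro:1094//eos/fst01.spacescience.ro:1095/fst_mq_test?xmqclient.advisory.flushbacklog=0&xmqclient.advisory.query=0&xmqclient.advisory.status=0'
broker_port "$url"

# The same nc test as in the first post can then be pointed at the MQ port:
#   nc -vz mgm.spacescience.ro 1097
```

For this URL the helper prints 1094, i.e. the MGM port rather than the MQ port 1097.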

Cheers,
Elvin

Thank you, it is a good starting point as a documentation reference.
It would seem that the mq service was not started… I will see if I can make the mgm depend on mq…
the systemd service files need some refactoring :slight_smile:

The systemd files aren’t perfect, but “systemctl start eos” starts the MQ service. At least in my setup. Is it possible you don’t have the mq service in XRD_ROLES in your eos_env?

e.g.

XRD_ROLES="mq sync mgm "
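
A quick way to check this, as a rough sketch only: has_role is a hypothetical helper (not part of EOS), and the roles string is just the example value above. On a real node you would source your own eos_env file first (often /etc/sysconfig/eos_env, but that path is an assumption here).

```shell
# Hypothetical helper: succeed if a role appears as a whole word
# in a space-separated roles string.
has_role() {
    role="$1"
    roles="$2"
    for r in $roles; do
        [ "$r" = "$role" ] && return 0
    done
    return 1
}

XRD_ROLES="mq sync mgm "
if has_role mq "$XRD_ROLES"; then
    echo "mq is listed in XRD_ROLES"
else
    echo "mq is missing from XRD_ROLES"
fi
```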

I did change the systemd files for the fuse mount slightly (for clients only) to have it restart if something causes it to go away (crash, network issues, whatever) and handle multiple mounts better.

Well, I just used it as recommended, meaning I started eos@mgm (naively thinking that mq, if found in the roles, would be started automatically) … of course I was wrong :slight_smile: