Couldn't make FST node online

Hi,
I have set up an FST node but couldn't bring it online. I checked xrdlog.fst and it shows the messages below. Any idea how to fix this issue? Thanks! Jingya

231122 01:22:41 time=1700616161.547397 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@hpstor15.grid.sinica.edu.tw:1095 tid=00007f4644dfd700 source=Config:78 tident= sec=(null) uid=99 gid=99 name=- geo=“” msg=“waiting for config queue in Publish …”
231122 01:22:43 time=1700616163.547611 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@hpstor15.grid.sinica.edu.tw:1095 tid=00007f4644dfd700 source=Config:78 tident= sec=(null) uid=99 gid=99 name=- geo=“” msg=“waiting for config queue in Publish …”
231122 01:22:45 time=1700616165.547830 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@hpstor15.grid.sinica.edu.tw:1095 tid=00007f4644dfd700 source=Config:78 tident= sec=(null) uid=99 gid=99 name=- geo=“” msg=“waiting for config queue in Publish …”
231122 01:22:47 time=1700616167.548102 func=getFstNodeConfigQueue level=INFO logid=static… unit=fst@hpstor15.grid.sinica.edu.tw:1095 tid=00007f4644dfd700 source=Config:78 tident= sec=(null) uid=99 gid=99 name=- geo=“” msg=“waiting for config queue in Publish …”

Hi Jing-Ya,

There are several reasons why this could happen, but basically the FST could not connect to the MQ daemon of your instance to get updates and messages from the MGM. I would double-check the ports of the different daemons to see if they are open, and also check whether the MGM receives heartbeats from this FST by running eos node ls.

Then, I would check the MQ logs to see if the FST is connected to it. Look through the FST logs and check that it’s trying to connect to the correct MQ broker. In general, this is a communication issue between this FST and the rest of the cluster.
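
As a rough way to do the port check mentioned above, here is a minimal Python sketch that tests whether a TCP port accepts connections. The function names are made up for illustration, and the host names and port numbers in the usage comment are placeholders (1095 is the FST port seen in the log above; the others are assumptions), so substitute the values from your own instance. A successful TCP connect only shows that the port is reachable from the FST; it says nothing about authentication.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_ports(targets, timeout=3.0):
    """Check a list of (host, port) pairs and return (host, port, is_open) tuples."""
    return [(host, port, port_is_open(host, port, timeout)) for host, port in targets]


# Example usage (hypothetical hosts and assumed ports, run from the FST machine):
# for host, port, ok in check_ports([("mgm.example.org", 1094),
#                                    ("mgm.example.org", 1097),
#                                    ("fst.example.org", 1095)]):
#     print(f"{host}:{port} {'open' if ok else 'NOT reachable'}")
```

Running this from the FST against the MGM/MQ host, and from the MGM against the FST, should show whether a firewall is blocking one of the directions.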

Hope it helps,
Elvin

Hi Elvin,
Thanks for your reply!
It seems the FST failed to connect to the MQ. Other FST nodes have the same configuration but don't have this problem. Could it be an authentication issue or some other setting?

231127 03:07:56 time=1701054476.548172 func=RefreshBrokersEndpoints level=ERROR logid=static… unit=fst@hpstor15.grid.sinica.edu.tw:1095 tid=00007f17a7bff700 source=XrdMqClient:508 tident= sec=(null) uid=99 gid=99 name=- geo=“” msg=“failed to contact broker” url=“root://eos01.grid.sinica.edu.tw:1097//eos/hpstor15.grid.sinica.edu.tw:1095/fst_mq_test?xmqclient.advisory.flushbacklog=0&xmqclient.advisory.query=0&xmqclient.advisory.status=0”

Thanks
Jingya

Hi Jing-Ya,

The FSTs and the MQ use sss authentication to talk to each other, so as long as the checksum of the sss keytab (normally /etc/eos.keytab) matches between the FSTs and the MQ machine, this should not be an issue. Sometimes the XRootD client used for the communication between the FSTs and the MQ might block due to network issues in your infrastructure, but normally this is fixed by simply restarting your FST.
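
One way to compare the keytab between machines is to compute a digest of the file on each host and compare the resulting strings. The sketch below uses SHA-256, which is my choice for illustration and not something EOS requires; any checksum tool works as long as you use the same one on both sides. The helper name is hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage (run on the FST and on the MQ machine, then compare the output;
# the path is the usual keytab location mentioned above and needs read access):
# print(file_sha256("/etc/eos.keytab"))
```

If the two digests differ, the keytab on the FST is not the same file as on the MQ machine and should be copied over again.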

Cheers,
Elvin