georgep
(George Patargias)
September 5, 2024, 1:43pm
Hello,
We have set up a client machine for EOS (eos-client-5.2.23-1.el9.x86_64), and although all of the following variables have been exported:
```
export EOS_MGM_URL=root://antares-eos01.scd.rl.ac.uk
export XrdSecPROTOCOL=sss
export XrdSecSSSKT=/etc/eos.keytab
```
we get the following error:
```
[root@cms-rucio-services1 ~]# eos ns
error: MGM root://antares-eos01.scd.rl.ac.uk not online/reachable
[root@cms-rucio-services1 ~]#
```
ping, traceroute, tracepath and ssh all work in both directions.
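Network reachability aside, the client-side sss settings can be sanity-checked before involving the server. Below is a minimal bash sketch, assuming the three variables exported above and assuming that `XrdSecPROTOCOL` may hold a comma-separated list of protocols. The function name `check_sss_env` is hypothetical, and the rule that the keytab should have no group/other access is an assumption worth confirming against the XRootD sss documentation for your version.

```shell
# Hypothetical pre-flight check for an sss-authenticated EOS client (bash).
# Uses the variable names exported in this thread. The "no group/other
# access on the keytab" rule is an assumption to confirm for your XRootD.
check_sss_env() {
  local rc=0 kt="${XrdSecSSSKT:-}" mode
  if [ -z "${EOS_MGM_URL:-}" ]; then
    echo "EOS_MGM_URL is not set" >&2; rc=1
  fi
  # Accept "sss" on its own or inside a comma-separated protocol list.
  case ",${XrdSecPROTOCOL:-}," in
    *,sss,*) ;;
    *) echo "XrdSecPROTOCOL ('${XrdSecPROTOCOL:-}') does not include sss" >&2; rc=1 ;;
  esac
  if [ -z "$kt" ] || [ ! -r "$kt" ]; then
    echo "keytab '$kt' is missing or not readable" >&2; rc=1
  else
    # GNU stat: %a is the octal mode; the last two digits are group/other.
    mode=$(stat -c '%a' "$kt")
    if [ "${mode: -2}" != "00" ]; then
      echo "keytab '$kt' has mode $mode (group/other access)" >&2; rc=1
    fi
  fi
  if [ "$rc" -eq 0 ]; then
    echo "sss client environment looks consistent"
  fi
  return "$rc"
}
```

Run it in the same shell that exports the variables, e.g. `check_sss_env && eos ns`. A clean result only covers the client side; the server must also be willing to accept sss from this host.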
I don't think we have missed anything, but can you please confirm?
Thanks,
George
georgep
(George Patargias)
September 5, 2024, 2:04pm
Strangely, pointing to the MGM of another EOS cluster works:
```
[root@cms-rucio-services1 ~]# EOS_MGM_URL=root://antares-eos15.scd.rl.ac.uk eos node ls
┌──────────┬────────────────────────────────┬────────────────┬──────────┬────────────┬────────────────┬─────┐
│type      │                        hostport│          geotag│    status│   activated│  heartbeatdelta│ nofs│
└──────────┴────────────────────────────────┴────────────────┴──────────┴────────────┴────────────────┴─────┘
 nodesview   antares-eos15.scd.rl.ac.uk:1095            undef     online           on                2    15
 nodesview   antares-eos16.scd.rl.ac.uk:1095            undef     online           on                1    15
 nodesview   antares-eos17.scd.rl.ac.uk:1095            undef     online           on                2    23
 nodesview   antares-eos18.scd.rl.ac.uk:1095            undef     online           on                2    23
 nodesview   antares-eos19.scd.rl.ac.uk:1095            undef     online           on                3    23
 nodesview   antares-eos20.scd.rl.ac.uk:1095            undef     online           on                2    23
 nodesview   antares-eos21.scd.rl.ac.uk:1095            undef     online           on                3    23
 nodesview   antares-eos22.scd.rl.ac.uk:1095            undef     online           on                1    23
 nodesview   antares-eos23.scd.rl.ac.uk:1095            undef     online           on                2    23
[root@cms-rucio-services1 ~]#
```
rptaylor
(Ryan Taylor)
September 5, 2024, 10:06pm
Running a client command with the `XRD_LOGLEVEL=Dump` environment variable set can be useful.
You could also try, e.g., `nc -vz antares-eos01.scd.rl.ac.uk 1094` to check the network connection.
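Both checks can be driven from the `EOS_MGM_URL` that is already exported. A small bash sketch, assuming XRootD's default port 1094 when the URL carries none; `mgm_host_port` is a hypothetical helper name:

```shell
# Hypothetical helper: turn an XRootD URL such as
# root://antares-eos01.scd.rl.ac.uk into "host port" for nc.
mgm_host_port() {
  local hp="${1#*://}"   # drop the scheme (root://, roots://, ...)
  hp="${hp%%/*}"         # drop any path, e.g. //eos/...
  hp="${hp#*@}"          # drop an optional user@ prefix
  local host="${hp%%:*}" port=1094   # 1094 is the default XRootD port
  if [ "$hp" != "$host" ]; then
    port="${hp#*:}"
  fi
  echo "$host $port"
}
```

For example `nc -vz $(mgm_host_port "$EOS_MGM_URL")`, followed by `XRD_LOGLEVEL=Dump eos whoami`. The `$(...)` is deliberately left unquoted so that host and port reach `nc` as two separate arguments.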
georgep
(George Patargias)
September 6, 2024, 10:35am
Thanks for these suggestions.
`nc` shows a successful connection to the MGM.
Running an eos client command with `XRD_LOGLEVEL=Dump` set does show some fatal authentication errors, which I really can't understand, since /etc/eos.keytab is in place:
```
[2024-09-06 11:30:33.005701 +0100][Error ][AsyncSock ] [antares-eos01.scd.rl.ac.uk:1094.0] Socket error while handshaking: [FATAL] Auth failed
[2024-09-06 11:30:33.005801 +0100][Error ][PostMaster ] [antares-eos01.scd.rl.ac.uk:1094 ] Unable to recover: [FATAL] Auth failed.
[2024-09-06 11:30:33.005824 +0100][Debug ][XRootD ] [antares-eos01.scd.rl.ac.uk:1094 ] Handling error while processing kXR_ping (): [FATAL] Auth failed.
[2024-09-06 11:30:33.005898 +0100][Debug ][ExDbgMsg ] [antares-eos01.scd.rl.ac.uk:1094 ] Calling MsgHandler: 0xc778e0 (message: kXR_ping () ) with status: [FATAL] Auth failed.
```
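A Dump-level trace is long, and lines like these are easy to miss in it. A minimal bash sketch for a first look, assuming the XrdCl log goes to stderr (its default) and that error lines carry the `[Error` tag seen above; `auth_errors` is a hypothetical name, and the full trace is still what to share when asking for help:

```shell
# Hypothetical filter over an XRD_LOGLEVEL=Dump trace read from stdin.
auth_errors() {
  # Keep [Error ...] lines and anything mentioning "Auth failed" ...
  grep -E '\]\[Error[[:space:]]*\]|Auth failed' |
    # ... drop the leading "[timestamp]" so repeated messages collapse ...
    sed -E 's/^\[[^]]*\]//' |
    # ... and count identical messages.
    sort | uniq -c
}
```

For example `XRD_LOGLEVEL=Dump eos whoami 2>&1 | auth_errors`. Stripping the timestamp is what lets retries of the same failure show up as a single counted line.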
George
georgep
(George Patargias)
September 9, 2024, 3:17pm
Sorry for the hassle.
Any thoughts on this issue?
esindril
(Elvin Alin Sindrilaru)
September 10, 2024, 6:37am
Hi @georgep,
As Ryan already pointed out, the full output of the `eos whoami` command with `XRD_LOGLEVEL=Dump` is very useful in this situation. By the looks of it (though not all the relevant information is present in your snippet), either the sss keytab on the client does not match the one on the server, or the client's sss key is not in the list of sss keys accepted by the server. A quick checksum of the sss keytab files concerned should clear up the mystery.
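The simplest form of that check is running `sha256sum /etc/eos.keytab` on both hosts and comparing. If the server keytab holds several keys, identical checksums are not required, only that the client's key appears among the server's. A minimal bash sketch for that case, assuming both keytabs have been copied to one machine and that each key sits on its own line (as in keytabs written by `xrdsssadmin`); `compare_sss_keytabs` is a hypothetical name:

```shell
# Hypothetical check that every key line in the client keytab also appears
# in the server keytab. Assumes one key per line (as written by xrdsssadmin)
# and that both files have been copied to the machine running this.
compare_sss_keytabs() {
  local client="$1" server="$2" line missing=0
  # Identical checksums mean identical keytabs: the simplest case.
  sha256sum "$client" "$server"
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue          # skip blank lines
    grep -Fxq -- "$line" "$server" || missing=$((missing + 1))
  done < "$client"
  if [ "$missing" -eq 0 ]; then
    echo "all client keys are present in the server keytab"
  else
    echo "$missing client key line(s) missing from the server keytab"
  fi
  return $(( missing > 0 ))
}
```

Keytabs are secrets, so any copies made for this comparison should stay owner-readable only and be deleted afterwards.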
Cheers,
Elvin
georgep
(George Patargias)
September 10, 2024, 8:14am
Hi Elvin,
I had forgotten to update the `sec.protbind` line in /etc/xrd.cf.mgm with the client's new hostname. Because our first (and overriding) binding is `sec.protbind * only gsi unix`, which does not include sss, the authentication failed.
Apologies for this!
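Since the missing piece here was a host-specific `sec.protbind` entry, a quick look at the MGM config when adding a client can save a round of debugging. A bash sketch, assuming the directive sits on its own line in /etc/xrd.cf.mgm; `has_protbind_for` is a hypothetical name, and it only looks for a literal host entry rather than reproducing XRootD's own template matching:

```shell
# Hypothetical helper: show the sec.protbind directives in an MGM config
# and report whether one names the given client host literally.
# It does NOT emulate XRootD's wildcard/template matching.
has_protbind_for() {
  local host="$1" cfg="${2:-/etc/xrd.cf.mgm}"
  # Print every uncommented sec.protbind line for context.
  grep -E '^[[:space:]]*sec\.protbind[[:space:]]' "$cfg" || true
  # awk splits on whitespace, so field 2 is the host/template argument.
  if awk -v h="$host" '$1 == "sec.protbind" && $2 == h { found = 1 }
                       END { exit !found }' "$cfg"; then
    echo "explicit sec.protbind entry found for $host"
  else
    echo "no explicit sec.protbind entry for $host"
    return 1
  fi
}
```

If it reports no entry, the fix described above is to add one for the client's fully qualified hostname, for example `sec.protbind <client-fqdn> only sss unix` (the protocol list here is an assumption; mirror your existing client entries). The MGM generally needs a restart to pick up changes to xrd.cf.mgm.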
George