MGM Fail on Kolkata EOS QuarkDB

Hi EOS Experts,

We have set up a 3-node EOS QuarkDB/MGM/MQ cluster. We are facing errors when starting eos@mgm: "Found QDB cluster members, but no password. EOS will only connect to password-protected QDB instances. (mgmofs.qdbpassword / mgmofs.qdbpassword_file missing)" and "Unable to create file system object via libXrdEosMgm.so".

We configured QuarkDB on all 3 nodes with a redis.password_file entry in /etc/xrootd/xrootd-quarkdb.cfg, and the QuarkDB cluster is running fine.

[root@eos-mgm ~]# ls -ltr /etc/xrootd/xrootd-quarkdb.cfg
-rw-r--r-- 1 root root 244 Mar 22 15:29 /etc/xrootd/xrootd-quarkdb.cfg
[root@eos-mgm ~]#
[root@eos-mgm ~]# grep redis.password /etc/xrootd/xrootd-quarkdb.cfg
redis.password_file /etc/xrootd/qdb.password
[root@eos-mgm ~]# ls -ltr /etc/xrootd/qdb.password
-r-------- 1 xrootd xrootd 67 Mar 22 15:24 /etc/xrootd/qdb.password
[root@eos-mgm ~]#

  1. NODES eos-mgm.tier2-kol.res.in:7777,eos-slave.tier2-kol.res.in:7777,eos-qdb.tier2-kol.res.in:7777
  2. OBSERVERS
  3. QUORUM-SIZE 2

  4. REPLICA eos-qdb.tier2-kol.res.in:7777 | ONLINE | UP-TO-DATE | NEXT-INDEX 21 | VERSION 0.4.2
  5. REPLICA eos-slave.tier2-kol.res.in:7777 | ONLINE | UP-TO-DATE | NEXT-INDEX 21 | VERSION 0.4.2

Then we installed the EOS packages on 2 of the 3 nodes for the EOS master/slave setup.
We set the following variables in /etc/xrd.cf.mgm, /etc/sysconfig/eos and /etc/sysconfig/eos_env:
+++++
[root@eos-mgm ~]# grep -i quark /etc/xrd.cf.mgm
mgmofs.cfgtype quarkdb
mgmofs.nslib /usr/lib64/libEosNsQuarkdb.so

[root@eos-mgm ~]# grep qdb /etc/xrd.cf.mgm
mgmofs.qdbcluster eos-mgm.tier2-kol.res.in:7777 eos-slave.tier2-kol.res.in:7777 eos-qdb.tier2-kol.res.in:7777
mgm.qdbpassword_file /etc/qdb.passwd.eos
[root@eos-mgm ~]#
++++++
The file referenced by mgm.qdbpassword_file, /etc/qdb.passwd.eos, is owned by xrootd:xrootd.
[root@eos-mgm ~]# cat /etc/sysconfig/eos
test -e /usr/lib64/libjemalloc.so.1 && export LD_PRELOAD=/usr/lib64/libjemalloc.so.1
XRD_ROLES="mq mgm"
export EOS_INSTANCE_NAME=eosalicekolkata

#export EOS_BROKER_URL=root://eoskolkata.tier2-kol.res.in:1097//eos/
export EOS_MGM_ALIAS=eoskolkata.tier2-kol.res.in
export EOS_FUSE_MGM_ALIAS=eoskolkata.tier2-kol.res.in

#Master-Slave Configuration
export EOS_MGM_MASTER1=eos-mgm.tier2-kol.res.in
export EOS_MGM_MASTER2=eos-slave.tier2-kol.res.in

export EOS_MAIL_CC="vikasssinghal@gmail.com"
export EOS_NOTIFY="mail -s date +%s-hostname-eos-notify $EOS_MAIL_CC"
export EOS_HTTP_THREADPOOL="epoll"
export EOS_HTTP_THREADPOOL_SIZE=32
export EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
#####export EOS_BROKER_URL=root://localhost:1097//eos/

[root@eos-mgm ~]#
[root@eos-mgm ~]# cat /etc/sysconfig/eos_env
DAEMON_COREFILE_LIMIT=unlimited
LD_PRELOAD=/usr/lib64/libjemalloc.so.1
KRB5RCACHETYPE=none
XRD_ROLES="mq mgm"
EOS_MGM_ALIAS=eoskolkata.tier2-kol.res.in
EOS_FUSE_MGM_ALIAS=eoskolkata.tier2-kol.res.in

###Master-Slave Configuration
EOS_MGM_MASTER1=eos-mgm.tier2-kol.res.in
EOS_MGM_MASTER2=eos-slave.tier2-kol.res.in

EOS_MGM_HOST=eos-mgm.tier2-kol.res.in
####EOS_MGM_HOST_TARGET=eos-slave.tier2-kol.res.in
EOS_INSTANCE_NAME=eosalicekolkata

#The mail notification in case of fail-over
EOS_MAIL_CC="vikasssinghal@gmail.com"
EOS_NOTIFY="mail -s date +%s-hostname-eos-notify $EOS_MAIL_CC"
EOS_HTTP_THREADPOOL="epoll"
EOS_HTTP_THREADPOOL_SIZE=32
EOS_HTTP_CONNECTION_MEMORY_LIMIT=65536
EOS_GEOTAG="Kolkata::EOS2"

EOS_USE_QDB_MASTER=1

EOS_NS_ACCOUNTING=1
EOS_SYNCTIME_ACCOUNTING=1
EOS_USE_SHARED_MUTEX=1
#-------------------------------------------------------------------------------
#QuarkDB Configuration
#-------------------------------------------------------------------------------
#QuarkDB Hostport
EOS_QUARKDB_HOSTPORT=eos-mgm.tier2-kol.res.in:7777 eos-slave.tier2-kol.res.in:7777 eos-qdb.tier2-kol.res.in:7777
#QuarkDB Password
EOS_QUARKDB_PASSWD=/etc/qdb.passwd.eos
[root@eos-mgm ~]#
++++++++

After that, when we start eos@mgm, its status is shown as "activating":
[root@eos-mgm ~]# systemctl status eos@mgm
● eos@mgm.service - EOS mgm
Loaded: loaded (/usr/lib/systemd/system/eos@.service; disabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2021-03-23 18:46:37 IST; 4s ago
Process: 89967 ExecStart=/usr/sbin/eos_start.sh -n %i -c /etc/xrd.cf.%i -l /var/log/eos/xrdlog.%i -Rdaemon (code=exited, status=1/FAILURE)
Process: 89925 ExecStartPre=/bin/sh -c /usr/sbin/eos_start_pre.sh eos-start-pre %i (code=exited, status=0/SUCCESS)
Main PID: 89967 (code=exited, status=1/FAILURE)

Mar 23 18:46:37 eos-mgm.tier2-kol.res.in systemd[1]: Unit eos@mgm.service entered failed state.
Mar 23 18:46:37 eos-mgm.tier2-kol.res.in systemd[1]: eos@mgm.service failed.
[root@eos-mgm ~]#

The log from xrdlog.mgm is below:

==============

[root@eos-mgm ~]# tail -40 /var/log/eos/mgm/xrdlog.mgm
=====> sec.protbind localhost.localdomain sss unix
=====> sec.protbind localhost sss unix
=====> sec.protbind * only sss unix
Config 5 authentication directives processed in /etc/xrd.cf.mgm
------ Authentication system initialization completed.
++++++ Protection system initialization started.
Config warning: Security level is set to none; request protection disabled!
Config Local protection level: none
Config Remote protection level: none
------ Protection system initialization completed.
Config Routing for eos-mgm.tier2-kol.res.in: local pub4 prv4
Config Route all4: eos-mgm.tier2-kol.res.in Dest=[::144.16.112.14]:1094
Plugin loaded
++++++ © 2015 CERN/IT-DSS MgmOfs (meta data redirector) 4.8.35
=====> mgmofs enforces SSS authentication for XROOT clients
jemalloc is loaded!
jemalloc heap profiling is disabled
=====> mgmofs.hostname: eos-mgm.tier2-kol.res.in
=====> mgmofs.hostpref: eos-mgm
=====> mgmofs.managerid: eos-mgm.tier2-kol.res.in:1094
=====> mgmofs.fs: /
=====> mgmofs.targetport: 1095
=====> mgmofs.authlib : /usr/lib64/libXrdAliceTokenAcc.so
=====> mgmofs.authorize : true
=====> mgmofs.instance : eosalicekolkata
=====> mgmofs.metalog: /var/eos/md
=====> mgmofs.txdir: /var/eos/tx
=====> mgmofs.authdir: /var/eos/auth
=====> mgmofs.qosdir: /var/eos/qos/
=====> mgmofs.reportstorepath: /var/eos/report
=====> mgmofs.cfgtype: quarkdb
=====> mgmofs.fstgw: someproxy.cern.ch:3001
=====> mgmofs.nslib : /usr/lib64/libEosNsQuarkdb.so
=====> mgmofs.qdbcluster : eos-mgm.tier2-kol.res.in:7777 eos-slave.tier2-kol.res.in:7777 eos-qdb.tier2-kol.res.in:7777
=====> Configuration error: Found QDB cluster members, but no password. EOS will only connect to password-protected QDB instances. (mgmofs.qdbpassword / mgmofs.qdbpassword_file missing)
210323 18:47:48 92004 XrootdConfig: Unable to create file system object via libXrdEosMgm.so
210323 18:47:48 92004 XrootdConfig: Unable to load file system.
------ xrootd protocol initialization failed.
210323 18:47:48 92004 XrdProtocol: Protocol xrootd could not be loaded
------ xrootd mgm@eos-mgm.tier2-kol.res.in:-1 initialization failed.
[root@eos-mgm ~]#

==================

Is the above configuration OK, or is something missing? Are the roles defined in eos and eos_env correct? What should the ownership and permissions be for the files referenced by EOS_QUARKDB_PASSWD and mgm.qdbpassword_file, and for the QuarkDB password file?

I tried starting eos@mgm after changing the ownership of the mgm.qdbpassword_file file to daemon:daemon and to xrootd:xrootd, but it still fails. We also tried mgmofs.qdbpassword_file instead of mgm.qdbpassword_file with the same ownership, but without success.

Please advise.

Regards
Prasun

Hi Prasun,

You have a typo in your configuration. Namely
mgm.qdbpassword_file /etc/qdb.passwd.eos
should actually be
mgmofs.qdbpassword_file /etc/qdb.passwd.eos
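
A quick way to spot this kind of mistake is to list the QuarkDB-related directives that do not carry the mgmofs. prefix. The sketch below is only an illustration: check_qdb_directives is a hypothetical helper, not an EOS tool, and it assumes that every qdb* directive in the MGM config is meant to start with mgmofs.

```shell
#!/bin/sh
# Hypothetical helper (not part of EOS): print, with line numbers, every
# "<prefix>.qdb..." directive in the config file $1 whose prefix is not "mgmofs".
# Assumption: all qdb* directives in the MGM config should be "mgmofs.qdb...".
check_qdb_directives() {
    cfg="$1"
    # First grep: keep lines that look like "<prefix>.qdb<something>".
    # Second grep: drop the ones that already use the mgmofs. prefix.
    grep -nE '^[[:space:]]*[A-Za-z_]+\.qdb' "$cfg" \
        | grep -vE '^[0-9]+:[[:space:]]*mgmofs\.'
}
```

Run as `check_qdb_directives /etc/xrd.cf.mgm`, this should print the mgm.qdbpassword_file line from the config quoted in the question. The function returns a non-zero status when nothing is misnamed, because the last grep then selects no lines.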

The password file needs to be owned by user daemon.
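
The ownership can be verified before restarting the MGM. This is a minimal sketch: file_owner_uid and owned_by_user are hypothetical helper names, not EOS commands, and the comparison is done on numeric UIDs.

```shell
#!/bin/sh
# Hypothetical helpers (not part of EOS) for checking who owns a file.

# Print the numeric UID of the owner of the file given as $1.
file_owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# Succeed (exit status 0) if file $1 is owned by user $2.
owned_by_user() {
    [ "$(file_owner_uid "$1")" = "$(id -u "$2")" ]
}
```

For example, `owned_by_user /etc/qdb.passwd.eos daemon || echo "wrong owner"` reports a mismatch, which `chown daemon:daemon /etc/qdb.passwd.eos` would then correct.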

Cheers,
Elvin