Support for stable version

Dear maintainer,

We are planning to deploy a small production environment, so we would appreciate your professional recommendation on a stable version combination.

For example:
Alma 8.? + Citrine 4.8.?
or
Alma 9.3 + Diopside 5.?

Here is what I found in the repo:
Diopside testing version: 5.2.4 (el-7/el-8/el-9), 5.1.28 (el-8s/el-9s, CentOS Stream)
Diopside release version: 5.1.13 (el-7/el-8 8.5)

Citrine testing version: 2023-09-12 4.8.105 (el-7/el-8s) / 2021-10-22 4.8.65 (el-8 8.5)
Citrine release version: 2023-03-28 4.8.98 (el-7) / 2021-09-07 4.8.62 (el-8 8.5)

We have tried several 5.x versions so far, but we failed to get 5.x working well with the /etc/eos/config/* configuration files.

Thanks in advance for support!

We recommend using Alma 9.3 and 5.2.4 from the testing repository.

If your node has an IPv4 address, you can just do:

yum-config-manager --add-repo "https://storage-ci.web.cern.ch/storage-ci/eos/diopside/tag/testing/el-9/x86_64/"
yum-config-manager --add-repo "https://storage-ci.web.cern.ch/storage-ci/eos/diopside-depend/el-9/x86_64/"
yum install -y eos-server eos-quarkdb eos-fusex jemalloc-devel --nogpgcheck

eos daemon sss recreate
# type in the number
eos daemon run qdb
#.... (stays in foreground)
# CONTROL-Z bg
eos daemon run mgm
#.... (stays in foreground)
# CONTROL-Z bg
eos whoami
Virtual Identity: uid=0 (0,3,99) gid=0 (0,4,99) [authz:sss] sudo* host=localhost domain=localdomain

# If you are not uid=0, you might only have IPv6 and a different reverse
# translation of localhost. In that case, try running all eos commands as:
# eos -r 0 0 root://localhost

5.2.4 with the /etc/eos/config/ configuration runs by default with one service fewer (no MQ). To register an FST, one first has to do:

eos node set `hostname -f`:1095 on 
eos daemon run fst
#.... (stays in foreground)
# CONTROL-Z bg

After this you can continue with a simple setup like this:

for name in 01 02 03 04 05 06; do
  mkdir -p /data/fst/$name;
  chown daemon:daemon /data/fst/$name
done

eos space define default

eosfstregister -r localhost /data/fst/ default:6

for name in 2 3 4 5 6; do eos fs mv --force $name default.0; done

eos space set default on

eos mkdir /eos/dev/rep-2/
eos mkdir /eos/dev/ec-42/
eos attr set default=replica /eos/dev/rep-2/
eos attr set default=raid6 /eos/dev/ec-42/
eos chmod 777 /eos/dev/rep-2/
eos chmod 777 /eos/dev/ec-42/

mkdir -p /eos/
eosxd -ofsname=`hostname -f`:/eos/ /eos/

as shown in "4.3. Getting Started" of the EOS DIOPSIDE documentation.

But the lines with "mq" have to be skipped, and the FST node has to be enabled before starting the FST daemon.

If this works, you no longer need to start the EOS services manually; run them as systemd services as described here:
https://eos-docs.web.cern.ch/diopside/manual/configuration.html#eos5-configuration

All this can also be done with the 'classical' approach, where one writes the 4 configuration files directly by hand and uses systemd, as described in "4.4. Configuration" of the EOS DIOPSIDE documentation.

We will update the manual to reflect the new defaults, where there is no MQ service to start anymore but FSTs have to be registered.

Cheers Andreas.

Thanks a lot for your information.

These are settings for a single-node configuration. Do you have any guidelines on how to set up a cluster solution (active/passive MGM with multiple standalone FST nodes), and with which versions? When we do the "by the book" installation, we fail to set up the cluster.

Can you provide info on which versions to use for the cluster and which documents to use?

Thanks in advance!

So the recipe to add an FST to a standalone installation requires 4 steps:

On the MGM node:
1. Copy /etc/eos.keytab over a secure connection to each FST node.
2. Enable the FST node on the MGM:
   eos node set fstnode.domain:1095 on
On the FST node:
3. Define the MGM node name in /etc/eos/config/all:
   SERVER_HOST=mgmnode.domain
4. Start/enable the FST service:
   eos daemon run fst
   or
   systemctl enable eos5-fst@fst
   systemctl start eos5-fst@fst

Now the FST should show up as online when you do on the MGM:
eos node ls
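For step 3, a minimal sketch of the configuration fragment on the FST node; mgmnode.domain is a placeholder for your actual MGM hostname:

```shell
# Hypothetical /etc/eos/config/all fragment on the FST node:
# point the FST at the MGM host (replace mgmnode.domain with your MGM)
SERVER_HOST=mgmnode.domain
```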

I will add the info on how to make the single-node MGM an HA cluster in another comment a little later, and I think it is probably a good idea to add this directly to the documentation as well!

Dear Andreas,
Do you have any info regarding HA cluster configuration installation procedure?
Also if you have already updated documentation you can share it too.

Thanks a lot in advance,

I am working on it, but I actually ran into a problem with the dynamic expansion of a QDB cluster. I hope to be able to provide all the documentation asap!

So here is the method to go from a single node QDB/MGM to an HA configuration assuming that you start after having a single node setup with MGM and QDB running on node1.

  yum-config-manager --add-repo "https://storage-ci.web.cern.ch/storage-ci/eos/diopside/tag/testing/el-9/x86_64/"
  yum-config-manager --add-repo "https://storage-ci.web.cern.ch/storage-ci/eos/diopside-depend/el-9/x86_64/"
  yum install -y eos-server eos-quarkdb eos-fusex jemalloc-devel --nogpgcheck

  systemctl start firewalld
  for port in 1094 1100 7777; do
   firewall-cmd --zone=public --permanent --add-port=$port/tcp
  done

  # Copy /etc/eos.keytab from the MGM node to each new MGM node
  scp root@node1:/etc/eos.keytab /etc/eos.keytab

  # Create observer QDB nodes on node2 and node3
  eos daemon config qdb qdb new observer

  # Start QDB on node2 and node3
  systemctl start eos5-qdb@qdb
  systemctl enable eos5-qdb@qdb

  # Allow node2 & node3 as followers on node1
  @node1: redis-cli -p 7777
  @node1: 127.0.0.1:7777> raft-add-observer node2.domain:7777
  @node1: 127.0.0.1:7777> raft-add-observer node3.domain:7777

  ( this is equivalent to 'eos daemon config qdb qdb add node2.domain:7777' but broken in the release version )

  # node2 & node3 get contacted by node1 and start syncing the raft log

  # Promote node2 and node3 as full members
  @node1: redis-cli -p 7777
  @node1: 127.0.0.1:7777> raft-promote-observer node2.domain:7777
  @node1: 127.0.0.1:7777> raft-promote-observer node3.domain:7777

  ( this is equivalent to 'eos daemon config qdb qdb promote node2.domain:7777' )

  # Verify RAFT status on any QDB node
  redis-cli -p 7777
  127.0.0.1:7777> raft-info

  ( this is equivalent to 'eos daemon config qdb qdb info' )

  # Start up the MGM services
  @node2: systemctl start eos5-mgm@mgm
  @node3: systemctl start eos5-mgm@mgm

  # You can connect on each node using the eos command to the local (active or passive) MGM
  @node1:  eos ns | grep master
  ALL      Replication                      is_master=true master_id=node1.domain:1094
  @node2:  eos ns | grep master
  ALL      Replication                      is_master=false master_id=node1.domain:1094
  @node3:  eos ns | grep master
  ALL      Replication                      is_master=false master_id=node1.domain:1094

  # You can force the QDB leader to a given node e.g.
  @node2: eos daemon config qdb qdb coup

  # you can force the active MGM to run on a given node by running on the current active MGM:
  @node1: eos ns master node2.domain:1094
  success: current master will step down

You should always perform all cluster configuration operations on the active MGM!
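To avoid accidentally running configuration commands on a passive MGM, one could guard on the replication line shown above. This is only an illustrative sketch: the `is_active_mgm` helper is not part of EOS, and `status_line` would normally come from `eos ns | grep master`.

```shell
# Return success only when the "Replication" line reports is_master=true.
is_active_mgm() {
  # $1: the Replication status line from `eos ns`
  case "$1" in
    *is_master=true*) return 0 ;;
    *)                return 1 ;;
  esac
}

# Example with the output format shown above (hypothetical sample input):
status_line='ALL      Replication                      is_master=true master_id=node1.domain:1094'
if is_active_mgm "$status_line"; then
  echo "active MGM - safe to run configuration commands"
fi
```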

I will release all this as part of eos-docs.web.cern.ch and will let you know when it is updated.

Once you have the HA setup, you need ‘something’ more to let clients connect to the currently active MGM.

The easiest is to use a DNS alias that you point at the currently active MGM; otherwise use virtual IPs or an HA proxy.
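A minimal sketch of extracting the active MGM hostname from the master_id reported by `eos ns`, e.g. to feed into a site-specific DNS alias update (the alias-update command itself depends on your DNS setup and is not shown; the sample value is hypothetical):

```shell
# master_id has the form host:port; in practice it would come from:
#   eos ns | grep master
master_id='node2.domain:1094'

# Strip the :port suffix to get the hostname the alias should point at.
master_host=${master_id%:*}

echo "point the DNS alias at: $master_host"
```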

Cheers Andreas.

The new sections are here:

How to add FST nodes

How to extend a single MGM/QDB node to an HA cluster

I will go through it tomorrow to verify the procedure with a fresh installation.

If you find problems, let me know, so we can correct the instructions.