Upgrade procedure from Aquamarine to Citrine

Dear fellows,

Trying to start a discussion on this nice new website.

At JRC, we are preparing the Aquamarine to Citrine upgrade for our main eos instance next week.

From the tests we did, we extracted a short page of steps and configuration changes for the process that we want to share with you in case you are planning the upgrade (below the post).

In addition, you might have comments or experience to share if you already did these steps; it might help us.

We tested a full offline upgrade procedure on a separate instance: stop everything, then upgrade and start the MGM, then the FSTs one by one, and it really seems straightforward.

But since our main instance is quite large, with 2PB of data, 17 FSTs and 130M files, this might take a while, and we were thinking of attempting an online upgrade: switch the instance to read-only during the process so that users can still access the data, then start the FSTs afterwards, with at least 3 FSTs booting at a time. How does this sound as a solution?

We tried some scenarios, and it seems that the MGM and FSTs can collaborate correctly, at least for some time, with mismatched versions. We had the impression that it is best to first upgrade the MGM, then the FSTs one by one, but does someone have arguments against this?
Plus, what would be the best way to turn the instance read-only? Turn all FSTs read-only, or add some access rule?

The JRC team.

Upgrade from Aquamarine to Citrine

The procedure is almost as easy as a minor version upgrade, but all components should be upgraded in one shot. CERN advises upgrading all servers at the same time (because of some changes in the communication protocol), but a partial upgrade works in our tests (safeguarded by setting the instance read-only?)

Procedure

Run compaction for faster boot (a possible command is sketched right after this list).

Set a global stall and stop all servers (FSTs + MGM), or decide to change servers one by one (setting the whole instance read-only to avoid errors, if possible).

The currently advised version for the eos server is 4.2.20 (after the first Citrine upgrades to 4.2.12, the 4.2.20 version fixed various bugs).

For safety, back up all /var/eos folders!!

Change configuration (see Configuration changes)

Change eos repos (see yum.repo file)

Run MGM upgrade (see Upgrade MGM)

Run FST upgrade (see Upgrade FSTs)

Some changes in using the instance (see Changes in new version)
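
Compaction is triggered from the eos CLI; the exact arguments vary a bit between versions, so treat the lines below as a sketch and check the built-in help on your instance first:

eos ns compact on 60    # schedule an online changelog compaction to start in 60 seconds
eos ns                  # the namespace statistics should show the compaction status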

Configuration changes

All servers

The eos configuration moves from /etc/sysconfig/eos to /etc/sysconfig/eos_env. The export directives need to be removed, as well as the service alias definition. Removing the export directives seems to work this way:

cat /etc/sysconfig/eos | sed -e 's/export //' > /etc/sysconfig/eos_env

If present, these lines need to be removed (they cause a warning in the eos log when starting):

# ------------------------------------------------------------------
# Service Script aliasing for EL7 machines
# ------------------------------------------------------------------

which systemctl >& /dev/null
if [ $? -eq 0 ]; then
   alias service="service --skip-redirect"
fi
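
A possible one-shot conversion that also drops the alias block (a sketch only; review the resulting file before using it):

# strip the "export " prefixes and delete the service-alias block in one pass
# (any leftover comment lines around the alias block are harmless)
sed -e 's/^export //' \
    -e '/^which systemctl/,/^fi$/d' \
    /etc/sysconfig/eos > /etc/sysconfig/eos_env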

MGM

Necessary lines to be added in the MGM's /etc/xrd.cf.mgm:

#-------------------------------------------------------------------------------
# Set the namespace plugin implementation
#-------------------------------------------------------------------------------

mgmofs.nslib /usr/lib64/libEosNsInMemory.so

Also, the synchronization service now needs the local and remote MGM hosts to be defined explicitly (! the configuration is different on the two MGMs), as the EOS_MGM_MASTER1/2 values are not sufficient any more. In /etc/sysconfig/eos_env, add:

EOS_MGM_HOST=fqdn.of.local.host.domain
EOS_MGM_HOST_TARGET=fqdn.of.remote.host.domain
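
For illustration, with two hypothetical MGMs mgm01 and mgm02, the two files mirror each other:

# on mgm01.example.org
EOS_MGM_HOST=mgm01.example.org
EOS_MGM_HOST_TARGET=mgm02.example.org

# on mgm02.example.org
EOS_MGM_HOST=mgm02.example.org
EOS_MGM_HOST_TARGET=mgm01.example.org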

FST

The FSTs need to be geotagged, otherwise no write can be scheduled on them. So on all FSTs, add the following value in /etc/sysconfig/eos (anything non-empty works; it can be set per rack or per switch to foresee future expansions):

export EOS_GEOTAG='JRC_DC'
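
For illustration, a hierarchical tag that already encodes room and rack (hypothetical values; EOS geotags use :: as the level separator) could look like:

export EOS_GEOTAG='JRC::ROOM1::RACK12'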

The default xrd.cf.fst file has changed a bit (the changes are applied automatically by the upgrade if the file was not modified); a possible way to apply them by hand is sketched after this list:

  • xrootd.fslib libXrdEosFst.so needs to become xrootd.fslib -2 libXrdEosFst.so
  • comment or remove these lines :
#ofs.authlib libXrdEosAuth.so
#ofs.authorize
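
A possible way to apply both changes in place (a sketch; keep a copy of the original file):

cp /etc/xrd.cf.fst /etc/xrd.cf.fst.aquamarine    # keep the old configuration around
sed -i -e 's|^xrootd.fslib libXrdEosFst.so|xrootd.fslib -2 libXrdEosFst.so|' \
       -e 's|^ofs.authlib|#&|' \
       -e 's|^ofs.authorize|#&|' /etc/xrd.cf.fst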

yum.repo file

https://eos-docs.web.cern.ch/eos-docs/quickstart/setup_repo.html#eos-citrine

/etc/yum.repos.d/eos.repo

[eos-citrine]
name=EOS 4.0 Version
baseurl=https://storage-ci.web.cern.ch/storage-ci/eos/citrine/tag/el-7/x86_64/
gpgcheck=0
enabled=1

[eos-dep]
name=EOS 4.0 Dependencies
baseurl=https://storage-ci.web.cern.ch/storage-ci/eos/citrine-depend/el-7/x86_64/
gpgcheck=0
enabled=1

(remove the obsolete /etc/yum.repos.d/eos-dep.repo if present)

Also use the xrootd-stable repository:

[xrootd-stable]
name=XRootD Stable repository
baseurl=http://xrootd.org/binaries/stable/slc/7/$basearch
gpgcheck=1
enabled=1
protect=0
gpgkey=http://xrootd.cern.ch/sw/releases/RPM-GPG-KEY.txt
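
After changing the repo files, it is worth refreshing the yum metadata so the upgrade picks up the new repositories:

yum clean all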

Upgrade MGM

Set a global stall:
eos access set stall 1000 w

OR

Switch the namespace to read-only:
eos access set stall 1000 w   <== this is currently not supported, so not an option (see the discussion in the replies below)

Set all nodes read-only (a possible loop over all registered nodes is sketched below). For each node:
eos node config node.fqdn configstatus=ro
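
A minimal sketch of such a loop, assuming eos node ls -m prints one hostport=<fqdn>:<port> pair per node (check the output on your instance before relying on the parsing):

# set every registered FST node to read-only
for n in $(eos node ls -m | grep -o 'hostport=[^ ]*' | cut -d= -f2); do
    eos node config "$n" configstatus=ro
done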

Stop the old service: service --skip-redirect eos stop

To the latest version:

yum upgrade "eos-*"

To a specific version (e.g. 4.2.12):

yum upgrade eos-server-4.2.12-1.el7.cern eos-client-4.2.12-1.el7.cern eos-debuginfo-4.2.12-1.el7.cern

Start eos: systemctl start eos

The server should boot normally.

Check synchronisation: systemctl status eossync@*

Also update the slave MGM.

Check synchronisation, etc…

Before upgrading the FSTs, switch off the fsck system, which would otherwise cause the MGM to uselessly count all missing files. Since all FSTs are currently down, this represents all the files in the namespace.

eos fsck disable

Upgrade FSTs

Stop the old service: service --skip-redirect eos stop

To the latest version:

yum upgrade "eos-*"

To a specific version (e.g. 4.2.12):

yum upgrade eos-server-4.2.12-1.el7.cern eos-client-4.2.12-1.el7.cern eos-debuginfo-4.2.12-1.el7.cern

Start the service: systemctl start eos

Expect a full boot, because they need to fully resynchronize all filesystems with the MGM (probably 30 minutes to 1 hour, depending on the number of files).
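
Boot progress can be followed from the MGM, for example with:

eos fs ls      # per-filesystem view, including the boot status
eos node ls    # per-node view of the FSTs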

Finalize

When all FSTs have booted and everything works, you can switch back to read/write:

eos access rm stall w
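
If the fsck system was disabled before the FST upgrades (see Upgrade MGM above), this is presumably also the moment to turn it back on:

eos fsck enable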

Changes in new version

Service management with systemd

The eos server now fully uses systemd, so services are handled with the systemctl command. Commands are:

systemctl start eos to start all eos services (including eossync)
systemctl restart eos@* to restart all eos services
systemctl start eos@mgm to start just mgm
systemctl restart eossync@* to restart eossync services
systemctl status eossync@* to get synchronization status (not as easily readable as old version)

Details are here http://eos-docs.web.cern.ch/eos-docs/eos_services.html

Master/slave switch

The previous command service --skip-redirect eos master mq doesn't work any more. The replacements are:

systemctl start eos@master

systemctl start eos@slave

Hi there !

This sounds OK. We know there used to be a problem if a write fails while the FST is using XRootD 4 and the MGM is still using the old codebase: it might fill the disk for some reason (but the developers can maybe comment on that). Note that of course if you set the instance to read-only (and wait for all the writes to finish on the FST nodes) then this cannot happen.

The best way to turn the instance read-only in the general case is to use
eos access set stall XXX w

But this only prevents user IO, so it could be that FSTs are replicating data between themselves; to be on the safe side, it would probably be better to also turn all the FSTs read-only.

Hope it helps :grinning:

Hi Hervé,

Thank you for your answer!

OK, so indeed by upgrading the MGM first, we should avoid this situation.

OK, thanks for the tip. We were thinking more of a way to return an error to users trying to make modifications, but at least this solution protects the namespace.

Indeed, we were of course planning to disable any balancing or other background services, but yes, we will surely apply both protections.

Yes, it surely does, thanks again !

About the eos version to use, it seems that currently 4.2.12 is a good candidate as it runs on many CERN instances.

Is this working? On the instance where we tested the upgrade to Citrine, it silently doesn't stall for write, but does for read:

# eos access ls
# eos access set stall 60 w
success: setting global stall to 60 seconds for <w>
# eos access ls
# eos access rm stall w
error: redirect or stall has to be defined (errc=22) (Invalid argument)
# eos config dump | grep -i stall
global:/config/contingency/mgm/#Stall => w:*~60~,           //w:* rule is in configuration, but not loaded
# eos access set stall 60 r
success: setting global stall to 60 seconds for <r>
# eos access ls
# ....................................................................................
# Stall Rules ...
# ....................................................................................
[ 01 ]                              r:* => 60
# eos config dump | grep -i stall
global:/config/contingency/mgm/#Stall => r:*~60~,    //w:* disappeared

It does this on every version we tried: 4.2.4, 4.2.12, 4.2.16…

You cannot use the stall 'w' rule because it is used by the master/slave mechanism to bounce writing clients off read-only states. You can use a global stall, but this stalls reads and writes. Maybe you can open a ticket and we can add this functionality.
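
For reference, a global stall would look something like this (the 300-second value is just illustrative):

eos access set stall 300    # tell all clients to stall and retry after 300 seconds
eos access rm stall         # remove the global stall again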

OK, thank you for the answer.

Yes, I suppose such a feature would be good, to avoid any writes during a maintenance for instance, or in this case to allow a smooth upgrade to a newer version. We will use some alternative in our case, but I'll open a ticket as you suggest for future use.

Are you guys sure about this part:

add these lines :

That library is in Aquamarine but not in Citrine, and I can't get the FST to come up in Citrine without commenting those two lines out.

To that same point, can someone explain in more detail than the doc what needs to be done (and whether it needs to be done at all) to get the auth plugin set up?

Yes, indeed these two lines need to be commented out in the xrd.cf.fst file since they are not needed any more. The corresponding functionality is now bundled in libXrdEosFst.so.

Hi,

Yes, sorry, the diff between the Citrine and Aquamarine xrd.cf.fst files was read in the wrong direction :roll_eyes:. I'll modify the text.

Thanks for noticing! :slightly_smiling_face:

Noticed something else today while setting up a 4.2.15 instance: the fuse client will not look for /etc/sysconfig/eos_env. You have to have a /etc/sysconfig/eos. Sort of frustrating.

I think that /etc/sysconfig/eos_env is for the server-side of things, while Fuse(x) being client-side it makes sense to use another file.

Also note that /etc/sysconfig/eos_env is named like this just to avoid a conflict with non-systemd nodes (which systemd will happily ignore if not in the right format)

Good evening,

Some report about our Citrine upgrade that took place today at JRC. It went generally well. Some comments/observations while they are still fresh:

  • the FST boots were not so hard on our MGM; we could launch up to 4-5 of them at the same time, with a minimum delay of 5 minutes between them. Our MGM is quite large hardware, 40 CPUs and 1TB+ RAM
  • RAIN files seem to have experienced some changes: they are all reported as d_mem_sz_diff (we have about 500K of them, with 12 stripes, mostly small files, so not well suited to RAIN, we reckon), and the eos file verify commands don't do anything, but the files are correctly readable
  • Some deadlock might have happened: when running eos fsck repair --resync to try to fix the above d_mem_sz_diff, we had to stop the MGM and restart it (or the process was just taking way too long?). We took some stack traces that we didn't have time to analyze
  • a new report category appeared (briefly) and was quite populated for us: rep_missing_n, reporting many missing replicas, mostly on 0-size files
  • there might be a signed integer problem on the file size in the eos file check output: when a replica is missing, statsize is reported as 18446744073709551615 (i.e. 2^64 - 1, the unsigned representation of -1) instead of -1 as in Aquamarine
  • the systemctl services do not always give the expected results. The command systemctl start eos on an FST does not start the daemon; we needed to use systemctl start eos@fst. It is also not very clear which one needs to be enabled so that the FST starts at boot: eos, eos@fst, or both. It also happened to us that when the FST starts, the network is not completely ready, so the FST shuts down after 2 minutes, after reporting URL is not valid: root:////dummy
  • Also, the service maybe doesn't stop cleanly when rebooting the system: we had the problem on one FST, which needed to resync all the LevelDB content on more than half of its filesystems, although they were in read-only mode

We might extend some of these points in detail in dedicated posts later, and we will try to update the above procedure.

Thank you all for your help.

Another thing that we did while upgrading was to disable the fsck system on the MGM until all FSTs were back, because it was causing high activity on the MGM.

We observed that the LevelDB files are less stable than sqlite was: when an FST is stopped brutally (we had an unexpected reboot on one of our servers last night), almost all filesystems start a fresh boot, although absolutely no write activity was occurring at that time.