I am currently running a production environment on CentOS 7.9 and, since CentOS 7 has reached end of support, I need to transition to AlmaLinux 9. I am debating between two approaches and would value your expert opinion on efficiency and stability.
My Current Stack:
OS: CentOS 7
The Dilemma:
In-Place Upgrade (ELevate): Performing a staged upgrade (7 → 8 → 9) using the ELevate framework.
Clean Reinstall & Data Migration: Provisioning a new AlmaLinux 9 instance and migrating data/configs.
Specific Questions:
How stable is the elevate path when jumping two major versions (7 to 9)? Does it often result in “dependency hell” for non-standard packages?
Since AlmaLinux 9 completely removes network-scripts, how does the in-place upgrade handle legacy network configurations without losing remote access?
For a production environment, which method typically results in less actual downtime when factoring in troubleshooting?
I am looking for a balance between speed and long-term system integrity. Any “lessons learned” from your own migrations would be greatly appreciated!
For our CentOS 7 to AlmaLinux 9 upgrade, we did a clean reinstall of the OS. Make sure that the disks holding the data are not mounted during the installation, to avoid any surprises and get a clean OS install.
Afterwards, you can mount the data disks again, since nothing changes in their format as far as EOS is concerned, and use your configuration management system to install the necessary EOS packages.
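Since the data disks have to be remounted exactly as before, one option is to save their /etc/fstab entries to external storage before reinstalling, and to confirm they really are unmounted. A minimal sketch, assuming (hypothetically) that the EOS data disks are mounted under a common prefix such as /data; both function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Print the fstab entries whose mount point (field 2) starts with a prefix.
# Assumption: EOS data disks are mounted under a common prefix, e.g. /data.
save_data_fstab() {
  local fstab="${1:-/etc/fstab}" prefix="${2:-/data}"
  # Skip comment lines; keep lines whose mount point begins with the prefix.
  awk -v p="$prefix" '
    $0 !~ /^[[:space:]]*#/ && NF >= 2 && index($2, p) == 1 { print }
  ' "$fstab"
}

# Return failure if anything under the prefix is still mounted.
assert_data_unmounted() {
  local prefix="${1:-/data}" mounts="${2:-/proc/mounts}"
  # awk exits 0 only if at least one mount point matches the prefix.
  if awk -v p="$prefix" 'index($2, p) == 1 { found = 1 } END { exit !found }' "$mounts"; then
    echo "data disks under $prefix are still mounted" >&2
    return 1
  fi
}
```

For example, run `save_data_fstab /etc/fstab /data > data-fstab.txt`, copy the file off the host before the reinstall, then append its lines to the new /etc/fstab and run `mount -a`. Referring to disks by UUID or LABEL in fstab helps here, since /dev/sdX names are not guaranteed to stay the same across reinstalls.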
As far as downtime is concerned, you can upgrade the FSTs one by one, and if you have multiple MGM/QuarkDB machines you can upgrade those one by one as well, so you should be able to avoid any downtime. The main lesson learned: make sure you do not wipe any of the disks that store EOS data.
Hi Elvin,
Thank you for the clear advice! The strategy of performing a clean OS install while keeping data disks unmounted makes perfect sense to avoid any accidental data loss.
I have a follow-up question regarding the MGM and QuarkDB migration. Since I am moving from CentOS 7 to AlmaLinux 9:
QuarkDB Migration: What is the recommended way to migrate the QuarkDB data? Should I perform a logical backup/restore (export/import), or is it safe to simply tar the database directory and move it to the new OS? Alternatively, if I have a HA setup, can I just add the new Alma9 node to the cluster and let it sync automatically from the remaining CentOS 7 nodes?
MGM Configuration: For the MGM nodes, beyond the /etc/eos.keytab and configuration files, are there specific pitfalls I should watch out for when moving from the older EOS version on CentOS 7 to the latest on Alma9?
Step-by-step Workflow: Could you briefly outline the sequence? For example: “Stop service → Unmount → Install OS → Reinstall EOS → Remount → Resync”.
I want to ensure that the metadata integrity is maintained throughout this “one-by-one” upgrade process.
Our site also went through an update from CentOS 7 to AlmaLinux 9. I’d like to share our experience from that process.
1. QuarkDB Migration
Generally, QuarkDB (QDB) does not require manual data backups. While a single-node setup poses a significant risk, a quorum-based cluster (3 or more nodes) allows for a safe rolling migration.
Procedure: Replace the servers one by one.
Workflow:
1. Perform a clean OS and QDB installation on the new server.
2. Start the QDB service and verify that it joins the quorum.
3. Monitor the raft-info output to ensure data synchronization is complete.
4. Once the node is fully synced, proceed to the next server.
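The monitoring step can be scripted as a polling loop. This is a sketch under assumptions: it queries raft-info on the current leader with redis-cli on port 7777 (the port used elsewhere in this thread), and it assumes the leader lists each follower on a line containing its host:port together with the words ONLINE and UP-TO-DATE once it has caught up. Check that wording against the raft-info output of your QuarkDB version before relying on it; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Poll raft-info on the leader until the given replica looks synced.
# Assumption: follower lines contain "host:port", "ONLINE" and "UP-TO-DATE".
wait_for_replica_sync() {
  local leader="$1" replica="$2" port="${3:-7777}"
  local tries="${4:-60}" delay="${5:-10}" i
  for (( i = 1; i <= tries; i++ )); do
    # Keep only the line for this replica, then require both status words.
    if redis-cli -h "$leader" -p "$port" raft-info \
        | grep -F -- "$replica" | grep -F ONLINE | grep -qF UP-TO-DATE; then
      echo "$replica is online and up to date"
      return 0
    fi
    sleep "$delay"
  done
  echo "$replica still not in sync after $tries checks" >&2
  return 1
}
```

For example, `wait_for_replica_sync qdb1.example.org qdb2.example.org:7777` (hostnames hypothetical) after starting QDB on the reinstalled node; only move on to the next server once it returns successfully.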
2. MGM Configuration
When migrating the MGM, several configuration files must be handled with care. We backed up the following files for our environment, though some may be unnecessary if your site is not an ALICE site. Since configurations vary, please double-check and verify your specific file list multiple times.
[!IMPORTANT] If you have multiple MGMs, you must migrate them one at a time. You should also be prepared to remove a node from the pool immediately if any issues occur.
Backup File Checklist:
/etc/eos.keytab
/etc/xrd.cf.mgm
/etc/sysconfig/eos_env
/etc/grid-security/xrootd/TkAuthz.Authorization
/etc/mlsensor/mlsensor.properties (possibly ALICE-only)
/etc/grid-security/hostkey.pem
/etc/grid-security/hostcert.pem
/etc/grid-security/grid-mapfile
/etc/grid-security/xrootd/pubkey.pem (possibly ALICE-only)
/etc/grid-security/xrootd/privkey.pem (possibly ALICE-only)
/etc/eos.macaroon.secret
/etc/xrootd/scitokens.cfg
/etc/cron.d/edg-mkgridmap
/etc/logrotate.d/edg-mkgridmap
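To avoid forgetting one of these files, the backup can be a single tarball. A minimal sketch, assuming it runs as root on the old MGM; the function name is hypothetical and the default list simply mirrors the checklist above, so trim or extend it for your site. Missing files are reported and skipped rather than aborting the backup.

```shell
#!/usr/bin/env bash
# Default file list, copied from the checklist above; adjust per site.
MGM_CONFIG_FILES=(
  /etc/eos.keytab /etc/xrd.cf.mgm /etc/sysconfig/eos_env
  /etc/grid-security/xrootd/TkAuthz.Authorization
  /etc/mlsensor/mlsensor.properties
  /etc/grid-security/hostkey.pem /etc/grid-security/hostcert.pem
  /etc/grid-security/grid-mapfile
  /etc/grid-security/xrootd/pubkey.pem /etc/grid-security/xrootd/privkey.pem
  /etc/eos.macaroon.secret /etc/xrootd/scitokens.cfg
  /etc/cron.d/edg-mkgridmap /etc/logrotate.d/edg-mkgridmap
)

# Usage: backup_mgm_config ARCHIVE [FILE...]
backup_mgm_config() {
  local archive="$1"; shift
  local files=("$@") present=() f
  # Fall back to the checklist when no explicit file list is given.
  (( ${#files[@]} )) || files=("${MGM_CONFIG_FILES[@]}")
  for f in "${files[@]}"; do
    if [[ -e "$f" ]]; then present+=("$f"); else echo "skipping missing $f" >&2; fi
  done
  (( ${#present[@]} )) || { echo "nothing to back up" >&2; return 1; }
  # -p keeps file modes (private keys must stay restrictive); -C / with the
  # leading "/" stripped stores paths relative to /, ready to extract with -C /.
  tar -czpf "$archive" -C / "${present[@]#/}"
}
```

For example, `backup_mgm_config /mnt/backup/mgm-config.tgz` on the old host, then `tar -xzpf /mnt/backup/mgm-config.tgz -C /` on the new one, after comparing each file with what the new packages installed.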
3. Migration Sequence
In our experience, we prioritized the QDB migration and then proceeded with the MGM. However, the specific order is not strictly critical and can be adjusted based on your requirements.
Thank you so much for the detailed checklist and sharing your experience! It is extremely helpful, especially the list of MGM configuration files.
However, my current setup is a single-node configuration (no HA/Quorum for QuarkDB). Since I cannot rely on the “rolling migration” or Raft auto-sync that you described, I would like to clarify the safest way to handle the metadata:
Single-node QuarkDB Migration: In this case, is it sufficient to stop the quarkdb service on CentOS 7, create a tarball of the entire data directory (typically /var/lib/quarkdb/), and then extract it onto the new AlmaLinux 9 installation? Are there any known compatibility issues with the underlying RocksDB format when jumping from EL7 to EL9?
Downtime Expectation: Since I only have one MGM/QuarkDB node, I expect a service outage during the OS reinstall. Is there anything else I should back up to ensure the MGM recognizes the restored QuarkDB immediately after the reinstall?
FSTs: My plan for the storage nodes (FSTs) remains to follow Elvin’s advice: unmount, reinstall OS, and remount.
If you have any specific tips for “single-node” survivors, I would deeply appreciate it!
This is correct for your setup. We had no issues with the RocksDB format.
Normally (depending on the EOS version that you are currently running) the MGM is already stateless and all the configuration is stored in QuarkDB. Therefore, there shouldn’t be anything else to save from the MGM machine for the upgrade (besides QuarkDB).
If your EOS configuration is already stored in QuarkDB then your MGM is stateless. You can check this by doing:
redis-cli -p 7777 hlen "eos-config:default"
If this returns a non-zero number then you should be good.
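If this check should gate a script (for example, before wiping the MGM host), it can return an exit status instead of relying on someone reading the number. A small sketch wrapping the same command; the function name is hypothetical, and in a multi-node QuarkDB setup the host argument should point at the leader.

```shell
#!/usr/bin/env bash
# Succeed only if the eos-config:default hash in QuarkDB is non-empty.
mgm_config_in_qdb() {
  local host="${1:-localhost}" port="${2:-7777}" n
  # Without a TTY, redis-cli prints the bare integer returned by HLEN.
  n=$(redis-cli -h "$host" -p "$port" hlen "eos-config:default") || return 1
  [[ "$n" =~ ^[0-9]+$ ]] && (( n > 0 ))
}
```

For example, `mgm_config_in_qdb || { echo "MGM config not in QuarkDB, stop here"; exit 1; }` at the top of a migration script.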
Thank you for the tip! I have just run the redis-cli check, and the result was non-zero. This confirms that my MGM is indeed stateless and the configuration is fully stored in QuarkDB, which gives me much more confidence for the migration.
Since I am on a single-node setup, I’ve outlined my final manual migration plan below. Could you please double-check if this sequence is correct?
Backup: Stop eos@mgm and quarkdb services. Use tar -cvpzf to back up the entire /var/lib/quarkdb/ directory and /etc/eos.keytab to external storage.
Reinstall: Perform a clean install of AlmaLinux 9 on the system drive while keeping the FST data disks unmounted/protected.
Restore: Install the EOS/QuarkDB packages on Alma9. Restore the /var/lib/quarkdb/ directory (ensuring proper daemon ownership) and replace the eos.keytab.
Launch: Start QuarkDB first, verify its status, then start the MGM service.
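The backup and restore steps of this plan could look roughly like the following. It is a sketch, not a tested procedure: it assumes QuarkDB's data lives in /var/lib/quarkdb (as in the plan), that the restored database should be owned by daemon:daemon (adjust if your QuarkDB runs as another user), and that QuarkDB and the MGM are stopped before the backup and not yet started before the restore. The variable and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed locations and owner; override via the environment if needed.
QDB_DIR="${QDB_DIR:-/var/lib/quarkdb}"
EOS_KEYTAB="${EOS_KEYTAB:-/etc/eos.keytab}"
QDB_OWNER="${QDB_OWNER:-daemon:daemon}"

# Run on the old host with QuarkDB and the MGM stopped.
qdb_backup() {
  local archive="$1"
  # Paths are stored relative to / so they extract back to the same place.
  tar -czpf "$archive" -C / "${QDB_DIR#/}" "${EOS_KEYTAB#/}"
}

# Run on the new host after installing the packages, before starting services.
qdb_restore() {
  local archive="$1"
  if [[ -d "$QDB_DIR" && -n "$(ls -A "$QDB_DIR")" ]]; then
    echo "refusing to restore into non-empty $QDB_DIR" >&2
    return 1
  fi
  tar -xzpf "$archive" -C /
  # Hand the database to the service user; when extracted as root, the keytab
  # keeps the owner and mode recorded in the archive.
  chown -R "$QDB_OWNER" "$QDB_DIR"
}
```

On CentOS 7, run `qdb_backup /mnt/backup/qdb.tgz`; on AlmaLinux 9, run `qdb_restore /mnt/backup/qdb.tgz`, then start QuarkDB, check it, and only then start the MGM. Refusing a non-empty target avoids mixing the restored files with a database created by an accidental early service start.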
One final technical detail: Can I directly install the latest stable EOS/QuarkDB release for EL9 and point it to the restored data directory? Or would you recommend matching the version used in CentOS 7 first before upgrading the software itself?
Yes, the steps look fine. Which version you should install depends on the version you are currently running. When updating from 5.1.x to 5.2.x, you need to convert the FST metadata from LevelDB (which is dropped in 5.2) to extended attributes. If you are already running 5.2, then when upgrading to 5.3 you can simplify your deployment by dropping the MQ daemon: its functionality is taken over by QuarkDB, which is responsible for relaying messages internally in the cluster.
For this last change, you just need the following environment variable in the /etc/sysconfig/eos_env file on both the MGM and the FSTs: EOS_USE_MQ_ON_QDB=1
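Setting this variable can be made idempotent, so it is safe to run on the MGM and on every FST, for example from configuration management. A minimal sketch, assuming eos_env uses plain VAR=value lines; EOS_USE_MQ_ON_QDB=1 is the variable named in this thread, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Ensure exactly one EOS_USE_MQ_ON_QDB=1 line exists in the env file.
enable_mq_on_qdb() {
  local env_file="${1:-/etc/sysconfig/eos_env}"
  touch "$env_file"
  if grep -q '^EOS_USE_MQ_ON_QDB=' "$env_file"; then
    # Replace an existing value (e.g. 0) instead of appending a duplicate.
    sed -i 's/^EOS_USE_MQ_ON_QDB=.*/EOS_USE_MQ_ON_QDB=1/' "$env_file"
  else
    echo 'EOS_USE_MQ_ON_QDB=1' >> "$env_file"
  fi
}
```

The services read this file at startup, so the MGM and FSTs need a restart afterwards; following the upgrade path suggested later in the thread, enable it only once you are on 5.3.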
Having said that, it is probably simpler to upgrade the OS first while keeping the EOS version you currently run, and to perform the EOS upgrade later on; otherwise you might have too many moving parts.
I’ve also confirmed my versions: I am currently on 5.1.19 (CentOS 7) and planning to move to 5.3.27 (Alma9). Regarding your point about LevelDB, since 5.1.19 is quite old, is the conversion to extended attributes mandatory before I move the data to Alma9? If I perform a clean install of 5.3.27 on Alma9 and remount the FST disks, will they be unreadable without that conversion?
Also, for the MQ change, I will add EOS_USE_MQ_ON_QDB=1 to both MGM and FSTs as you suggested. Should I expect any issues with the QuarkDB schema being upgraded directly from 5.1 to 5.3?
Lastly, please note that our team will be celebrating the Lunar New Year holiday soon, so I might not be able to reply to your follow-up until after February 23rd.
You must do the conversion before moving above version 5.2.0, since it is not possible later on. You can read more about the changes from 5.1 to 5.2 in this post:
A good upgrade path is therefore probably to do the conversion first and then upgrade to 5.2.32. At that point you can do the OS upgrade everywhere, keeping the EOS version at 5.2.32. Once this is done, you can do another round of upgrades to the latest 5.3.29, enabling EOS_USE_MQ_ON_QDB=1 as part of it.
There should be no issues with QuarkDB as you upgrade, as long as you move in steps 5.1 → 5.2 → 5.3 and the upgrade should be done QuarkDB → MGM → FSTs.
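The 5.1 → 5.2 → 5.3 rule is easy to encode as a guard in an upgrade script. A minimal sketch: check_eos_step is a hypothetical helper that compares only the major.minor parts of two version strings, refuses skips and downgrades, and prints a reminder about the LevelDB conversion on the 5.1 → 5.2 step. It does not perform or verify the conversion itself.

```shell
#!/usr/bin/env bash
# Usage: check_eos_step FROM TO   (e.g. check_eos_step 5.1.19 5.2.32)
check_eos_step() {
  local from="$1" to="$2" fmaj fmin tmaj tmin rest
  # Split "5.1.19" into major=5, minor=1; the patch level is ignored.
  IFS=. read -r fmaj fmin rest <<< "$from"
  IFS=. read -r tmaj tmin rest <<< "$to"
  if (( tmaj != fmaj )); then
    echo "major version change $from -> $to is not covered here" >&2
    return 1
  fi
  if (( tmin < fmin )); then
    echo "downgrade $from -> $to refused" >&2
    return 1
  fi
  if (( tmin - fmin > 1 )); then
    echo "skip detected: go through $fmaj.$((fmin + 1)).x before $to" >&2
    return 1
  fi
  if (( fmaj == 5 && fmin == 1 && tmin == 2 )); then
    echo "reminder: convert FST metadata from LevelDB to extended attributes first" >&2
  fi
  echo "ok: $from -> $to"
}
```

For example, `check_eos_step 5.1.19 5.3.27` fails, while `check_eos_step 5.1.19 5.2.32` passes with the conversion reminder; each accepted step is then rolled out to QuarkDB first, then the MGM, then the FSTs.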
Enjoy the holidays and let me know if you have any issues with the upgrade.
Thanks for the crucial tip about the 5.1 → 5.2 → 5.3 upgrade path. I was about to attempt a direct jump to 5.3, so your email came at the perfect time.
I will follow the stepped upgrade approach you suggested and make sure the FST metadata conversion (LevelDB to extended attributes) is completed before upgrading to 5.2.32.