Hello,
Our EOS production instance for ALICE is still running EOS version 4.7.7. I would like to update the entire cluster to the latest stable version, which is 4.8.31 if I am not mistaken.
I had a quick look at the release notes, but I would rather ask whether it is OK to jump from 4.7.7 directly to 4.8.31 and whether there are specific precautions to take.
Also, my method is to update successively;
There were quite a few important changes between those releases, and it’s hard to say whether there are any unwanted side effects while running in mixed mode.
One important step is that you need to disable the converter until all the nodes are updated to the same version.
I would suggest updating to the latest 4.8.40, which is running stably in production on our setup, so that you don’t need to do another update later on. Quite a number of issues were fixed between the .31 and .40 releases.
I will promote the .40 release to the stable repo today.
Hi Elvin,
Thank you very much. Could you remind me of the commands to disable and re-enable the converter? I will start to look at the update shortly, although all servers are currently busy rebalancing the cluster following the addition of new servers. Unless you advise me that updating during this rebalancing is a bad idea…
JM
That is not new, we had the same dependency before as well, unless your system updated to the 2020 version. In that case you need to downgrade eos-folly. We had a plan to move to a new folly version and built the packages, but we haven’t made the move yet. Newer eos packages have a strict dependency on eos-folly-2019 while older ones did not, which is probably why you have the 2020 version.
Elvin,
I updated to eos-server-4.8.40-1. Incidentally, I also reinstalled the VM on which EOS manager runs but the eos services cannot start because of:
[QCLIENT - ERROR - getNextEndpoint:112] Unable to resolve any endpoints, possible trouble with DNS
The QDB cluster looks OK, although the server mentioned in mgmofs.cfgredishost is not the leader… Could that be the reason? Can this variable take a list of hosts?
Yes, it can also take a list of hosts, but it also works with an alias if it points to the correct thing.
I guess you mean mgmofs.qdbcluster; there is no such config as mgmofs.cfgredishost in the /etc/xrd.cf.mgm config file.
From the first batch only mgmofs.cfgtype is used; I don’t think the others were ever used. mgmofs.qdbcluster is used for all the communication with QDB.
Then I restarted the EOS services and it works (without any other modification to the network, firewalls or configuration files)… So I think there may be something wrong with version 4.8.40-1…
How can I help?
Thank you
JM
So what exactly is the problem? Can you paste some logs?
I don’t understand this statement: eos manager runs but the eos services cannot start
To which eos services are you referring? The FSTs? Can you also paste your /etc/xrd.cf.fst config file?
Hi Elvin,
The full statement was: I also reinstalled the VM on which EOS manager runs but the eos services cannot start
In short: the eos mgm cannot start.
I am going to update again to 4.8.40-1 and send you a more complete log of the mgm trying to start.
JM
Elvin,
That is exactly it! Replacing commas with spaces in both config files, /etc/xrd.cf.mgm and /etc/xrd.cf.mq, makes it work.
Is this something new that changed between eos versions 4.8.31 and 4.8.40?
Do you still want to see /etc/xrd.cf.mgm?
Thank you very much.
JM
Yes, the parsing changed to make it consistent with how it works in QuarkDB, sorry, I forgot about this. I don’t need to see the config file. Glad things work now!
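For anyone hitting the same thing, here is a minimal Python sketch of the comma-to-space fix discussed above. It assumes the host list sits on a line whose directive name ends in qdbcluster (as with mgmofs.qdbcluster); the function names and the example host names are hypothetical and not part of EOS. It only transforms text, so back up /etc/xrd.cf.mgm and /etc/xrd.cf.mq before writing anything back.

```python
import re


def normalize_qdbcluster_line(line: str) -> str:
    """Turn a comma-separated host list on a *qdbcluster line into a
    space-separated one; any other line is returned unchanged."""
    stripped = line.strip()
    # Leave blank lines and comments alone.
    if not stripped or stripped.startswith("#"):
        return line
    parts = stripped.split(None, 1)
    # Assumption: only directives whose name ends in "qdbcluster" carry
    # the host list we want to rewrite.
    if len(parts) < 2 or not parts[0].endswith("qdbcluster"):
        return line
    hosts = [h for h in re.split(r"[,\s]+", parts[1]) if h]
    return f"{parts[0]} {' '.join(hosts)}"


def normalize_config(text: str) -> str:
    """Apply the rewrite to every line of a config file's contents."""
    fixed = "\n".join(normalize_qdbcluster_line(l) for l in text.splitlines())
    # Preserve a trailing newline if the input had one.
    return fixed + ("\n" if text.endswith("\n") else "")


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    sample = "mgmofs.qdbcluster qdb1.example.org:7777,qdb2.example.org:7777\n"
    print(normalize_config(sample), end="")
```

Lines that do not match are passed through untouched, so running it on a file that already uses spaces changes nothing.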
Hi Elvin,
I have updated a test machine, which will be rebooted tomorrow (system updates including the kernel; I will be on site), and I will perform the updates on the production managers (round-robin update) next week.
About stopping the converter, the current situation is converter off but balancer on:
[root@naneosmgr01(EOSMASTER) ~]#eos space status default