Hello,
I wanted to let you know that the upgrade of the whole instance to the latest 4.8.x version (4.8.91) was completed successfully last week, without any problem on the EOS side (we had our own hardware issues…). I want to thank all of you for your suggestions and your help.
Running this brand-new version after such a big jump did not noticeably affect the functioning of our systems. As expected, it works for the users at least as well as before, possibly better, although long-term stability remains to be confirmed. On the management side, there are certainly a lot of improvements.
Unfortunately, we already have to report the first incident: the MGM stopped working last night, after 8 days of operation. The MGM stopped serving the FUSE clients at 00:00:00 (the log volume dropped immediately after that time), the FSTs started to log query errors, and six hours later it also stopped answering statistics requests. However, the daemon had not crashed: it was still running, but hung. A restart restored the service.
Does the fact that it hung at exactly midnight ring any bells for you?
We found nothing else useful in the logs: no particular error messages, just a sudden drop in activity. Would it be useful to send you the logs (they are quite large in our case)?