QuarkDB force leader election

It would be quite a coincidence if it was not. :slight_smile:

Was QuarkDB unavailable for long, or just 1-10 seconds? If only a few seconds, the bug must be that for some reason the MQ does not retry its requests toward QDB, gets an unexpected NULL reply object somewhere, and crashes when trying to access it. I haven't had time to look into that today, but will do so soon.

We didn't detect that it was unavailable; it was probably unavailable only for the time it took to switch leader after the previous one crashed, so less than 10 seconds. The MQ crash was what made EOS unavailable. We only learned that there had also been a QuarkDB crash from the same logs we sent you, so you can probably understand more from them than we can. In fact, quorum was always maintained, but at some point there was probably one leader node plus one outdated follower, which then caught up quite quickly, and the former leader came back through the systemd auto-restart.

By the way, do you know if it is possible/desirable to also set up auto-restart for the MQ, in case this happens again in the future?

I don't see why not; it's generally a good idea to have systemd auto-restart crashed servers to minimize disruption in cases like these.
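
As a rough sketch of what that could look like, a systemd drop-in override along these lines would restart the service after a crash. The unit name `eos@mq.service` and the drop-in path are assumptions about your setup (check `systemctl list-units` for the actual name), and the 5-second delay is only an illustrative value:

```ini
# /etc/systemd/system/eos@mq.service.d/restart.conf
# (hypothetical path; the unit name is an assumption, adjust to your installation)
[Service]
# Restart the service whenever it exits uncleanly (crash, fatal signal, non-zero exit code)
Restart=on-failure
# Wait a few seconds before restarting, to avoid a tight crash loop
RestartSec=5
```

After adding the file, `systemctl daemon-reload` makes systemd pick up the override.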

By the way, you can check out the changes to fsync policy here: Fsync policy - QuarkDB Documentation

Cheers,
Georgios

Thank you very much, Georgios!

Is this planned for release in an upcoming version?

Yep, you're in luck, I just did the 0.4.1 release today as Luca would like to have the fsync improvements in our own instances as well:

Cheers,
Georgios

Thank you, we might also install it soon, as we had planned the upgrade to v0.4.0.