MGM+QDB crash, unable to reserve inodes

crystal · March 28, 2018, 3:52am

hello!

while running eos touch in a loop today i managed to fill up the quarkdb ns directory, resulting in 2 out of 3 quarkdb nodes crashing due to i/o errors. this also caused the mgm to crash, apparently because it couldn’t reserve inode(s).

not entirely sure what a good solution to this would be - maybe have the mgm return an error message instead of crashing, or die neatly?

i’ve also kept the stack trace if it’s of any interest.

gbitzes · March 28, 2018, 7:52am

Hi Crystal,

Indeed, for now a full disk will result in QDB crashing, as it’s not able to handle such an error. I plan on having it switch into read-only mode in such a case, while printing lots of scary warnings in the log.

The MGM crash is probably a bug: in theory it shouldn’t crash if QDB is down. The stack trace would be appreciated.

Cheers,
Georgios

crystal · March 28, 2018, 8:27am

switching to read-only sounds good! @davidjericho also posted something similar (but about the in-memory namespace) mentioning that the mgm tries to handle writes to an almost full disk - could something similar be done for quarkdb?

that is, maybe start yelling/warning as the disk is starting to get full, then switch to read only at some higher threshold?

gbitzes · March 28, 2018, 8:31am

Yes indeed, we should start yelling before the read-only threshold is reached. I’ll just have a background thread check every few seconds how full the partition holding QDB is, and start warning or trigger read-only mode when needed.
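
For illustration, the polling idea above could look roughly like the sketch below. It is written in Python and is not QuarkDB code. The 85% and 95% thresholds, the 5-second interval, and every name in it (`DiskState`, `classify`, `DiskWatcher`, `on_state`) are assumptions made up for the example.

```python
import shutil
import threading
from enum import Enum


class DiskState(Enum):
    OK = "ok"
    WARN = "warn"            # start yelling in the log
    READ_ONLY = "read-only"  # refuse further writes


def classify(used_fraction, warn_at=0.85, read_only_at=0.95):
    """Map a used-space fraction (0.0 to 1.0) to a state.

    Both thresholds are assumed values, not QuarkDB defaults.
    """
    if used_fraction >= read_only_at:
        return DiskState.READ_ONLY
    if used_fraction >= warn_at:
        return DiskState.WARN
    return DiskState.OK


def used_fraction(path):
    """Return the used fraction of the partition that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


class DiskWatcher:
    """Background thread that reports the partition state every `interval` seconds."""

    def __init__(self, path, on_state, interval=5.0):
        self.path = path
        self.on_state = on_state  # hypothetical callback: log a warning, flip read-only
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self.on_state(classify(used_fraction(self.path)))
            # Sleep for the interval, but wake up immediately if stop() is called.
            self._stop.wait(self.interval)
```

Something like `DiskWatcher("/path/to/qdb", handler).start()` would then call `handler` with the current state on every poll, where the path and `handler` are placeholders. Keeping the warning threshold well below the read-only one gives an operator time to react before writes are refused.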
