We have had several cases of EOS FSTs being killed by the OS (OOM killer) for excessive memory use.
This happened after we increased the number of active filesystems (FS) on these nodes from 24 to 48. Could this be linked to the increased activity and to undersized servers? They each manage a total of ~30M files with 150 GB of memory. Some servers report 500 GB vsize and up to 129 GB rss in `eos node ls --sys` output, but the values vary a lot between nodes.
When an FST is restarted, and after all FS have booted, the memory footprint is much lower (a few GB), but it then seems to increase over time. Could there be a memory leak? The eos version on these FSTs is 4.2.20; maybe a newer version fixes a known bug?
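To confirm whether the growth is steady (leak-like) rather than load-driven, one option is to sample the daemon's rss periodically and plot it. A minimal sketch, assuming the FST daemon shows up as an `xrootd ... fst` process (adjust the `pgrep` pattern for your setup):

```shell
#!/bin/sh
# Print an ISO timestamp and the resident set size (kB) of a given PID,
# suitable for appending to a log file at regular intervals.
sample_rss() {
    printf '%s %s\n' "$(date -Is)" "$(ps -o rss= -p "$1" | tr -d ' ')"
}

# Example: find the FST daemon (process name is an assumption) and sample it.
# Run this from cron every few minutes and check the log for monotonic growth.
pid=$(pgrep -f 'xrootd.*fst' | head -n 1)
[ -n "$pid" ] && sample_rss "$pid" >> /var/log/fst_rss.log
```

If rss keeps climbing while the number of open files and client load stay flat, that points more toward a leak than toward undersizing.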