Hello,
In the process of migrating away from eosd (since it has been announced that it will soon be deprecated), we would like to report the issues that remain on our side. Since it was advised to report such issues, here I am with them.
In addition to some random crashes on the clients currently in production (very few, about one per month in total across dozens of clients), we found a situation that reproduces systematically. The context is quite complex and specific, but maybe the backtrace can help pinpoint what is probably a corner case. To begin with, here is the backtrace:
Stack trace (most recent call last) in thread 53015:
#7 Object "", at 0xffffffffffffffff, in
#6 Object "/lib64/libc.so.6", at 0x7fa8dc0d39fc, in clone
#5 Source "pthread_create.c", line 0, in start_thread [0x7fa8dc3aaea4]
#4 Object "/lib64/libfuse.so.2", at 0x7fa8ded17400, in fuse_session_loop
#3 Object "/lib64/libfuse.so.2", at 0x7fa8ded1ab6a, in fuse_reply_iov
#2 Object "/lib64/libfuse.so.2", at 0x7fa8ded1b49b, in fuse_reply_open
#1 Source "/usr/src/debug/eos-4.8.10-1/fusex/eosfuse.cc", line 2751, in [0x4fad0d]
2748: }
2749:
2750: if (Instance().Config().options.rm_rf_protect_levels &&
>2751: isRecursiveRm(req) &&
2752: Instance().mds.calculateDepth(md) <=
2753: Instance().Config().options.rm_rf_protect_levels) {
2754: eos_static_warning("Blocking recursive rm (pid = %d)", fuse_req_ctx(req)->pid);
#0 Source "/usr/src/debug/eos-4.8.10-1/fusex/eosfuse.cc", line 6113, in [0x4dc621]
6110: ProcessSnapshot snapshot = fusexrdlogin::processCache->retrieve(ctx->pid,
6111: ctx->uid, ctx->gid, false);
6112:
>6113: if (snapshot->getProcessInfo().getRmInfo().isRm() &&
6114: snapshot->getProcessInfo().getRmInfo().isRecursive()) {
6115: bool result = true;
Segmentation fault (Address not mapped to object [0x1])
# umounthandler: executing fusermount -u -z /mnt/eos/jeodpp
# umounthandler: sighandler received signal 11 - emitting signal 11 again
This was done with v4.8.10, but with the latest v4.8.40 (with debuginfo installed) we get the same behaviour:
Stack trace (most recent call last) in thread 39253:
#7 Object "", at 0xffffffffffffffff, in
#6 Object "/lib64/libc.so.6", at 0x7fb45d2d79fc, in clone
#5 Source "pthread_create.c", line 0, in start_thread [0x7fb45d5aeea4]
#4 Object "/lib64/libfuse.so.2", at 0x7fb45ff1b400, in fuse_session_loop
#3 Object "/lib64/libfuse.so.2", at 0x7fb45ff1eb6a, in fuse_reply_iov
#2 Object "/lib64/libfuse.so.2", at 0x7fb45ff1f49b, in fuse_reply_open
#1 Source "/root/rpmbuild/BUILD/eos-4.8.40-1/fusex/eosfuse.cc", line 2812, in EosFuse::opendir(fuse_req*, unsigned long, fuse_file_info*) [0x4fc263]
#0 Source "/root/rpmbuild/BUILD/eos-4.8.40-1/fusex/eosfuse.cc", line 6237, in EosFuse::isRecursiveRm(fuse_req*, bool, bool) [0x4dca81]
Segmentation fault (Address not mapped to object [0x1])
# umounthandler: executing fusermount -u -z /mnt/eos/jeodpp
# umounthandler: sighandler received signal 11 - emitting signal 11 again
This backtrace helped us figure out that the crash does not occur when running without the rm-rf-protect-levels option (or with it set to 0). Maybe this can help you find out why, and whether it can be fixed to work with this option (we like the idea of having it enabled).
Now, in case it is important, the context: the QGis program (graphical interface) runs inside a container into which different subfolders of the same eosxd client are bind-mounted (e.g. /eos/jeodpp/data, /eos/jeodpp/home, etc.). Opening (in read mode) some raster data (a GDAL VRT file) with QGis, which triggers the opening of several GeoTIFF files referenced by the main file, systematically crashes the client.
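To make the setup concrete, it corresponds roughly to the following (the image name and exact paths are illustrative, not our actual deployment; the key point is that several subtrees of the same eosxd mount are bind-mounted separately):

```shell
# /eos/jeodpp is a single eosxd FUSE mount on the host.
# Crashing setup: each subfolder is bind-mounted into the container on its own.
docker run -it \
  -v /eos/jeodpp/data:/eos/jeodpp/data \
  -v /eos/jeodpp/home:/eos/jeodpp/home \
  some-qgis-image qgis

# Non-crashing setup: the whole mount is bind-mounted once.
docker run -it \
  -v /eos/jeodpp:/eos/jeodpp \
  some-qgis-image qgis
```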
Interestingly, when we instead bind-mount the full EOS mount /eos/jeodpp once into the container, the crash does not occur. This is a specific setup which is probably difficult to reproduce on your side, but if we can provide extra information, let us know. So far we have not observed this behaviour with any software other than QGis, and we have many other use cases that open such files through the GDAL libraries without ever seeing it. The GDAL usage in QGis probably has an issue of its own, but it would be nice if access to such a file could fail without crashing the client.