CERN Accelerating science

Migration from eosd to eosxd : some strange crash blocks us

Hello,

In the process of trying to get rid of eosd (since it has been announced to be soon deprecated), we would like to report the different issues that remain on our side. Since it was advised to report such issues, here I am with them.

In addition to some random crashes that we have on the client currently in production (very few, one per month in total out of dozens of client), we could find another situation that reproduces systematically. The context is complex, but maybe the backtrace can help to find what is probably a corner-case.

The context is quite complex and specific, but as a beginning, the backtrace provided is this one :

Stack trace (most recent call last) in thread 53015:
#7    Object ", at 0xffffffffffffffff, in 
#6    Object "/lib64/libc.so.6, at 0x7fa8dc0d39fc, in clone
#5    Source "pthread_create.c", line 0, in start_thread [0x7fa8dc3aaea4]
#4    Object "/lib64/libfuse.so.2, at 0x7fa8ded17400, in fuse_session_loop
#3    Object "/lib64/libfuse.so.2, at 0x7fa8ded1ab6a, in fuse_reply_iov
#2    Object "/lib64/libfuse.so.2, at 0x7fa8ded1b49b, in fuse_reply_open
#1    Source "/usr/src/debug/eos-4.8.10-1/fusex/eosfuse.cc", line 2751, in  [0x4fad0d]
       2748:         }
       2749: 
       2750:    if (Instance().Config().options.rm_rf_protect_levels &&
      >2751:        isRecursiveRm(req) &&
       2752:             Instance().mds.calculateDepth(md) <=
       2753:             Instance().Config().options.rm_rf_protect_levels) {
       2754:           eos_static_warning("Blocking recursive rm (pid = %d)", fuse_req_ctx(req)->pid);
#0    Source "/usr/src/debug/eos-4.8.10-1/fusex/eosfuse.cc", line 6113, in  [0x4dc621]
       6110:   ProcessSnapshot snapshot = fusexrdlogin::processCache->retrieve(ctx->pid,
       6111:                              ctx->uid, ctx->gid, false);
       6112: 
      >6113:   if (snapshot->getProcessInfo().getRmInfo().isRm() &&
       6114:       snapshot->getProcessInfo().getRmInfo().isRecursive()) {
       6115:     bool result = true;
Segmentation fault (Address not mapped to object [0x1])
# umounthandler: executing fusermount -u -z /mnt/eos/jeodpp# umounthandler: sighandler received signal 11 - emitting signal 11 again

This was done with v 4.8.10, but also with latest v4.8.40 (with debuginfo installed) we get the same behaviour :

Stack trace (most recent call last) in thread 39253:
#7    Object ", at 0xffffffffffffffff, in 
#6    Object "/lib64/libc.so.6, at 0x7fb45d2d79fc, in clone
#5    Source "pthread_create.c", line 0, in start_thread [0x7fb45d5aeea4]
#4    Object "/lib64/libfuse.so.2, at 0x7fb45ff1b400, in fuse_session_loop
#3    Object "/lib64/libfuse.so.2, at 0x7fb45ff1eb6a, in fuse_reply_iov
#2    Object "/lib64/libfuse.so.2, at 0x7fb45ff1f49b, in fuse_reply_open
#1    Source "/root/rpmbuild/BUILD/eos-4.8.40-1/fusex/eosfuse.cc", line 2812, in EosFuse::opendir(fuse_req*, unsigned long, fuse_file_info*) [0x4fc263]
#0    Source "/root/rpmbuild/BUILD/eos-4.8.40-1/fusex/eosfuse.cc", line 6237, in EosFuse::isRecursiveRm(fuse_req*, bool, bool) [0x4dca81]
Segmentation fault (Address not mapped to object [0x1])
# umounthandler: executing fusermount -u -z /mnt/eos/jeodpp
# umounthandler: sighandler received signal 11 - emitting signal 11 again

This backtrace helped us figure out that when running without option rm-rf-protect-levels (or with value 0) the crash doesn’t occur, maybe it can help you find out why, and if this can be fixed to work with this option (we like the idea of having it enabled)

Now if important, the context : the program QGis (graphical interface) runs inside a container in which different subfolders of the same eosxd client are binded (like /eos/jeodpp/data, /eos/jeodpp/home, etc). Opening (in readmode) some raster data (Gdal VRT file) with QGis (this triggers the opening of several GeoTIFF files referenced by the mainfile), this will automatically crash it.

Interessingly, when we bind only once the full eos mount /eos/jeodpp inside the container, this doesn’t occur. This is a specific setup which is probably difficult to reproduce on your side, but if we can provide some extra info, let us know. Up to now, we didn’t observe this behavior with any other software than qgis, and we have many other cases that uses gdal libraries to open such files and never observed that. Probably gdql on qgis has an issue, but it would be nice if access to such file could fail without crashing.

Hi Franck,
thanks for the report. That is easy to fix, there is just a null pointer check missing.

Will push a fix for that.

Cheers Andreas.

I made a ticket for that:

EOS-4676

Thank you Andreas, let us know as soon as some tag contains the fix, so that we can test it, it will also be easy.

FYI, as external we seem to not have access to the JIRA tickets, but we do not really need that, I guess.

Hello,

I could test the newer version with the fix of the above issue, and starting from v 4.8.43 this crash doesn’t happen any more. Thank you !

We have discovered a change in eosxd daemon for use with Kerberos : the default value for looking the credential cache is not /tmp/krb5cc_uid any more, so we have to enforce it at user environment level. We were relying on it for some use case, which is handy. Is there a way to parametrize the default location of the kerberos ticket cache at daemon level with some parameter of the client ?

Hi Franck,

We just released v 4.8.46, that should fetch the default krb5 credential cache value.
Would you give it a try?

Cheers,
Fabio

Hi Fabio,

Thank you for your answer, I tried that version, but it doesn’t work for us.

If I set default_ccache_name = /tmp/krb5cc_%{uid}, I can see these messages in eosxd logs :

f=lookupLocalJail  l=ALERT tid=00007fd8353ff700 s=SecurityChecker:131      Uid 61928 is asking to use credentials '/tmp/krb5cc_0', but file belongs to uid 0! Refusing.

And a similar one when leaving the default value to KEYRING:persistent :

210510 11:00:27 t=1620644427.450414 f=validate         l=ALERT tid=00007f5486bff700 s=CredentialValidator:199  Refusing to use keyring KEYRING:persistent:0 by uid 61928

It seems that the client tries to use a value with uid 0 (maybe because this is the user which runs the eosxd daemon ? I’m testing with eosxd running inside a container)

We have found another case where the eosxd client behaves significantly differently from the eosd client, and has a much poorer performance. It mightbe linked to the read-ahead mechanism, but maybe you can give some more information about that.

The use case is the use of NetCDF files that are read with the ncks command. I’m not the expert of how this tools works internally, but I can say that the command extracts very few selected information from very large files (100GB), probably by doing random access in the file.

The same command runs for about 2-3 minutes using the eosd client, and up 30 minutes with the eosxd one. By analyzing the eos report on the MGM, it appears that the first case reads only about 8GB (rb= field) out of the full 100GB file, where the eosxd client reads the full file, and sometimes even more. Interesting fact, while the process runs on a file, reading this very same file sequentially (like for a copy) on the same eosxd mount occurs at a very slow speed (<1MB/s).

I tried to drastically decrease the read-ahead parameters (namely read-ahead-nominal-bytes or read-ahead-strategy (static/dynamic)). These seem to give some improvement in the processing time (10 minutes instead of 30) but it doesn’t come close to what we get with the eosd client. And I’m afraid that reducing too much (I went down to 32K in static strategy), it might impact overall performance of the eosxd client for other usages.

What suggestion could you give to test/tune this use case ? Is there other parameters that might impact this ? In general, is there documentation on how the read-ahead mechanism of the eosxd client works (if this issue is really linked to that) ?

Thank you in advance,

Franck

Hi Franck,
yes, that is probably the read-ahead mechanism. The defaults are good for streaming and not for very sparse operation.
The defaults are:

"read-ahead-strategy" : "dynamic",
"read-ahead-bytes-nominal" : 262144,
"read-ahead-bytes-max" : 1048576,
"read-ahead-blocks-max" : 16

So if your application reads only 10% by skipping over small areas, this won’t work, because the read-ahead with this policy does not get disabled when you do sparse reading.

You can strace your application to see the read offsets:
strace -e trace=file,read,write,pread64,pwrite64

to understand how big the gaps between the areas are.

You probably want to use the static reahead and put read-ahead-bytes-nominal smaller than the gaps, which your application leaves. This is somehow not optimal.

I can add another parameter, which disables read-ahead if you don’t read atleast percent of the read-ahead data … so you could set this for example to 50%.

In eosd the read-ahead disabling was more trivial. If we saw, you don’t stream, we disabled it.