Unable to send the message kXR_stat


(Yaodong Cheng) #1

Hi all,

Since we updated the EOS FUSE client from 4.2.7 to 4.2.18 at the IHEP site, we have been seeing the following errors repeatedly. They may also sometimes cause eosd to segfault. The EOS server is also at version 4.2.18. Can anybody help us? Thanks a lot.

[2018-04-28 15:50:51.458295 +0800][Error ][XRootD ][ 3789] [*CzeAIwA@hxmteos01.ihep.ac.cn:1095] Unable to send the message kXR_stat (handle: 0x00000000, flags: none): [ERROR] Invalid session
[2018-04-28 15:50:51.458457 +0800][Error ][XRootD ][ 3789] [*CzeAIwA@hxmteos01.ihep.ac.cn:1095] Unable to send the message kXR_read (handle: 0x00000000, offset: 348160, size: 4096): [ERROR] Invalid session
180428 15:50:51 t=1524901851.458492 f=pread l=ERROR tid=00007f42e57ff700 s=filesystem:3856 failed read off=348160, len=4096
180428 15:50:51 t=1524901851.585873 f=fileWaitAsyncIO l=ERROR tid=00007f42e07fd700 s=XrdIo:736 error=async requests failed for file path=root://*CzeAIwA@hxmteos01.ihep.ac.cn///#curl#/eos/hxmt/work/Devel/netbase/libnetbase.so
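
To see which requests fail and how often, the client log could be tallied with something like the following. This is only a rough sketch, not an EOS or XRootD tool: the function name `tally_send_failures` and the regular expression are made up for illustration, and it assumes each log entry sits on a single line in the actual log file and follows the "Unable to send the message ..." format shown in this post.

```python
import re
from collections import Counter

# Hypothetical pattern, derived only from the two XRootD client lines quoted
# in this post: "... Unable to send the message kXR_xxx (...): [ERROR] reason"
SEND_FAILURE = re.compile(
    r"Unable to send the message (?P<request>kXR_\w+)\s*\(.*?\):\s*"
    r"\[\w+\]\s*(?P<reason>.+)$"
)


def tally_send_failures(lines):
    """Count (request, reason) pairs among 'Unable to send the message' lines."""
    counts = Counter()
    for line in lines:
        match = SEND_FAILURE.search(line.rstrip("\n"))
        if match:
            key = (match.group("request"), match.group("reason").strip())
            counts[key] += 1
    return counts


# The two XRootD client lines from this post, each on a single line.
SAMPLE_LINES = [
    "[2018-04-28 15:50:51.458295 +0800][Error ][XRootD ][ 3789] "
    "[*CzeAIwA@hxmteos01.ihep.ac.cn:1095] Unable to send the message kXR_stat "
    "(handle: 0x00000000, flags: none): [ERROR] Invalid session",
    "[2018-04-28 15:50:51.458457 +0800][Error ][XRootD ][ 3789] "
    "[*CzeAIwA@hxmteos01.ihep.ac.cn:1095] Unable to send the message kXR_read "
    "(handle: 0x00000000, offset: 348160, size: 4096): [ERROR] Invalid session",
]

if __name__ == "__main__":
    for (request, reason), count in tally_send_failures(SAMPLE_LINES).items():
        print(f"{request}: {reason} x{count}")
```

On a real log, the list of sample lines would be replaced by the open log file object, since iterating over a file yields one line at a time.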


(Veselin Vasilev) #2

Hi all,

This is an important issue. We at the JRC have observed similar errors ourselves, with all combinations of EOS client and server versions.

We never managed to identify the underlying cause, and therefore it's very difficult to produce a meaningful bug report on the problem.

It would be great if we could find out more about this problem.

Greetings


(Yaodong Cheng) #3

Actually, a core dump of eosd was generated by abrtd, but it was deleted immediately:
May 4 13:30:11 hlogin07 kernel: eosd[16975]: segfault at 7f3decc00000 ip 0000003ae608ab10 sp 00007f3deebf2108 error 4 in libc-2.12.so[3ae6000000+18a000]
May 4 13:30:41 hlogin07 abrt[17292]: Saved core dump of pid 77322 (/usr/bin/eosd) to /var/spool/abrt/ccpp-2018-05-04-13:30:11-77322 (2406801408 bytes)
May 4 13:30:42 hlogin07 abrtd: Package ‘eos-fuse-core’ isn’t signed with proper key
May 4 13:30:42 hlogin07 abrtd: ‘post-create’ on ‘/var/spool/abrt/ccpp-2018-05-04-13:30:11-77322’ exited with 1
May 4 13:30:42 hlogin07 abrtd: Deleting problem directory ‘/var/spool/abrt/ccpp-2018-05-04-13:30:11-77322’

I have set ‘OpenGPGCheck = no’ in the file abrt-action-save-package-data.conf, but it still doesn’t work.
Does anyone know how to make abrtd keep the core dump file? It would help the developers find the bug.

Thanks.