MGM crash when find or fileinfo produces listings over 2 GiB

davidjericho · July 26, 2019, 12:08pm

Hello,

It appears when a find or fileinfo command, requested either via HTTP or Xrootd, produces a result file that’s over 2GiB, the non-QDB MGM thread crashes. I have not managed to capture a core dump to be used, as it wipes itself for some reason, and the core dump is approximately 200GB as well.

What I have captured, is the log entries from when this occurs. I don’t understand how the sending of results from /tmp/eos.mgm/ works, so I haven’t been able to write a patch.

In this case, we’ve had a user successfully write in 6.8 million files into a single directory, and they can readily fetch them singly or do a simple ls over the contents. If I request metadata in the form of a find -I, or a fileinfo in json format, the result of the query is successfully staged through /tmp/eos.mgm/ as a multi-gigabyte file, but the MGM thread associated crashes and the client never sees any reply to the query.

The log entry when doing this via HTTP reads (paths somewhat sanitised):

190725 09:50:28 time=1564048228.435535 func=Emsg                     level=ERROR logid=57393b86-aebe-11e9-b6f8-506b4b385e7c unit=mgm@gdpt-a1.cdn.aarnet.edu.au:1094 tid=00007f404c7df700 source=XrdMgmOfsFile:2813             tident=minio.1:1@crlt-a2.cdn.aarnet.edu.au sec=http  uid=48 gid=48 name=48 geo="crlt" Unable to read ; File too large
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::append
error: received signal 6:
/lib64/libXrdEosMgm.so(_Z20xrdmgmofs_stacktracei+0x44)[0x7f41928f8c74]
/lib64/libc.so.6(+0x36340)[0x7f419886c340]
/lib64/libc.so.6(gsignal+0x37)[0x7f419886c2c7]
/lib64/libc.so.6(abort+0x148)[0x7f419886d9b8]
/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165)[0x7f419917b7d5]
/lib64/libstdc++.so.6(+0x5e746)[0x7f4199179746]
/lib64/libstdc++.so.6(+0x5e773)[0x7f4199179773]
/lib64/libstdc++.so.6(+0x5e993)[0x7f4199179993]
/lib64/libstdc++.so.6(_ZSt20__throw_length_errorPKc+0x77)[0x7f41991ce7a7]
/lib64/libstdc++.so.6(+0xbe9ed)[0x7f41991d99ed]
#########################################################################
# stack trace exec=/usr/bin/xrootd pid=1 what='thread apply all bt'
#########################################################################

.....

#########################################################################
# -----------------------------------------------------------------------
# Responsible thread =>
# -----------------------------------------------------------------------
# Thread 5 (Thread 0x7f404c7df700 (LWP 491)):
#########################################################################
#4  <signal handler called>
#5  0x00007f419886c2c7 in raise () from /lib64/libc.so.6
#6  0x00007f419886d9b8 in abort () from /lib64/libc.so.6
#7  0x00007f419917b7d5 in __gnu_cxx::__verbose_terminate_handler() ()
   from /lib64/libstdc++.so.6
#8  0x00007f4199179746 in ?? () from /lib64/libstdc++.so.6
#9  0x00007f4199179773 in std::terminate() () from /lib64/libstdc++.so.6
#10 0x00007f4199179993 in __cxa_throw () from /lib64/libstdc++.so.6
#11 0x00007f41991ce7a7 in std::__throw_length_error(char const*) ()
   from /lib64/libstdc++.so.6
#12 0x00007f41991d99ed in std::string::append(char const*, unsigned long) ()
   from /lib64/libstdc++.so.6
#13 0x00007f4192ae4072 in eos::mgm::HttpHandler::Get(eos::common::HttpRequest*, bool) () from /lib64/libXrdEosMgm.so
#14 0x00007f4192ae8aec in eos::mgm::HttpHandler::HandleRequest(eos::common::HttpRequest*) () from /lib64/libXrdEosMgm.so
#15 0x00007f4192adc5da in eos::mgm::HttpServer::Handler(void*, MHD_Connection*, char const*, char const*, char const*, char const*, unsigned long*, void**) ()
   from /lib64/libXrdEosMgm.so
#16 0x00007f4192269858 in eos::common::HttpServer::StaticHandler(void*, MHD_Connection*, char const*, char const*, char const*, char const*, unsigned long*, void**) () from /usr/lib64/libeosCommonServer.so.4
#17 0x00007f418ea91df0 in call_connection_handler ()
   from /usr/lib64/libmicrohttpd.so.10
#18 0x00007f418ea93a67 in MHD_connection_handle_idle ()
   from /usr/lib64/libmicrohttpd.so.10
#19 0x00007f418ea99730 in MHD_epoll () from /usr/lib64/libmicrohttpd.so.10
#20 0x00007f418ea998fe in MHD_select_thread ()
   from /usr/lib64/libmicrohttpd.so.10
#21 0x00007f4199631dd5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f419893402d in clone () from /lib64/libc.so.6
#########################################################################
# stack trace exec=/usr/bin/xrootd pid=1 what='generate-core-file'
#########################################################################

If I search for matching logid lines, I only see:

xrdlog.mgm:190725 09:26:54 time=1564046814.190081 func=open                     level=INFO  logid=57393b86-aebe-11e9-b6f8-506b4b385e7c unit=mgm@gdpt-a1.cdn.aarnet.edu.au:1094 tid=00007f404c7df700 source=XrdMgmOfsFile:193              tident=minio.1:1@crlt-a2.cdn.aarnet.edu.au sec=http  uid=48 gid=48 name=48 geo="crlt" op=read path=/proc/user/ info=mgm.cmd=find&mgm.option=fI&mgm.path=/eos/..../commvault-library/files/&eos.app=http
xrdlog.mgm:190725 09:50:28 time=1564048228.435535 func=Emsg                     level=ERROR logid=57393b86-aebe-11e9-b6f8-506b4b385e7c unit=mgm@gdpt-a1.cdn.aarnet.edu.au:1094 tid=00007f404c7df700 source=XrdMgmOfsFile:2813             tident=minio.1:1@crlt-a2.cdn.aarnet.edu.au sec=http  uid=48 gid=48 name=48 geo="crlt" Unable to read ; File too large

On the MGM after the crash, I have:

[root@gdpt-a1 eos.mgm]# pwd
/tmp/eos.mgm
[root@gdpt-a1 eos.mgm]# ls -al
total 8768708
drwx------ 2 daemon daemon         43 Jul 25 09:50 .
drwxrwxrwt 1 root   root          233 Jul 25 09:53 ..
-rw-r--r-- 1 daemon daemon 8979155718 Jul 25 09:50 139914137958144.mResultstream

The contents of 139914137958144.mResultstream is valid and complete, and has the correct starting and terminating fields the MGM writes into the stream.

Repeating the same test using xrdcp, adjusting the syntax for source appropriately, I get:

190726 01:03:03 055 XrdLink: Unable to send to root.242004:172@crlt-a2; broken pipe
190726 01:03:03 055 XrootdXeq: root.242004:172@crlt-a2 disc 0:14:44 (send failure)
190726 01:05:18 043 XrdLink: Unable to send to root.242004:175@crlt-a2; broken pipe
190726 01:05:18 043 XrootdXeq: root.242004:175@crlt-a2 disc 0:13:59 (send failure)

Nothing in the error.log when doing this with xrdcp.

The impact of this is that there’s no fast way for me to request a metadata listing across this directory, and the only alternative is to literally throw millions of stat calls against the contents.

esindril · July 30, 2019, 7:47am

Hi David,

What happens here is that the HTTP interface expects the result in one go so everything is appened to a string. There is a limitation in c++ for the maximum size of a string that usually is 4GB but this depends on the platform. And this is exactly what happens for the HTTP case, the code throws an std::length_error exception.

Now, I would have expected the xrdcp corresponding command to work fine since xrdcp will actually stream the result like a normal read of a long file. Could you retry the operation with xrdcp and post the results? It’s not clear to me exactly what fails in this case or if there is any crash.

Thanks,
Elvin

davidjericho · July 30, 2019, 10:08am

Thank you @esindril. That explains the HTTP issue, and that’s what I suspected.

Is there any reason this can’t be streamed instead of sent from a string? As I said, I couldn’t find exactly how this worked, so I couldn’t look at a fix myself.

Regarding xrdcp, it too failed, although without the core dump and stacktrace. It simply logged:

190726 01:03:03 055 XrdLink: Unable to send to root.242004:172@crlt-a2; broken pipe
190726 01:03:03 055 XrootdXeq: root.242004:172@crlt-a2 disc 0:14:44 (send failure)
190726 01:05:18 043 XrdLink: Unable to send to root.242004:175@crlt-a2; broken pipe
190726 01:05:18 043 XrootdXeq: root.242004:175@crlt-a2 disc 0:13:59 (send failure)

In this case, xrdcp was pushing to stdout, and then I was redirecting to a file via the shell, so I shouldn’t hit size issues either.

esindril · July 30, 2019, 11:58am

Hi David,

Let’s try a few things. First, I suspect the open operation times out since producing the output probably takes more than 60 seconds. Can you run again the xrdcp command with the following env variables set?
XRD_STREAMTIMEOUT=600 XRD_REQUESTTIMEOUT=1200

Also it would be helpful to post logs that you get when running the xrdcp command. Either by setting XRD_LOGLEVEL=Dump or using the -d 3 flag of xrdcp.

Another option is to use the newfind command from the eos console which has builtin protection for this kind of errors. So just run using “eos newfind” and the options that you need.

Cheers,
Elvin

davidjericho · August 2, 2019, 4:25am

Thank you @esindril, I can confirm using xrdfs and increasing the stream and request timeouts, I can now query a QDB MGM using ls, against a directory containing containing 7 million objects.

Identical directories on the old Citrine MGM, and the new QDB MGM take quite different times. QDB takes 2 minutes 34 to list 7 million objects in a straight filename listing format. The old Citrine takes 58 seconds. I can deal with this. The changed timeouts do make sense here.

I retested the find command using xrdcp find with the-I output switch, and it still does fail. The error returned is “Run: [ERROR] Server responded with an error: [3005] Unable to read ; File too large”, with the result only returning just over 4 million of the 6.7 million objects present. EOS did not crash though, with no core dump or other error, as noted in the original post. EOS did clean up /tmp/eos.mgm after the failure using xrdcp. While monitoring the /tmp/eos.mgm directory, I saw the mResultstream file reach 3.4 GiB.

The exact command was:
xrdcp -s 'root://mgm//proc/user/?mgm.cmd=find&mgm.option=I&mgm.find.maxdepth=1&mgm.path=/eos/bigdirectory/'

esindril · August 2, 2019, 9:32am

Hi David,

I had a look at the MGM code and we have a historical test for FILE_OFFSET_BITS which is never set and therefore one can not return more than 2GB of data from the result of proc commands. So this fully explains the behaviour you see.

I will push a fix for this now.

Cheers,
Elvin

davidjericho · August 4, 2019, 10:19pm

Thank you Elvin, we appreciate the fix very much.

CERN Accelerating science

MGM crash when find or fileinfo produces listings over 2 GiB