QuarkDB bulkload mode crashes when redis-cli command runs

Dear Experts,

Recently, I have been trying to set up a QuarkDB cluster to accommodate the old in-memory namespace.

First of all, I would like to make sure that bulkload works properly before starting anything critical.

So, I created a node (in fact a VM) to run in bulkload mode, installed the required QuarkDB packages, and created a bulkload configuration file as described here: Bulkload mode - QuarkDB Documentation

Starting the service with systemd works fine, and the log says the initialization was successful. However, when I try to run quarkdb-info in redis-cli, the instance crashes. The full log is here.

When it crashes, the instance keeps being restarted, but every restart fails because bulkload mode can only be initialized if the QuarkDB directory is newly created. This produces a large number of core dump files, which quickly fill up the /var partition.

The installed quarkdb and xrootd packages are the following:
quarkdb-0.4.2-1.el7.cern.x86_64
quarkdb-debuginfo-0.4.2-1.el7.cern.x86_64
xrootd4-libs-4.12.6-1.el7.x86_64
xrootd-client-libs-5.2.0-1.el7.x86_64
xrootd4-client-libs-4.12.6-1.el7.x86_64
xrootd-server-5.2.0-1.el7.x86_64
xrootd-libs-5.2.0-1.el7.x86_64
xrootd-server-libs-5.2.0-1.el7.x86_64

By the way, I also tried to downgrade xrootd from 5.2.0 (epel) to 5.0.3 (xrootd-stable), but it seems that xrootd 5.0.3 does not provide the dependencies for quarkdb 0.4.2.

Do you have any ideas?

Thank you.

Best regards,
Sang-Un

Hi,

I haven’t spent much time digging into this issue myself yet. I am attaching the stack trace below:

Stack trace (most recent call last) in thread 30715:
#10   Object ", at 0xffffffffffffffff, in
#9    Object "/usr/lib64/libc-2.17.so, at 0x7f44fef319fc, in __clone
#8    Object "/usr/lib64/libpthread-2.17.so, at 0x7f44ffc30ea4, in start_thread
#7    Object "/usr/lib64/libXrdUtils.so.3.0.0, at 0x7f4500092026, in XrdSysThread_Xeq
#6    Object "/usr/lib64/libXrdUtils.so.3.0.0, at 0x7f45000f0ce8, in XrdStartWorking(void*)
#5    Object "/usr/lib64/libXrdUtils.so.3.0.0, at 0x7f45000f0b9e, in XrdScheduler::Run()
#4    Object "/usr/lib64/libXrdUtils.so.3.0.0, at 0x7f45000e99c8, in XrdLink::setProtocol(XrdProtocol*, bool, bool)
#3    Object "/usr/lib64/libXrdUtils.so.3.0.0, at 0x7f45000ed478, in XrdLinkXeq::DoIt()
#2    Source "/usr/src/debug/quarkdb-0.4.2/src/XrdQuarkDB.cc", line 88, in Process [0x7f44f9e49180]
         86:   // TODO log client DN
         87:   if(!link && tlsconfig.active) qdb_info("handling TLS connection. Security is intensifying");
      >  88:   if(!link) link = new Link(lp, tlsconfig);
         89:
         90:   if(!conn) conn = new Connection(link);
#1  | Source "/usr/src/debug/quarkdb-0.4.2/src/Link.cc", line 80, in operator=
    |    78: : Link(tlsconfig_) {
    |    79:   uuid = generateUuid();
    | >  80:   host = lp->Host();
    |    81:   link = lp;
    | Source "/opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h", line 3656, in assign
    | Source "/opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h", line 4333, in length
      Source "/opt/rh/devtoolset-8/root/usr/include/c++/8/bits/char_traits.h", line 322, in Link [0x7f44f9e7b783]
#0    Object "/usr/lib64/libc-2.17.so, at 0x7f44fefa27b1, in __strlen_sse2_pminub
Segmentation fault (Address not mapped to object [(nil)])

Starting xrootd@quarkdb process with systemd is OK. It only crashes when redis-cli -p 4444 quarkdb-info runs.

Any idea on this?

Thank you.

Best regards,
Sang-Un

Hi Sang-Un,

QuarkDB 0.4.2 was only compiled with support for XRootD4, and it looks to me like you are trying to start it with XRootD5. A QuarkDB build compiled against XRootD5 will come packaged with EOS5. You can find an initial testing version of the packages here:
https://storage-ci.web.cern.ch/storage-ci/eos/diopside/tag/testing/el-7/x86_64/

Note that this includes the eos-quarkdb package, which provides libXrdQuarkDB.so.
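
A quick way to spot this kind of mismatch is to group the installed xrootd packages by major version. The following is only a minimal sketch: it assumes package names in the name-version-release format shown in the list earlier in this thread (as printed by rpm -qa), and the helper name xrootd_majors is made up for illustration, not part of QuarkDB or XRootD.

```python
import re

# Package list copied from the original post (same format as `rpm -qa` output).
INSTALLED = [
    "quarkdb-0.4.2-1.el7.cern.x86_64",
    "quarkdb-debuginfo-0.4.2-1.el7.cern.x86_64",
    "xrootd4-libs-4.12.6-1.el7.x86_64",
    "xrootd-client-libs-5.2.0-1.el7.x86_64",
    "xrootd4-client-libs-4.12.6-1.el7.x86_64",
    "xrootd-server-5.2.0-1.el7.x86_64",
    "xrootd-libs-5.2.0-1.el7.x86_64",
    "xrootd-server-libs-5.2.0-1.el7.x86_64",
]

# Split "name-version-release" on the last two dashes; the release part
# also carries the dist tag and architecture (e.g. "1.el7.x86_64").
NVR = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)$")


def xrootd_majors(packages):
    """Map each XRootD major version to the names of the packages providing it."""
    majors = {}
    for pkg in packages:
        m = NVR.match(pkg)
        if m is None or not m.group("name").startswith("xrootd"):
            continue  # skip malformed entries and non-xrootd packages
        major = int(m.group("version").split(".")[0])
        majors.setdefault(major, []).append(m.group("name"))
    return majors


majors = xrootd_majors(INSTALLED)
for major, names in sorted(majors.items()):
    print(f"XRootD {major}: {', '.join(names)}")
if len(majors) > 1:
    print("Warning: more than one XRootD major version is installed")
```

For the list above, this reports the server packages under XRootD 5 while only the xrootd4-* library packages are at version 4, which matches the libXrdUtils.so.3.0.0 frames in the stack trace being loaded by a QuarkDB plugin built for XRootD4.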

Cheers,
Elvin

Hi Elvin,

Thanks a lot for the hint. After downgrading xrootd to v4, it works like a charm. I will try EOS5 for testing later.

Thank you.

Best regards,
Sang-Un