QDB: terminated handshake not received


(Crystal) #1

hello,

I am seeing this in the qdb logs:
180328 02:46:56 134 XrdProtocol: ?:46@mgm-master terminated handshake not received

I think these correspond to when the mgm calls MetadataFlusher?

Aside from this, communication with the MGM seems ok (reads/writes are happening). Any idea what might cause the above issue? I tried increasing the redis.trace level and also the xrd loglevel, but I’m not seeing any additional info.

Also - is there any way to easily change the quarkdb log level? (something like the debug command in the eos console)


(Georgios Bitzes) #2

Hi Crystal,

I’ve been postponing debugging this for some time since, as you noticed, it’s not causing any issues… :stuck_out_tongue: We get the same warnings in our logs.

I think it’s some strange interaction between the QDB xrootd plugin and the default one… Or it’s the way the TCP connection is closed by QClient. I can’t tell for sure yet, I’ll take a look.

There’s no way to set the log level for now. Is there some extra information you’d like to see in the log? There’s also the “MONITOR” command, which you can use from redis-cli to get a stream of all commands received by a node. (Point it at the leader; on followers you’ll just see the binary RPCs received from the leader.)


(Georgios Bitzes) #3

redis.trace is actually ignored for now - initially I planned on having configurable log level, but in the end I did not find a case where it would be useful to only show a subset of log messages, and things like “MONITOR” seem good enough. Open to suggestions, though :slight_smile:


(Crystal) #4

haha i thought i was being clever looking at the code for all the config options and finding redis.trace but i was not clever enough! :stuck_out_tongue:

generally being able to set the loglevel would be a nice thing, although it doesn’t look like quarkdb logs as much (or as diversely) as the other EOS bits do, so it’s not really an important thing right now! I was just hoping that setting the debug level higher would help me figure out the XrdProtocol thing, and then I thought about being able to easily set it back down again.

in our production environment we set the EOS debug levels to CRIT so that it doesn’t get too overwhelming, while for quarkdb I’m really only seeing those XrdProtocol errors and the read/write report per 10 seconds. I haven’t yet come across extra stuff I’d like to see (or stuff I’d like to not see) in the quarkdb logs, I’ll get back to you on that as I play around with it more :smiley:


(Georgios Bitzes) #5

Ok, now I understand the issue: a QClient object will try to keep the connection alive at all costs, even if there are no requests for hours on end. XRootD is less than impressed by this, thinks the TCP connection is dead, cleans it up, and prints “terminated handshake not received”.

QClient notices the connection went down, and re-connects. :slight_smile: It only happens for QClient objects staying idle for long periods of time. (we have a few of those on the FSTs)

I’ll just add a configurable timeout, so QClient sleeps if there’s no traffic.
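
To illustrate the idea of an idle timeout, here is a minimal sketch. It is not the actual QClient change: the class name IdleTracker and its methods are hypothetical, and the caller is assumed to pass in the current time so the logic stays easy to test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not part of QClient: remembers when a connection
// last carried traffic and reports whether it has been idle for too long.
class IdleTracker {
public:
  using Clock = std::chrono::steady_clock;

  IdleTracker(std::chrono::milliseconds timeout, Clock::time_point now)
  : idleTimeout(timeout), lastActivity(now) {
    assert(timeout.count() > 0);  // a zero or negative timeout makes no sense
  }

  // Call whenever a request is sent or a response is received.
  void recordActivity(Clock::time_point now) { lastActivity = now; }

  // True once no traffic has been seen for at least the configured timeout.
  // The owner would then close the connection itself and reconnect lazily
  // on the next request, instead of waiting for the server to drop it.
  bool shouldDisconnect(Clock::time_point now) const {
    return now - lastActivity >= idleTimeout;
  }

private:
  std::chrono::milliseconds idleTimeout;
  Clock::time_point lastActivity;
};
```

A client loop could check shouldDisconnect(Clock::now()) periodically and call recordActivity(Clock::now()) on every request or response.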


(Crystal) #6

THAT WAS EXTREMELY QUICK

thanks for looking into it! :smiley:


(Georgios Bitzes) #7

Hi, I tested the fix for “terminated handshake not received” in our test instance, and the messages have gone away. Let me know if you still see this issue after updating to the latest commits of both EOS and QDB from the master branch.

Cheers,
Georgios


(Crystal) #8

hello!

I’m trying to compile quarkdb from source, and running into errors:

cd /quarkdb/build/src && /usr/bin/c++   -DXrdQuarkDB_EXPORTS -I/usr/include/xrootd -I/usr/include/xrootd/private -I/quarkdb/deps/qclient/include -I/quarkdb/deps/qclient/src/fmt  -fPIC   -Wall -Wextra -Werror -Wno-unused-parameter -std=c++11 -g3 -fPIC -std=c++11 -o CMakeFiles/XrdQuarkDB.dir/StateMachine.cc.o -c /quarkdb/src/StateMachine.cc
In file included from /quarkdb/src/StateMachine.cc:29:0:
/quarkdb/src/storage/StagingArea.hh: In member function 'rocksdb::Status quarkdb::StagingArea::exists(const rocksdb::Slice&)':
/quarkdb/src/storage/StagingArea.hh:78:106: error: no matching function for call to 'rocksdb::WriteBatchWithIndex::GetFromBatchAndDB(rocksdb::DB*&, rocksdb::ReadOptions, const rocksdb::Slice&, rocksdb::PinnableSlice*)'
     return writeBatchWithIndex.GetFromBatchAndDB(stateMachine.db, rocksdb::ReadOptions(), slice, &ignored);
                                                                                                          ^
/quarkdb/src/storage/StagingArea.hh:78:106: note: candidates are:
In file included from /quarkdb/src/StateMachine.hh:35:0,
                 from /quarkdb/src/StateMachine.cc:24:
/usr/include/rocksdb/utilities/write_batch_with_index.h:187:10: note: rocksdb::Status rocksdb::WriteBatchWithIndex::GetFromBatchAndDB(rocksdb::DB*, const rocksdb::ReadOptions&, const rocksdb::Slice&, std::string*)
   Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
          ^
/usr/include/rocksdb/utilities/write_batch_with_index.h:187:10: note:   no known conversion for argument 4 from 'rocksdb::PinnableSlice*' to 'std::string* {aka std::basic_string<char>*}'
/usr/include/rocksdb/utilities/write_batch_with_index.h:189:10: note: rocksdb::Status rocksdb::WriteBatchWithIndex::GetFromBatchAndDB(rocksdb::DB*, const rocksdb::ReadOptions&, rocksdb::ColumnFamilyHandle*, const rocksdb::Slice&, std::string*)
   Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
          ^
/usr/include/rocksdb/utilities/write_batch_with_index.h:189:10: note:   candidate expects 5 arguments, 4 provided
/quarkdb/src/StateMachine.cc: In member function 'rocksdb::Status quarkdb::StateMachine::verifyChecksum()':
/quarkdb/src/StateMachine.cc:1016:32: error: 'class rocksdb::DB' has no member named 'VerifyChecksum'
   rocksdb::Status status = db->VerifyChecksum();
                                ^
In file included from /quarkdb/src/StateMachine.cc:29:0:
/quarkdb/src/storage/StagingArea.hh: In member function 'rocksdb::Status quarkdb::StagingArea::exists(const rocksdb::Slice&)':
/quarkdb/src/storage/StagingArea.hh:79:3: error: control reaches end of non-void function [-Werror=return-type]
   }
   ^
cc1plus: all warnings being treated as errors
make[2]: *** [src/CMakeFiles/XrdQuarkDB.dir/StateMachine.cc.o] Error 1
make[2]: Leaving directory `/quarkdb/build'
make[1]: *** [src/CMakeFiles/XrdQuarkDB.dir/all] Error 2
make[1]: Leaving directory `/quarkdb/build'
make: *** [all] Error 2

Does this look at all familiar? The gitlab build passed, so I am probably doing something wrong somewhere!
I’ve got all the dependencies listed in utils/el7-packages.sh, and I’m just following the instructions listed on the installation page.


(Georgios Bitzes) #9

Hi, it looks like QuarkDB is being compiled against an older version of rocksdb, which is pretty strange.

Could you post the following information?

  • The platform you’re trying to compile on, and compiler version.
  • The cmake invocation used to compile QuarkDB, and its output.
  • Is there any system package providing rocksdb?

(Crystal) #10

thanks for the tip! I had eos-rocksdb installed because I had also been compiling eos, and I guess that was causing the problem.

I’ve now gotten a bit further:

[ 41%] Building CXX object src/CMakeFiles/XrdQuarkDB.dir/raft/RaftTalker.cc.o
/quarkdb/src/raft/RaftState.cc: In member function 'quarkdb::RaftStateSnapshotPtr quarkdb::RaftState::getSnapshot()':
/quarkdb/src/raft/RaftState.cc:61:43: error: no matching function for call to 'atomic_load(std::shared_ptr<const quarkdb::RaftStateSnapshot>*)'

(and so on)

building in a docker container, centos:7 image, with cmake3 (3.6.3)

-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find rocksdb (missing:  ROCKSDB_LIBRARY ROCKSDB_INCLUDE_DIRS)
-- Found hiredis: /usr/lib64/libhiredis.so
-- Found XRootD: /usr/lib64/libXrdServer.so
-- Found uuid: /usr/lib64/libuuid.so
-- Found libdw: /usr/lib64/libdw.so
-- Could NOT find libbfd (missing:  LIBBFD_LIBRARY LIBBFD_INCLUDE_DIR)
-- BACKWARD_HAS_UNWIND=1
-- BACKWARD_HAS_BACKTRACE=0
-- BACKWARD_HAS_BACKTRACE_SYMBOL=0
-- BACKWARD_HAS_DW=1
-- BACKWARD_HAS_BFD=0
-- Found Backward: /quarkdb/deps/backward-cpp
-- Could NOT find GTest (missing:  GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY)
-- Found OpenSSL: /usr/lib64/libssl.so;/usr/lib64/libcrypto.so (found version "1.0.2k")
-- CMake version: 3.6.3
-- Build type:
-- Performing Test HAVE_STD_CPP11_FLAG
-- Performing Test HAVE_STD_CPP11_FLAG - Success
-- Performing Test FMT_CPP11_CMATH
-- Performing Test FMT_CPP11_CMATH - Success
-- Performing Test FMT_CPP11_UNISTD_H
-- Performing Test FMT_CPP11_UNISTD_H - Success
-- Performing Test SUPPORTS_VARIADIC_TEMPLATES
-- Performing Test SUPPORTS_VARIADIC_TEMPLATES - Success
-- Performing Test SUPPORTS_INITIALIZER_LIST
-- Performing Test SUPPORTS_INITIALIZER_LIST - Success
-- Performing Test SUPPORTS_ENUM_BASE
-- Performing Test SUPPORTS_ENUM_BASE - Success
-- Performing Test SUPPORTS_TYPE_TRAITS
-- Performing Test SUPPORTS_TYPE_TRAITS - Success
-- Performing Test SUPPORTS_USER_DEFINED_LITERALS
-- Performing Test SUPPORTS_USER_DEFINED_LITERALS - Success
-- Looking for open
-- Looking for open - found
-- Found PythonInterp: /usr/bin/python (found version "2.7.5")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /quarkdb/build

RUN cd / && git clone https://gitlab.cern.ch/eos/quarkdb.git && mkdir /quarkdb/build && \
    cd /quarkdb && git submodule update --init --recursive && \
    cd /quarkdb/build && cmake3 .. && \
    make && rm -rf /quarkdb

i’ve also given you access to a repo on github if that helps - just clone & run ./build -c


(Georgios Bitzes) #11

I’m using a newer version of gcc to compile QDB than the default on CC7 (CERN CentOS 7). To activate it:

sudo yum install centos-release-scl
sudo yum install devtoolset-6 # must be a separate yum transaction from centos-release-scl
source /opt/rh/devtoolset-6/enable

I should add this to the documentation… :upside_down_face:
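
For context on why the compiler version matters: the atomic_load error reported above on GCC 4.8.5 points at the C++11 atomic free functions for std::shared_ptr, which libstdc++ only gained in GCC 5. The sketch below shows the general pattern such code relies on. The Snapshot and State types are hypothetical stand-ins, not QuarkDB's actual RaftState code.

```cpp
#include <atomic>
#include <memory>
#include <string>

// Hypothetical stand-in for an immutable state snapshot.
struct Snapshot {
  long term;
  std::string leader;
};

class State {
public:
  State() : current(std::make_shared<Snapshot>(Snapshot{0, ""})) {}

  // Readers obtain a consistent snapshot without taking a lock.
  // This is the call that fails to compile on GCC 4.8.
  std::shared_ptr<const Snapshot> getSnapshot() const {
    return std::atomic_load(&current);
  }

  // Writers build a whole new snapshot and publish it in one atomic step.
  void update(long term, const std::string &leader) {
    std::shared_ptr<const Snapshot> next =
        std::make_shared<Snapshot>(Snapshot{term, leader});
    std::atomic_store(&current, next);
  }

private:
  std::shared_ptr<const Snapshot> current;
};
```

A reader that grabbed a snapshot before an update keeps seeing the old values, since the old object stays alive as long as someone holds a pointer to it. Note that these free functions are deprecated as of C++20 in favour of std::atomic<std::shared_ptr<T>>.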


(Crystal) #12

sounds like a good plan :wink:

happy update: using the newer gcc (and a couple more deps, and a bit of tinkering) did it! i can confirm (finally!!) that I’m no longer seeing the handshake errors.

thanks!! :grinning: