
QDB namespace conversion process


(Crystal) #1

hi,

I’m trying to figure out the conversion process to QDB; I’d appreciate a quick rundown on how it’s supposed to work.

I ran eos_ns_convert on a node in “bulkload” mode using our production metadata; it finished in 340077 s (~4 days, not sure if this is expected).

That process created the state machine but not the raft-journal, so when I switched it to raft mode and restarted, it kept crashing and complaining about not being able to find the raft journal. To fix this (maybe??), I ran quarkdb-journal --create manually.

It booted fine after that, so I added two other (empty) nodes to the cluster, but then it decided one of the empty nodes should be leader, so I guess it didn’t pick up on the stuff I spent four days loading :sob: Maybe I missed a step somewhere?

I’m fine with spinning up a completely new/empty cluster, but I can’t find much info on converting an existing namespace other than some hints from Georgios and Herve (ty!). There seems to be some pickiness about which folders can or can’t exist in each mode as well; it’d be nice to have more info on that.

thanks!


(Georgios Bitzes) #2

Hi Crystal,

Indeed, the transition from bulkload to raft mode needs more polishing. :slight_smile: These are the steps for now, I’ll add a command to automate some of this:

  • After bulkload is done, shut down the instance and create a brand-new QDB folder with quarkdb-create in a different location, specifying the nodes you’d like to have in the cluster.
  • Steal its raft-journal and move it into the QDB directory of the bulkloaded instance. You can then delete the newly created QDB directory.
  • Manually copy the QDB directory with the data to all nodes you want to have in the cluster.
  • Make sure the configuration is now in mode “raft”, not “bulkload”, for all nodes, and start them all up.
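Sketched as shell commands, the steps above could look roughly like this. The paths, cluster ID, and hostnames are placeholders for illustration, and the quarkdb-create flags should be double-checked against your version’s --help:

```shell
# Rough sketch of the bulkload -> raft transition described above.
# All paths, the cluster ID, and the hostnames are examples; adjust to your setup.

# 1. With the bulkloaded instance shut down, create a fresh QDB directory
#    elsewhere, listing the nodes you want in the final cluster:
quarkdb-create --path /tmp/qdb-fresh \
  --clusterID eos-namespace \
  --nodes qdb1.example.com:7777,qdb2.example.com:7777,qdb3.example.com:7777

# 2. Steal its raft-journal and discard the rest:
mv /tmp/qdb-fresh/raft-journal /var/lib/quarkdb/bulkload/
rm -rf /tmp/qdb-fresh

# 3. Copy the full QDB directory (state machine + raft-journal) to every node:
for host in qdb2.example.com qdb3.example.com; do
  rsync -a /var/lib/quarkdb/bulkload/ "${host}:/var/lib/quarkdb/bulkload/"
done

# 4. Switch every node's configuration from mode "bulkload" to "raft",
#    then start them all up.
```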

I’m more worried that it took four days… We calculated that even for our largest instances, it shouldn’t take more than a couple of hours. A few questions:

  • How many files and directories?
  • What write rates was QDB reporting during bulkload?
  • Which QDB version?
  • How capable is the machine on which you’re running QDB? (cores, RAM) Are you running the tool and QDB on the same machine?
  • How much time is actually spent writing into QDB, and how much time does it spend compacting?
  • And most importantly: Are you using an SSD? From our tests running the new namespace in a pre-production instance, we found out that an SSD is pretty much mandatory, otherwise performance will become very poor as soon as the total namespace size exceeds the physical RAM available on the machine.

(Georgios Bitzes) #3

Also: What was the resulting size of the bulkloaded state machine?

Cheers,
Georgios


(Crystal) #4

Aha, okay, so you actually do have to re-create the node entirely and just move the raft-journal. Noted!

Other answers:

  • ~100 million files, ~20 million directories
  • write rate started at ~1000 Hz, went up to 2000+ Hz; just checked and it’s slowed way down to ~500 Hz
    • it’s currently at about 2 million out of 3 million files processed, on each thread
    • seems faster than the last attempt on the same machine, somehow; I started this ~18 h ago, around when I wrote the first post in this thread
  • qdb version 0.2.3
  • Intel® Xeon® CPU E5-2620 v4 (8 cores?), ~64 GB RAM; both are running on the same machine
    • memory use is ~91% by eos_ns_convert
  • I’m not entirely sure since it ran over 4 days, I didn’t keep track, but these are the last lines it printed out on the last try
    • Container init: 20066022 containers in 33464 seconds
    • QuarkDB bulkload finalization: 4865 seconds
    • Conversion duration: 340077
  • I’m guessing this is probably the issue (no SSD on the dev server) … if so, that’s an easy enough fix! I’ll try it out on another server and report back!
  • will also get back to you on that last question, since I’m reloading the metadata now

(Georgios Bitzes) #5

Hi Crystal,

Indeed, 1 kHz is tragically low; I just tested on my laptop with the latest version and got 185 kHz. Even in raft mode I’ve seen up to 100 kHz.
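As a back-of-envelope check, using the ~100 million files + ~20 million directories you reported, here is what total conversion time looks like at different sustained write rates:

```shell
total=$((100000000 + 20000000))                      # ~100M files + ~20M directories
echo "at 1 kHz:   ~$((total / 1000 / 3600)) hours"   # ~33 hours
echo "at 150 kHz: ~$((total / 150000 / 60)) minutes" # ~13 minutes
```

So even a sustained 1 kHz should have finished in well under two days; the observed 340077 s works out to an average of only ~350 Hz.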

If you’re feeling brave, you could check the rocksdb logs in state-machine/LOG, look for “stalling”, “rate limiting”, or anything that indicates the write rate limiter has kicked in.
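Something like this should surface them; the state-machine path is a guess at a typical layout, so point it at your instance’s QDB directory:

```shell
# Search the rocksdb log for write-stall / rate-limiter messages.
# Adjust the path to wherever your QDB state-machine directory lives.
grep -inE "stalling|rate.limiting" /var/lib/quarkdb/state-machine/LOG | tail -n 20
```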

Another thing to try would be to run the tool against a full QDB cluster in raft mode. No need to wait for it to finish, the write rate within the first couple of minutes should be indicative enough.

Cheers,
Georgios


(Georgios Bitzes) #6

One more thing to try: Run “redis-benchmark -p $qdb_port -t set -P 10000 -n 10000000 -c 1 -r 10000000” against a throwaway bulkload QDB node (preferably same machine where you saw 1kHz), what rates do you see? It’s OK if you’d prefer to let the current conversion finish, I can wait. :slight_smile:

Cheers,
Georgios


(Georgios Bitzes) #7

Yet another thing to try: Run

quarkdb-bench --gtest_filter="Benchmark/hset.hset/threads2_events3000000_consensus"

The tool will spin up its own, private QDB cluster on localhost (ports 23456, 23457, 23458, path /tmp/quarkdb-tests) and hammer it with writes.

You may also get some useful information from iotop, while some benchmark is running.

Cheers,
Georgios


(Crystal) #8

update: quite happily, I tried eos_ns_convert on a (previously) production MGM machine (~128 GB RAM, SSD), and I’m seeing ~150-165 kHz, which is pretty great compared to the dev server I was using before :stuck_out_tongue:

Processed files at 143966 Hz
File init: 900 seconds
Commit quota and file system view ...
Quota view successfully commited
Quota init: 0 seconds
--
Container init: 20066022 containers in 115 seconds
QuarkDB bulkload finalization: 1207 seconds
Conversion duration: 2438

rocksdb log

nothing too interesting that I can see; there’s this with “stall” in it:

** DB Stats **
Uptime(secs): 86264.9 total, 1219.0 interval
Cumulative writes: 57M writes, 71M keys, 55M commit groups, 1.0 writes per commit group, ingest: 10.93 GB, 0.13 MB/s
Cumulative WAL: 3 writes, 0 syncs, 3.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 362K writes, 400K keys, 354K commit groups, 1.0 writes per commit group, ingest: 63.36 MB, 0.05 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

no mention of rate-limiting anywhere.

I was going to run redis-benchmark and quarkdb-bench on both the prod and dev servers and post the results in one go, after the dev server is done … (2 more days? :rofl: )

Since the prod server takes a much more reasonable amount of time to do the conversion, I think we’ll be fine - but I can update with the results of the benchmarking if needed :slight_smile:


(Georgios Bitzes) #9

Good to hear that. :slight_smile: 150 kHz is much more like the rates we’ve seen with the tool, as well.

Still, I cannot explain why you’re getting 1 kHz on the dev server… Even on an ancient desktop CPU with HDD I’ve seen better. :stuck_out_tongue:

Which RPM provides the eos_ns_convert you’re using? (rpm -qa | grep eos) Some old versions (5+ months old) were slow, doing synchronous requests to QDB.

Maybe prod server has a newer version, and dev an older one?

Cheers,
Georgios


(Crystal) #10

nope, they’re using the same Docker container, so all versions should be the same (eos-server version is 4.2.22).

it’s slowed down now to 1-250 Hz :roll_eyes: I’ll probably just stop this conversion and retry tomorrow writing to an SSD instead, to see if that helps much, although maybe it’s a memory problem?

I actually tried running eos_ns_convert on a different (much smaller) set of .mdlog files, but it kept crashing with “Container #0 not found” before it even got to processing entries. If you happen to know how to fix that, I’ll see if I can use that set for testing instead; at the least it shouldn’t take 4 days :stuck_out_tongue:


(Georgios Bitzes) #11

Yeah, if the machine has started swapping due to low memory, that could explain the rate…

About the crash: Hm, that sounds like a bug in the conversion tool, thanks for mentioning, I’ll take a look.

Cheers,
Georgios


(Georgios Bitzes) #12

Hi Crystal,

Elvin tells me the namespace has to be compacted right before running the conversion tool; this is likely the reason behind “Container #0 not found.”

Cheers,
Georgios


(Crystal) #13

sorry for delay!

I ran eos-log-compact on the file and directory metadata for that instance and it completed OK, but I’m still getting the container-not-found issue:

terminate called after throwing an instance of 'eos::MDException'
  what():  Container #0 not found

It may well be an issue on our end (with the metadata), but I’ve got a coredump if you’re interested in checking that out?


(Georgios Bitzes) #14

Hi Crystal,

If the coredump is not too big, sure, feel free to email it to me. Otherwise, a stacktrace should do fine for now.

Cheers,
Georgios


(Elvin Alin Sindrilaru) #15

Hi Crystal,

The issue with the exception being thrown during conversion will be fixed in the EOS 4.3.5 release.
Thanks for reporting it.

Cheers,
Elvin