HW requirements for migration to new QuarkDB namespace

Dear all,

We are preparing the infrastructure to set up a QuarkDB cluster in order to migrate to the new namespace, and we would like some advice on what hardware to use.

First, some figures from our production instance:

# ------------------------------------------------------------------------------------
# Namespace Statistics
# ------------------------------------------------------------------------------------
ALL      Files                            537803743 [booted] (4730s)
ALL      Directories                      150997435
[...]
# ------------------------------------------------------------------------------------
ALL      memory virtual                   883.75 GB
ALL      memory resident                  820.07 GB
ALL      memory share                     49.54 MB
ALL      memory growths                   7.00 GB
ALL      threads                          405
ALL      fds                              1016

We are considering 3 hosts, each with 800 GB SSD disks (for QuarkDB data), 640 GB of RAM and 40 cores, on which we would install the 3 QuarkDB nodes and the MGMs (1 master, 2 slaves).

Does this sound reasonable?

Do you have an estimate of the average disk space needed to store one metadata object (file or directory)?
In particular, we would like confirmation that 800 GB of disk space is sufficient for our current 540M files and 150M directories, which will grow over the coming years, probably more than doubling, so let's say up to 2B files.
Also, RAM should no longer be a hard limit now that the namespace is not entirely kept in memory, but do you think that 640 GB would allow reasonable MGM performance for this number of objects, given that it has to be shared with QuarkDB since they would run on the same hosts?

In addition, we would have the possibility to upgrade the SSDs to NVMe drives, which are faster and more durable under heavy write loads. Would you consider this option worthwhile, or not?

And one last question: are there other aspects of the servers not mentioned here that we should also take into account?

Hi Franck,

Yes, the machines seem more than capable:

  • 640 GB RAM should be plenty for both the MGM and QuarkDB. The MGM now consumes a configurable amount of memory, depending on the size of its cache - 640 GB should allow for a large cache. QuarkDB's direct memory consumption is low (7 GB for one of our instances with 500M files), but it benefits from a machine with plenty of RAM, as the kernel will automatically keep SST file contents in the page cache.
  • In one of our instances with 500M files, QDB occupies 106 GB of disk space. This works out to around 212 bytes per file - note that this depends on the compression ratio QDB achieves, in this case ~3.2, but you can expect something similar. So yes, an 800 GB SSD sounds quite sufficient even for 2B files - see the quick estimate after this list.
  • 40 cores: Yes, more than enough.
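
A quick back-of-the-envelope projection using those figures (the ~212 bytes per file already folds in directory overhead, since it is simply total QDB disk usage divided by file count):

  2 000 000 000 files x ~212 bytes/file ≈ 424 GB

So even at 2B files you would fill roughly half of the 800 GB, which leaves comfortable headroom.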

As for NVMe: most likely it will not make a difference in user-perceived performance - anything above 100k IOPS, which can be achieved with a decent SSD, should be sufficient for QDB. Given that the MGM caches frequently accessed metadata, I/O rates will not be that high. NVMe might still make sense if the difference in cost is not large - up to you to decide.
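
If you want to sanity-check the drives yourselves, something like a quick fio random-read test gives a rough IOPS figure to compare against the ~100k mentioned above (just a suggestion on my side - /mnt/ssd/fio.test is a placeholder, point it at a file on the SSD in question):

# fio --name=qdb-randread --filename=/mnt/ssd/fio.test --size=4G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting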

Everything looks good to me. :+1:

Thank you for your answer, Giorgos!

Hi @gbitzes, one small question about running both the MGM and QuarkDB on the same node:
After testing a namespace migration and launching QuarkDB, it appears that the MGM cannot start, because QuarkDB is already listening on port 1094. This is strange, because it isn't the case on the test instance. We can indeed see this line in the log:

# grep :1094 /var/log/xrootd/quarkdb/xrootd.log
Config Route all4: s-jrciprjeop216p.cidsn.jrc.it Dest=[::139.191.240.214]:1094
------ xrootd quarkdb@s-jrciprjeop216p.cidsn.jrc.it:1094 initialization completed.

# netstat -nlpt | grep xrootd
tcp6       0      0 :::1094                 :::*                    LISTEN      40389/xrootd        
tcp6       0      0 :::7777                 :::*                    LISTEN      40389/xrootd   

What could be the reason for this?

Hi @franck-jrc,

Try adding xrd.port <same port as in xrd.protocol redis:xyz> to the QDB configuration. That is, if you have xrd.protocol redis:7777, add xrd.port 7777 as well.
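
For reference, with a standard QuarkDB setup the top of the QDB configuration file (e.g. /etc/xrootd/xrootd-quarkdb.cfg - path guessed from your log location) would then contain both directives:

xrd.port 7777
xrd.protocol redis:7777 libXrdQuarkDB.so

The rest of the configuration (redis.mode, redis.database, and so on) stays unchanged. Without an explicit xrd.port, xrootd falls back to its default port 1094, which is presumably why it collided with the MGM.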

Cheers,
Georgios

Thanks very much, Giorgos, that was exactly it!