Dumpmd - too many entries

Hi all,

Running dumpmd to provide ALICE a list of files to redistribute due to a backing fsid failure (single disk) but getting the following.

How to proceed? Is there a QDB query method?

[root@ornl-eos-01]-diopside-~# eos fs dumpmd 12068 --path > fsid-12068-paths.txt
error: too many entries (>100k) on file system to dump them all

Thank you,
Pete

Hi Pete! I recently had the same issue, and the solution was to dump the entire namespace and filter it.
the dump was done with:

eos-ns-inspect scan --path / --no-dirs --members my_qdb_fqdn:qdb_port --password-file /etc/quarkdb.pass --json > full_info.json

and then I filtered it with:


cat filter_eosdump.py 
#!/usr/bin/env python3

import ijson  # streaming JSON parser, so the large dump is never fully loaded into memory

fileName = 'full_info.json'

with open(fileName, mode='r') as f:
    # iterate over the records of the top-level JSON array one at a time
    for record in ijson.items(f, "item"):
        if '/eos/alice/grid' in record['path']:
            print(f'{record["name"]},{record["size"]}')

HTH,
Adrian

Hi @asevcenc, thank you for that solution; it should work.

Too bad eos fs dumpmd fails here, as the full ns dump is 20G and takes a while. I don’t see a way to scope eos-ns-inspect to a specific field, so it seems one has to parse afterwards as you suggested.

I installed a version matching the existing eos-server with dnf install eos-ns-inspect-$(rpm -q --queryformat %{VERSION} eos-server) (to avoid an unscheduled update of the other eos-* packages).

It appears the “locations” field contains the fsid number, so I modified the script and am sharing it in case it is helpful to others.

#!/usr/bin/env python3

import ijson  # streaming JSON parser for the large namespace dump

fileName = 'eos-ns-inspect_paths_sample.json'

with open(fileName, mode='r') as f:
    for record in ijson.items(f, "item"):
        #if '120' in record['locations']: # matches the fsid number or any substring thereof
        if '12048' in record['locations'] or '12068' in record['locations']: # multiple fsids
            print(f'{record["name"]},{record["size"]}')
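One caveat of the substring check above (as the commented-out line hints): '120' would also match fsids like 12048. A small stdlib-only sketch of an exact-match variant, assuming 'locations' is a string containing the fsid numbers as in the records above (the helper name and the sample records are illustrative, not from the real dump):

```python
#!/usr/bin/env python3
# Hypothetical helper: exact-fsid matching instead of substring matching.
import re

def record_matches_fsids(record, wanted_fsids):
    """True if any fsid in wanted_fsids appears as a whole number in the
    record's 'locations' string (so '120' will not match '12048')."""
    fsids_in_record = set(re.findall(r'\d+', record['locations']))
    return bool(fsids_in_record & set(wanted_fsids))

# Tiny in-memory example; on the real 20G dump you would keep the
# ijson.items(f, "item") loop above and call this per record.
sample = [
    {'name': 'a', 'size': 1, 'locations': '12048,5'},
    {'name': 'b', 'size': 2, 'locations': '7,8'},
]
for rec in sample:
    if record_matches_fsids(rec, {'12048', '12068'}):
        print(f'{rec["name"]},{rec["size"]}')  # prints only the matching record
```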

cheers,
Pete

@asevcenc @Costin_Grigoras @esindril

Curious: after filtering the ns dump on record['locations'] as above for a specific fsid, there is quite a large discrepancy between what the QDB namespace dump produced (which essentially matches eos fs dumpmd --count) and what eos fs status reports for stat.usedfiles, as below.

A random sampling across fsids shows similarly high variation, though not as significant as the one below: eos fs dumpmd $fsid --count && eos fs status $fsid | grep usedfiles

Is this expected? Is stat.usedfiles in fact not the number of file entries on an fsid, or is it perhaps only initially populated when the fsid boots? I’m not finding info on the forum that clarifies stat.usedfiles or correlates it with eos fs dumpmd --count.

The 20G dump of the full namespace was produced with:

eos-ns-inspect scan --path / --no-dirs --members ornl-eos-01.ornl.gov:7001 --password-file /etc/eos.keytab --json > /data/eos-inspect-ns-dumps/eos-ns-inspect_paths.json

eos fs status vs ns entries for two fsids:

For fsid 12048 the ns dump produced about 14% of what fs status reports:

wc -l fsid-12048-locations.txt
9596 fsid-12048-locations.txt

eos fs dumpmd 12048 --count
num_files=9590

Though... 

eos fs status 12048 | grep usedfiles
stat.usedfiles                   := 65720

Also, though to a far lesser degree:

wc -l fsid-12068-locations.txt
286720 fsid-12068-locations.txt

eos fs dumpmd 12068 --count
num_files=286542

Though...

eos fs status 12068 | grep usedfiles
stat.usedfiles                   := 305069

Hi Pete,

The fs status output lists all the files on the mountpoint, irrespective of whether they correspond to a namespace entry or not - basically all the inodes on that filesystem, which corresponds to the statfs->f_files item. This can include, for example, orphan files, block checksum files (which are usually attached to a main file, depending on the layout), scrub files used to identify broken disks, etc.

Therefore, in general, I would expect fs status to show more files than eos fs dumpmd. This should not be a concern.
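(For reference, the statfs inode counters in question can be inspected on any mountpoint with os.statvfs; a quick sketch, where the path '/' is just illustrative - on an FST you would point it at the fsid's mountpoint:)

```python
#!/usr/bin/env python3
# Quick look at the statfs/statvfs inode counters described above.
import os

st = os.statvfs('/')          # illustrative path; use the fsid's mountpoint in practice
total_inodes = st.f_files     # total inodes on the filesystem (statfs->f_files)
free_inodes = st.f_ffree      # free inodes
used_inodes = total_inodes - free_inodes
print(f'inodes used on this filesystem: {used_inodes}')
```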

Cheers,
Elvin

@esindril thank you for the above clarification.

Regarding

[root@ornl-eos-01]-diopside-~# eos fs dumpmd 12068 --path > fsid-12068-paths.txt
error: too many entries (>100k) on file system to dump them all

Is there any workaround for this using eos fs dumpmd, or must one now resort to the above solution of querying QuarkDB and manually parsing the full dump?

Thank you,
Pete

Hi Pete,

Indeed, if you have more than 100k files on a file system, the tool will not display the entries. This is a protection put in place because the operation is heavy in terms of namespace load and can lead to slow namespace operations. We therefore decided to limit the output to a reasonable amount; the command is mainly used when looking at the leftover files following a drain operation.

If you need a tool to report the files stored at your site, then the correct solution is the eos-ns-inspect tool, which can run in the background and does not clog MGM performance. If there is any particular information you would like eos-ns-inspect to provide, just let us know and we can consider adding it as extra functionality.

Cheers,
Elvin

Hi Elvin,

Could eos-ns-inspect scan include the ability to natively filter on something like an explicit fsid number or other fields? That would, I think, in effect replicate eos fs dumpmd when the result set is >100k.

The ability to do so directly with eos-ns-inspect, rather than via a manual multi-step process, would be welcome I think.

Alternatively, if the overhead of eos fs dumpmd is too impactful on the MGM for large result sets, could it still allow overriding the 100k limit, together with a suggestion for limiting the impact via priority, cgroups, etc.? (Perhaps not ideal, I realize.)

Thank you,
Pete