cat filter_eosdump.py
#!/usr/bin/env python3
import os
import sys
import json
import ijson
fileName = 'full_info.json'
with open(fileName, mode='r') as f:
    # Stream the top-level JSON array one record at a time
    for record in ijson.items(f, "item"):
        if '/eos/alice/grid' in record['path']:
            print(f'{record["name"]},{record["size"]}')
Hi @asevcenc, thank you for that solution, that should work.
Too bad eos fs dumpmd is failing, as the full ns dump is 20G and takes a while. I don't see a way to scope eos-ns-inspect to a specific field, so it seems one has to parse the dump afterwards, as you suggested.
I installed the version matching the existing eos-server with dnf install eos-ns-inspect-$(rpm -q --queryformat %{VERSION} eos-server) (to avoid an unscheduled update of the other eos-* packages).
It appears that “locations” contains the fsid number, so I modified the script and am sharing it in case it is helpful to others.
#!/usr/bin/env python3
import os
import sys
import json
import ijson
fileName = 'eos-ns-inspect_paths_sample.json'
with open(fileName, mode='r') as f:
    for record in ijson.items(f, "item"):
        #if '120' in record['locations']: # the fsid number or substring thereof
        if '12048' in record['locations'] or '12068' in record['locations']: # multiple fsids
            print(f'{record["name"]},{record["size"]}')
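Since the check above is a substring match, '120' would also match fsid 12048. Below is a minimal sketch of an exact-match, per-fsid count that could be compared against eos fs dumpmd --count. The helper names fsids_of and count_by_fsid are hypothetical, and the format of "locations" is an assumption: the sketch simply extracts every run of digits from it, whether it arrives as a string or a list.

```python
import re
from collections import Counter

def fsids_of(record):
    """Return the set of fsid strings found in record['locations'].

    Assumption: 'locations' is a string or list holding fsid numbers;
    every run of digits in it is treated as one fsid.
    """
    return set(re.findall(r"\d+", str(record.get("locations", ""))))

def count_by_fsid(records, wanted):
    """Count how many records have a replica on each wanted fsid (exact match)."""
    wanted = {str(w) for w in wanted}
    counts = Counter()
    for record in records:
        for fsid in fsids_of(record) & wanted:
            counts[fsid] += 1
    return counts

# Made-up sample records, for illustration only
sample = [
    {"name": "a", "size": 1, "locations": "12048,12068"},
    {"name": "b", "size": 2, "locations": [12048]},
    {"name": "c", "size": 3, "locations": "120"},
]
print(count_by_fsid(sample, [12048, 12068]))
```

With the real dump, the records argument could be the ijson.items(f, "item") iterator from the script above, so the file is still streamed rather than loaded whole.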
Curious that, after filtering the ns dump on record['locations'] as above for a specific fsid, there is quite a large discrepancy between what the QDB namespace dump produced (which essentially matches eos fs dumpmd --count) and what eos fs status reports for stat.usedfiles, as below.
A random sampling across fsids shows similar high variation, though not as significant as the one below: eos fs dumpmd $fsid --count && eos fs status $fsid | grep usedfiles
Is this expected? Is stat.usedfiles not in fact a count of file entities on an fsid, or is it perhaps only initially populated when the fsid boots? I'm not finding info on the forum which clarifies stat.usedfiles or correlates it to eos fs dumpmd --count
The 20G dump of the full namespace was produced with:
The fs status counts all the files on the mountpoint, irrespective of whether they correspond to a namespace entry or not: basically all the inodes on that filesystem, corresponding to the statfs->f_files item. This might include, for example, orphan files, block checksum files (which are usually attached to a main file, depending on the layout), scrub files used to identify broken disks, etc.
Therefore, in general, I would expect the fs status to show more files than eos fs dumpmd. This should not be a concern.
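To look at the filesystem-level inode counters mentioned above directly on an FST, a minimal sketch using Python's os.statvfs follows. The helper name inode_usage and the "/" mountpoint are placeholders, and computing used inodes as total minus free is an assumption about how the figure relates to stat.usedfiles, not something confirmed in this thread.

```python
import os

def inode_usage(mountpoint):
    """Return the inode counters of the filesystem holding `mountpoint`."""
    st = os.statvfs(mountpoint)
    return {
        "total": st.f_files,              # total inodes (statfs f_files)
        "free": st.f_ffree,               # free inodes
        "used": st.f_files - st.f_ffree,  # assumption: used = total - free
    }

# "/" is a placeholder; use the FST data mountpoint instead
print(inode_usage("/"))
```

These counters cover every inode on the filesystem, so they include files that have no namespace entry.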