Following the discussion about the eos-ns-inspect command at the last EOS Workshop, I wanted to summarize what we obtained while running it on our large instance at JRC (800M files, 170M directories). This command helps in inspecting an EOS namespace without going through the MGM. It is available in a separate yum package, eos-ns-inspect.
In order not to bother production at all (although this shouldn’t be a risk), the run was done on an ad hoc one-node cluster running from a restored copy of a previous backup, as explained in the QDB documentation.
We ran all of the available check commands:
check-naming-conflicts Scan through the entire namespace looking for naming conflicts
check-cursed-names Scan through the namespace to find files / containers with invalid names
stripediff Find files which have non-nominal number of stripes (replicas)
one-replica-layout Find all files whose layout asks for a single replica
check-orphans Find files and directories with invalid parents
check-fsview-missing Check which FileMDs have locations / unlinked locations not present in the filesystem view
check-fsview-extra Check whether there exist FsView entries without a corresponding FMD location
check-shadow-directories Check for naming conflicts between directories inside the same subdirectory
check-simulated-hardlinks Check for corruption in simulated hardlinks
Some of these commands make only a few reads per second (1-2) and finish in a few hours (stripediff, one-replica-layout, etc.), but others are more intrusive and make a lot of reads, 20-40 kHz (check-fsview-extra, check-fsview-missing, check-orphans), and take up to 1 or 2 days to complete on our instance.
Analyzing the output, it seems that nothing really critical occurred. However, many of the reported items appear to be leftovers from deleted files, which are difficult to clean up because the corresponding files no longer exist in the namespace. We were wondering if these could be cleaned, and how.
There are also many files with unlinked locations in check-fsview-missing. Some were caused by an issue with the FSTs some years ago, and others remained after a defective FS was drained, then reinserted. We wonder if these could also be cleaned by removing the references to nonexistent replicas.
Example:
# eos file info fid:343061649
File: 'fid:343061649' Flags: 0644 Clock: 15f42e7bcf0c794b
Size: 407
Modify: Thu Feb 4 03:29:58 2016 Timestamp: 1454552998.0
Change: Wed Mar 22 16:57:57 2017 Timestamp: 1490198277.689263039
Birth : Thu Jan 1 01:00:00 1970 Timestamp: 0.0
CUid: 47000 CGid: 40500 Fxid: 1472b491 Fid: 343061649 Pid: 0 Pxid: 00000000
XStype: adler XS: 10 41 7e 6d ETAGs: "92089910185426944:10417e6d"
Layout: replica Stripes: 2 Blocksize: 4k LayoutId: 00100112
#Rep: 0
(undeleted) $ 549
*******
error: cannot retrieve file meta data - Container #0 not found (errc=0) (Success)
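As a side note for anyone cross-checking such entries: in the output above, Fxid is simply the Fid written as 8-digit zero-padded hexadecimal (and Pxid likewise for Pid). A minimal sketch of the conversion, where the helper names are hypothetical and not part of any EOS tool:

```python
def fid_to_fxid(fid: int) -> str:
    """Decimal file id -> 8-digit, zero-padded, lowercase hex (as in 'Fxid')."""
    return f"{fid:08x}"


def fxid_to_fid(fxid: str) -> int:
    """Hex file id (as in 'Fxid') -> decimal file id (as in 'Fid')."""
    return int(fxid, 16)


# Values taken from the 'eos file info' output above
print(fid_to_fxid(343061649))   # 1472b491
print(fxid_to_fid("1472b491"))  # 343061649
print(fid_to_fxid(0))           # 00000000, matching 'Pid: 0 Pxid: 00000000'
```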
A similar situation appears in the stripediff test, with files that have no missing location, no path and no replicas:
In check-naming-conflicts, many of the reported entries have the same name and parent-id=0, but a few also have the same name in the same folder. I suspect they could have been created at the very same time by 2 different nodes:
Detected conflict for 'auth.pyc' in container 210612997, betewen files 1236222806 and 1236222808
Detected conflict for 'auth.pyc' in container 210612997, betewen files 1236222808 and 1236819888
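Since the conflicts are reported pairwise, the same file id can show up on several lines. To get one entry per (container, name), something like the following sketch could be used. It assumes every relevant line has exactly the format of the two lines above (including the "betewen" spelling); conflicts between directories may be printed differently and would simply be skipped here. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern derived only from the two sample lines above (assumption:
# all file-conflict lines follow this exact format).
LINE_RE = re.compile(
    r"Detected conflict for '(?P<name>.*)' in container (?P<cid>\d+), "
    r"betewen files (?P<a>\d+) and (?P<b>\d+)"
)


def group_conflicts(lines):
    """Merge pairwise conflict lines into {(container_id, name): sorted file ids}."""
    groups = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a file-conflict line in the expected format
        key = (int(m["cid"]), m["name"])
        groups[key].update((int(m["a"]), int(m["b"])))
    return {key: sorted(fids) for key, fids in groups.items()}


sample = [
    "Detected conflict for 'auth.pyc' in container 210612997, betewen files 1236222806 and 1236222808",
    "Detected conflict for 'auth.pyc' in container 210612997, betewen files 1236222808 and 1236819888",
]

for (cid, name), fids in group_conflicts(sample).items():
    print(f"container {cid}, name {name!r}: files {fids}")
```

On the two lines above this yields a single group of three files sharing the name 'auth.pyc' in container 210612997.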
Nicely done, you’re quite patient and curious to have run all the check commands. Note that the check-fsview-* commands are rather experimental, and I should probably mark them as such: I implemented them a while ago but am not sure how well they work.
Indeed, leftover invisible files with parent=0 seem to be a common issue in EOS, even with the old namespace. When a user issues rm, the underlying data on the FST are not deleted synchronously, but are scheduled for deletion in the future, while the file is marked with parent=0 thus making it invisible to end-users, even though it’s still part of the namespace. EOS has had issues with this mechanism, with many files persisting in this state forever.
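To make the mechanism concrete, here is a purely illustrative toy model. It is not EOS code: every class, field and method name is made up, and the way an entry gets stuck (a deletion that is never confirmed) is only one assumed scenario, not a statement about the actual cause.

```python
from dataclasses import dataclass, field


@dataclass
class FileMD:
    fid: int
    name: str
    parent: int                                   # container id; 0 = detached
    locations: set = field(default_factory=set)   # filesystems holding a replica
    unlinked: set = field(default_factory=set)    # replicas awaiting deletion


class ToyNamespace:
    def __init__(self):
        self.files = {}

    def add(self, md):
        self.files[md.fid] = md

    def rm(self, fid):
        """User-facing rm: detach from the parent, schedule replicas for deletion."""
        md = self.files[fid]
        md.parent = 0
        md.unlinked |= md.locations
        md.locations.clear()

    def confirm_deletion(self, fid, fsid):
        """Storage node reports that the replica on fsid is really gone."""
        md = self.files[fid]
        md.unlinked.discard(fsid)
        if md.parent == 0 and not md.unlinked and not md.locations:
            del self.files[fid]   # only now does the entry leave the namespace

    def visible(self):
        return [md for md in self.files.values() if md.parent != 0]

    def leftovers(self):
        return [md for md in self.files.values() if md.parent == 0]


ns = ToyNamespace()
ns.add(FileMD(fid=100, name="data.bin", parent=42, locations={1, 2}))
ns.rm(100)
ns.confirm_deletion(100, 1)    # filesystem 1 confirms; filesystem 2 never does
print(ns.visible())            # [] : users no longer see the file
print([md.fid for md in ns.leftovers()])  # [100] : still in the namespace
```

In this model the entry with parent=0 only disappears once every scheduled deletion has been confirmed, so a single missing confirmation leaves it behind indefinitely.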
I discovered this while playing with ns-inspect too. We’re working on a solution for how to properly and permanently clear such files.
I believe these were generated by early versions of FUSEX: bugs in the server would sometimes create such conflicting entries. These might only pose a problem during draining; we’re working on providing a user-friendly way of clearing them up.
Unfortunately such inconsistencies can happen due to the way the internal namespace API is used – we had to use the same API as the in-memory namespace, which does not allow for transactional operations. Once we drop the in-memory namespace and I can improve the API, such issues should go away completely.
Thank you for your explanations. OK, so you confirm that these situations shouldn’t really hurt, so for now we will live with them until there is a way to clean them.