
FSCK reports and how to deal with them


(Andreas Joachim Peters) #21

I did some testing. The orphans probably appear after a ‘fs boot --syncmgm’ because the timeout for retrieving the MGM meta data stream is 10 minutes. Can it be that you have many millions of files on a single filesystem?

On a full sync, the logic works as follows:

  • clear the DB
  • insert all the files found on disk into the DB and flag them all as orphans
  • clear the orphan flag on all the files reported by the MGM

When the MGM syncing step times out, many files are left flagged as fake orphans.
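The steps above can be sketched as follows. This is a hypothetical illustration, not the actual EOS code; the function and variable names are made up, only the flagging logic mirrors the description:

```python
# Sketch of the full-sync logic described above (hypothetical names).
# Key point: files on disk are first all flagged as orphans, and only
# the MGM meta-data stream clears that flag again.

def full_sync(disk_fids, mgm_stream):
    """disk_fids: file ids found on the FST disk.
    mgm_stream: iterator of file ids reported by the MGM; it may stop
    early if the meta-data transfer times out."""
    db = {}                      # 1. clear the DB
    for fid in disk_fids:        # 2. insert disk files, flagged as orphan
        db[fid] = {"orphan": True}
    for fid in mgm_stream:       # 3. clear the flag for MGM-known files
        if fid in db:
            db[fid]["orphan"] = False
    return db

# If the MGM stream is cut short by the timeout, every file past the
# cut-off point stays flagged -- a "fake orphan".
disk = range(100)
complete = full_sync(disk, iter(range(100)))
truncated = full_sync(disk, iter(range(60)))   # timeout after 60 files
print(sum(e["orphan"] for e in complete.values()))   # -> 0 fake orphans
print(sum(e["orphan"] for e in truncated.values()))  # -> 40 fake orphans
```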

I also noticed that the process in the FST is not spawned correctly: it uses ‘system’, which is very bad in a multi-threaded program because it forks the whole process and can trigger an OOM. We have to fix that.
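The FST is C++, but the same pitfall exists in other runtimes. As a rough illustration in Python: `os.system` forks the entire (possibly huge) process image and runs a shell, whereas `subprocess` with an argv list skips the shell and, on CPython 3.8+, can spawn via `posix_spawn` internally, avoiding the full fork. This is a sketch of the general idea, not the actual FST fix:

```python
import subprocess

# os.system("some-helper ...") would fork the whole multi-threaded
# process and run /bin/sh; under memory pressure that fork itself can
# fail or trip the OOM killer. subprocess with an argv list avoids the
# shell, and recent CPython can use posix_spawn() under the hood.

def run_child(argv):
    """Spawn a child process safely; returns its stdout as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

print(run_child(["echo", "hello"]).strip())  # -> hello
```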

We should probably also make this timeout configurable or raise the default.


(Franck Eyraud) #22

Hi Andreas,

Not really. Each FS holds on average 1M files, ranging roughly between 500K and 1.5M. In the last fsck export (before we disabled it, as mentioned above), almost all FSs that report orphan files report more than 600K, so it could be that we are hitting the timeout…

But we also observe such reports on our test instance, currently on one FS out of 9, which reports only 1K files, and repair --resync does not clear them… nor does the regular scan on the FST (the test instance scans daily…)


(Andreas Joachim Peters) #23

Can you briefly check whether the 1K are actually orphans?


(Franck Eyraud) #24

They were not actual orphans (there was a path for them, and there was a replica entry for the FS reporting them as orphans), but I just rechecked and the resync action did eventually clear them, although not immediately. So on this small instance, the orphan count dropped to 29, and those are actual orphans.

Regarding the production instance, during the last maintenance period we launched a resync on the roughly 10M files reported in the fsck report. This took around 1 hour, and in the meantime the fsck interface was not responding. But since we also restarted all the FSTs for the upgrade, the number of files eventually grew again to a larger number, and we couldn’t tell whether the previous ones had been cleared or not. Now we have even more files.


(Elvin Alin Sindrilaru) #25

Hi Franck,

If you have the possibility, could you test the version produced by this pipeline:

And let me know whether it helps with your issue? I’ve made some modifications to the scanner, and it should now properly mark and move orphans.

Cheers,
Elvin


(Elvin Alin Sindrilaru) #26

Hi Franck,

One more thought on this. I picked one fs which had around 42k orphans out of 51k files, and I did the following:

  • initial status:
eos fs status 3870 | grep orphans
stat.fsck.orphans_n              := 43262
eos fs ls 3870 --io
┌────────────────────────────────┬──────┬────────────────┬───────────────────┬──────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────┬──────┬────────────┬────────────┬────────────┬───────────┬──────────┬──────────────┬────────────┬──────┬─────────┐
│hostport                        │    id│      schedgroup│             geotag│  diskload│  diskr-MB/s│  diskw-MB/s│ eth-MiB/s│  ethi-MiB│  etho-MiB│ ropen│ wopen│  used-bytes│   max-bytes│  used-files│  max-files│   bal-shd│     drain-shd│   drainpull│  iops│       bw│
└────────────────────────────────┴──────┴────────────────┴───────────────────┴──────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────┴──────┴────────────┴────────────┴────────────┴───────────┴──────────┴──────────────┴────────────┴──────┴─────────┘
 p05496644h29153.cern.ch:1095       3870       default.28 9918::R::0002::WL02       0.01         0.75         0.01       1192          3        176      2      0      3.39 TB      4.00 TB      51.23 K    195.35 M          0              0          off     59    151 MB 
  • then booted that fs with the --syncmgm flag:
eos fs boot 3870 --syncmgm 
success: boot message send to p05496644h29153.cern.ch:/data19
  • and monitored the status on the FST:
 tail -f /var/log/eos/fst/xrdlog.fst | grep  "FmdDbMap"
180525 16:21:44 time=1527258104.459059 func=ResyncAllDisk            level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1054                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=10000 fsid=3870
180525 16:21:55 time=1527258115.712853 func=ResyncAllDisk            level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1054                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=20000 fsid=3870
180525 16:22:02 time=1527258122.234257 func=ResyncAllDisk            level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1054                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=30000 fsid=3870
180525 16:22:07 time=1527258127.312786 func=ResyncAllDisk            level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1054                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=40000 fsid=3870
180525 16:22:10 time=1527258130.257063 func=ResyncAllDisk            level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1054                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=50000 fsid=3870
180525 16:22:23 time=1527258143.487375 func=ResyncAllMgm             level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1239                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=10000 fsid=3870
180525 16:22:24 time=1527258144.255685 func=ResyncAllMgm             level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1239                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=20000 fsid=3870
180525 16:22:24 time=1527258144.937825 func=ResyncAllMgm             level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1239                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=30000 fsid=3870
180525 16:22:25 time=1527258145.667004 func=ResyncAllMgm             level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1239                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=40000 fsid=3870
180525 16:22:26 time=1527258146.376248 func=ResyncAllMgm             level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:1239                  tident=<service> sec=      uid=0 gid=0 name= geo="" msg="synced files so far" nfiles=50000 fsid=3870
180525 16:22:26 time=1527258146.484211 func=MarkCleanDB              level=INFO  logid=CommonFmdDbMapHandler unit=fst@p05496644h29153.cern.ch:1095 tid=00007f105b7fb700 source=FmdDbMap:488                   tident=<service> sec=      uid=0 gid=0 name= geo="" LevelDB DB mark clean for fsid=3870

After this finished, the reported number of orphans was “more reasonable”:

eos fs status 3870 | grep orphans
stat.fsck.orphans_n              := 12

Bottom line: this points to the scenario that Andreas suggested. There is now (already in 4.2.23) a 4-hour timeout for the dumpmd command, so this should go away. Together with the recent fixes, such orphans should be handled properly.

Cheers,
Elvin


(Franck Eyraud) #27

Hi Elvin,

Thank you for spending some time on this.

How can we use this specific pipeline version? Is there a package repository for it? Unfortunately, I don’t know if I will be able to test this unofficial release on our production instance, which is the one presenting the issue…

About the resync: in fact I realize that our FSs take much longer than yours to boot, between 30 minutes and one hour, and this sometimes happens when an FST starts, i.e. all FSs boot at the same time, which certainly slows things down. So booting them one by one sequentially might improve things. Still, I would rather wait until we upgrade to v4.2.23, because the previous 10-minute timeout will certainly be exceeded.
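Booting the filesystems one by one could be scripted along these lines. This is only a sketch: the fsid list is a placeholder, the pacing is crude (a real script would poll `eos fs status <fsid>` between boots), and the runner is injectable so the loop can be dry-run without an actual EOS instance:

```python
import subprocess
import time

def boot_sequentially(fsids, run=None, pause_s=5):
    """Issue 'eos fs boot <fsid> --syncmgm' for each fsid in turn.
    'run' is injectable so the loop can be dry-run or tested; by
    default it shells out to the eos CLI."""
    if run is None:
        run = lambda argv: subprocess.run(argv, check=True)
    issued = []
    for fsid in fsids:
        argv = ["eos", "fs", "boot", str(fsid), "--syncmgm"]
        run(argv)
        issued.append(fsid)
        time.sleep(pause_s)  # crude pacing; a real script would poll
                             # 'eos fs status <fsid>' until booted
    return issued

# Dry run with a stub runner that just records the commands:
cmds = []
boot_sequentially([3870, 3871], run=cmds.append, pause_s=0)
print(cmds)
```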


(Elvin Alin Sindrilaru) #28

Hi Franck,

The tests I did were using 4.2.22, so I think you can go ahead and test it on one file system of one of the FSTs without doing any update. You no longer need the new version from the pipeline; just follow the steps from my last post.

We’ve removed this full resync operation, so this should not happen in the future, but that change will only be available in 4.2.24.

Cheers,
Elvin


(Franck Eyraud) #29

OK, thank you for these precisions. I will try to run a boot on some FSs.

Just one question:

Will this orphan value change even if we don’t have fsck enabled?


(Elvin Alin Sindrilaru) #30

Yes, these statistics are collected all the time.


(Franck Eyraud) #31

I confirm that by running eos fs boot <fsid> --syncmgm on the file systems one by one, the number of reported orphans does reset to a small value.
We are wondering whether running such an operation on an fs that reports non-zero ropen or wopen values has an impact on the clients accessing the files, or would it be transparent to the users?


(Elvin Alin Sindrilaru) #32

Hi Franck,

All updates to the local database are protected by locks, so the only interference I can think of is a delay in replying to the users; if this takes more than 1 minute, they will time out.

Cheers,
Elvin