Discrepancy in EOS fsck report error counts

Hi all,

I think we just got our fsck working. Looking at the numbers from eos fsck report, I see some odd differences that I cannot explain. I use this script to produce the numbers, once by summing up the per-filesystem statistics (-a flag) and once without the flag, expecting the same totals:

echo "summed up by filesystem"
ERR_TYPES="blockxs_err orphans_n rep_diff_n rep_missing_n unreg_n"
for ETYPE in $ERR_TYPES; do
  echo -n "$ETYPE: "
  # Take the 4th field of each matching line, split it on "=", and sum the values
  eos fsck report -a | grep "$ETYPE" | awk '{print $4}' | awk -F= '{total += $2} END {print total + 0}'
done

echo ""
echo "eos fsck summary report"
eos fsck report

I see these numbers on my instance:

summed up by filesystem
blockxs_err: 115
orphans_n: 95082
rep_diff_n: 1251571
rep_missing_n: 24
unreg_n: 1246464
#
eos fsck summary report
timestamp=1613078486 tag="blockxs_err" count=43
timestamp=1613078486 tag="orphans_n" count=29607
timestamp=1613078486 tag="rep_diff_n" count=181915
timestamp=1613078486 tag="rep_missing_n" count=22
timestamp=1613078486 tag="unreg_n" count=180992

I can’t make out the reason why these would differ. The discrepancy looks to be bigger for larger numbers, but I could not come up with an explanation for this myself.
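
To put a number on "bigger for larger numbers", here is a small sketch that divides each summed count by the matching summary count. The figures are copied from the two outputs above; the helper names fsck_counts and fsck_ratios are made up for this illustration and are not part of EOS.

```shell
# Counts copied from the two outputs above:
# tag, per-filesystem sum, summary count.
fsck_counts() {
  cat <<'EOF'
blockxs_err 115 43
orphans_n 95082 29607
rep_diff_n 1251571 181915
rep_missing_n 24 22
unreg_n 1246464 180992
EOF
}

# Print the ratio of the summed count to the summary count for each tag,
# rounded to two decimals.
fsck_ratios() {
  fsck_counts | awk '{ printf "%s: %.2f\n", $1, $2 / $3 }'
}

fsck_ratios
```

The ratio is not constant across tags, so a single scaling factor does not explain the difference.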

Best,
Erich

Hi Erich,

This is normal. A file can have multiple replicas/stripes, so when you run the fsck report per file system, each replica of the file shows up in the list, which results in bigger numbers: the same file is reported several times, once for each replica. On the other hand, when the eos fsck stat is displayed, it does not include duplicates inside a category.
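
As a rough illustration of the difference, the sketch below counts the same toy data both ways. The sample entries, their two-column layout (filesystem id, file id), and the function names are all assumptions made up for this example; they are not the actual output format of eos fsck report.

```shell
# Hypothetical sample: one line per (filesystem id, file id) pair flagged
# with the same error tag. Made-up data, not real eos output.
sample_entries() {
  cat <<'EOF'
1 0000a1
2 0000a1
3 0000a1
1 0000b2
4 0000b2
2 0000c3
EOF
}

# Per-filesystem view: every entry counts, so a file is counted once for
# each filesystem that holds an affected replica.
count_per_replica() {
  sample_entries | wc -l | tr -d ' '
}

# Summary view: each file counts once per category, however many replicas
# it has.
count_unique_files() {
  sample_entries | awk '{print $2}' | sort -u | wc -l | tr -d ' '
}

echo "summed over filesystems: $(count_per_replica)"
echo "unique files:            $(count_unique_files)"
```

With this toy data, three files spread over several filesystems give six per-filesystem entries, which is the same effect as in your totals.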

Cheers,
Elvin

Hi Elvin,
That explanation makes total sense; I hadn’t thought of it.

Thanks,
Erich