Hello,
I am trying different settings for replicas on a test server. I have set the layout attribute of the directory /eos/testarea/testmirror to “replica”:
EOS Console [root://localhost] |/> attr ls /eos/testarea/testmirror
sys.forced.blocksize="4k"
sys.forced.checksum="adler"
sys.forced.layout="replica"
sys.forced.nstripes="2"
sys.forced.space="default"
… and copied a file there:
EOS Console [root://localhost] |/> file check /eos/testarea/testmirror/DSET-Report-for-nanwn57.in2p3.fr-68T7H32.zip
path="/eos/testarea/testmirror/DSET-Report-for-nanwn57.in2p3.fr-68T7H32.zip" fid="00000050" size="3105776" nrep="2" checksumtype="adler" checksum="2729314300000000000000000000000000000000"
nrep="00" fsid="1" host="nanxrd15.in2p3.fr:1095" fstpath="/data01/00000000/00000050" size="3105776" statsize="3105776" checksum="2729314300000000000000000000000000000000"
nrep="01" fsid="4" host="nanxrd17.in2p3.fr:1095" fstpath="/data02/00000000/00000050" size="3105776" statsize="3105776" checksum="2729314300000000000000000000000000000000"
Now if I delete one of the replicas on disk (to simulate a disk failure), I observe that its statsize changes to 18446744073709551615, which is found by a file check:
EOS Console [root://localhost] |/> file check /eos/testarea/testmirror/DSET-Report-for-nanwn57.in2p3.fr-68T7H32.zip %output
path="/eos/testarea/testmirror/DSET-Report-for-nanwn57.in2p3.fr-68T7H32.zip" fid="00000050" size="3105776" nrep="2" checksumtype="adler" checksum="2729314300000000000000000000000000000000"
nrep="00" fsid="1" host="nanxrd15.in2p3.fr:1095" fstpath="/data01/00000000/00000050" size="3105776" statsize="18446744073709551615" checksum="2729314300000000000000000000000000000000"
nrep="01" fsid="4" host="nanxrd17.in2p3.fr:1095" fstpath="/data02/00000000/00000050" size="3105776" statsize="3105776" checksum="2729314300000000000000000000000000000000"
INCONSISTENCY STATFAILED path=/eos/testarea/testmirror/DSET-Report-for-nanwn57.in2p3.fr-68T7H32.zip fid=00000050 size=3105776 stripes=2 nrep=2 nrepstored=2 nreponline=2 checksumtype=adler checksum=2729314300000000000000000000000000000000
but not with fsck:
EOS Console [root://localhost] |/> fsck stat
181219 09:32:10 1545208330.686749 started check
181219 09:32:10 1545208330.686804 Filesystems to check: 6
181219 09:32:20 1545208340.690951 d_mem_sz_diff : 1 (1)
181219 09:32:20 1545208340.690977 rep_diff_n : 0 (0)
181219 09:32:20 1545208340.690990 rep_offline : 0 (0)
181219 09:32:20 1545208340.691010 stopping check
181219 09:32:20 1545208340.691020 => next run in 30 minutes
An fsck repair also does nothing, even though the missing replica is reported in the FST log:
81219 09:33:24 18118 FstOfs_stat: root.29897:59@nanxrd15 Unable to stat file /data01/00000000/00000050; no such file or directory
181219 09:33:37 13982 FstOfs_stat: root.29897:59@nanxrd15 Unable to stat file /data01/00000000/00000050; no such file or directory
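As a side note on the statsize reported by file check for the deleted replica: 18446744073709551615 is exactly 2^64 - 1, which is what -1 looks like when printed as an unsigned 64-bit integer. My guess (an assumption, I have not confirmed it in the EOS sources) is that the failed stat on the FST returns -1 and this is what ends up in the output. Below is a minimal Python sketch of how the replica lines could be checked from a script. It assumes the key="value" format shown in the output above with plain double quotes; the names STAT_FAILED, parse_replica_line and failed_replicas are my own and not part of EOS.

```python
import re

# 2**64 - 1, i.e. -1 printed as an unsigned 64-bit integer.
# Assumption: this is how a failed stat shows up in the statsize field.
STAT_FAILED = 2**64 - 1

# Matches key="value" pairs as they appear in the `file check` output.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_replica_line(line):
    """Return the key/value pairs of one output line as a dict."""
    return dict(PAIR.findall(line))


def failed_replicas(lines):
    """Return (fsid, fstpath, stat_failed) for replicas whose statsize differs from size."""
    bad = []
    for line in lines:
        fields = parse_replica_line(line)
        # Skip the per-file header and the INCONSISTENCY summary line,
        # which carry no fsid/statsize pair.
        if "fsid" not in fields or "statsize" not in fields:
            continue
        statsize = int(fields["statsize"])
        if statsize != int(fields["size"]):
            bad.append((fields["fsid"], fields["fstpath"], statsize == STAT_FAILED))
    return bad


# The two replica lines from the second file check, with plain double quotes.
sample = [
    'nrep="00" fsid="1" host="nanxrd15.in2p3.fr:1095" fstpath="/data01/00000000/00000050" '
    'size="3105776" statsize="18446744073709551615" '
    'checksum="2729314300000000000000000000000000000000"',
    'nrep="01" fsid="4" host="nanxrd17.in2p3.fr:1095" fstpath="/data02/00000000/00000050" '
    'size="3105776" statsize="3105776" '
    'checksum="2729314300000000000000000000000000000000"',
]

print(failed_replicas(sample))  # [('1', '/data01/00000000/00000050', True)]
```

This only flags what file check already reports; it does not explain why fsck stays silent about the same replica.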
=> How is this supposed to work? And how can I test it?
Thank you
JM