Change/repair checksum of disk replica

Hello,

For one of our files, the adler32 checksum of a disk replica is different from the one kept in the EOS namespace (which we think is the correct checksum), and as a result the file cannot be written to tape by CTA.

Is there any way to force the EOS namespace checksum on the disk replica? Is eos fsck of any use (I don't think so, but thought I would ask)?

I have run the following commands:

[root@cta-adm-fac1 ~]# eos fsck config toggle-collect
[root@cta-adm-fac1 ~]# eos fsck config toggle-repair
[root@cta-adm-fac1 ~]# eos fsck config toggle-best-effort
[root@cta-adm-fac1 ~]# eos fsck config repair-category all
[root@cta-adm-fac1 ~]#

but I don't want to run any repair in case I cause more damage…

Many thanks,

George

I forgot to say that the file sizes of the namespace record and the disk replica match.

Hi George,

If the disk checksum does not match the namespace one, it means the data was corrupted. The namespace checksum is probably the correct one and you don’t want to modify it. Also, there is no way to force a checksum on the data on disk! It works the other way around: the data should give the correct checksum.
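
To illustrate that the checksum is derived from the data and not the other way around, here is a minimal Python sketch that recomputes the adler32 of a replica file and compares it with an expected value. It is not part of EOS: the function names are hypothetical, and it assumes you can read the replica file directly on the disk server and that the namespace value is available as a hexadecimal string of up to 8 digits.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 of a file as an 8-digit lowercase hex string."""
    checksum = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so large files are read in pieces
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")


def replica_matches(path, expected_hex):
    """Compare the recomputed checksum with an expected hex string (hypothetical helper)."""
    return adler32_of_file(path) == expected_hex.strip().lower().zfill(8)
```

If `replica_matches` returns False while the sizes agree, the bytes on disk differ from what was originally written, and nothing in this sketch (or any checksum tool) can turn them back into the original content.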

If this is a CTA instance then, first of all, you probably run with only one replica, so there is no point in enabling FSCK as it cannot recover from any corruption. If a file is broken or has a bad checksum, there is no other source from which to recover it. So you should leave the fsck engine disabled. CERN CTA instances never run with FSCK enabled, and that is why enabling it actually had the side effect of removing the tape locations from files that were properly migrated to tape. I think you reported this in some other channel. We fixed this issue and the fix will be released in the next version of EOS, 5.3.35/5.4.3.

In this case your file is simply different from the original one, so you should declare it lost and re-trigger a copy from the original source.

Cheers,
Elvin

Hi Elvin,

Many thanks for the reply. Yes, I found out the hard way (..!) that I should not have enabled FSCK, but I was desperate to repair the replica. For the record, I reported the accidental deletion of the virtual tape “fsid” in the CTA forum thread “EOS does not point to the correct location of files on tape” (#11 by ccaffy, General Discussion, CTA).

We are going to declare this file lost as it has also been deleted from the original source.

As a mitigation against future instances of replica corruption, I am currently testing a 2 replica layout in the default space (on a dev instance) and it looks like it is working as expected. I may raise another thread if I have any questions.

Best,

George

Hi George,

Quick question about this, as I am putting a fix in place: do you know by any chance in which fsck category these files were reported? What is the output of eos fsck stat on this instance?

Thank you,
Elvin

Hi Elvin,

No, I am afraid I cannot remember off the top of my head, and I did not run eos fsck stat at all as I was not aware of it. I pasted all the MGM log lines that I thought were relevant in the CTA forum post, cta-community.web.cern.ch/t/eos-does-not-point-to-the-correct-location-of-files-on-tape/428

Hope this helps.

George

Hi George,

Yes, I saw those logs, but I would need a bit more. Do you still have the full logs available?
If so, please send me everything from 10:47:00 to 10:47:59.

Thank you,
Elvin

Hi Elvin,

Please see all MGM log lines from this time range in

http://www-public.gridpp.rl.ac.uk/tape_accounting/antares-eos15-mgm-14042026.log

Best,

George

Thanks!