Modifying files with one replica gives read-only file system

Dear fellows,

We have a situation where some files return the error "Read-only file system" when users try to modify them.

We noticed that all these files are missing a replica. The strange thing is that for many of them, the single remaining replica is in fact valid and matches the metadata in the MGM.

Indeed, running the following procedure on such a file gives:

$ echo test | tee file.txt
test
tee: file.txt: Read-only file system
# eos file adjustreplica file.txt
success: scheduled replication from source fs=3163 => target fs=3103
$ echo test | tee file.txt
test
$ #OK
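
(To see the state of such a file before and after the repair, eos file info shows the stripes/replicas currently attached to it; the path below is just a placeholder:

$ eos file info /eos/jeodpp/some/path/file.txt

A 2-replica file in this state shows a single stripe location in the output, and two again once adjustreplica has run.)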

We were wondering whether such files that are just missing one replica aren’t supposed to be repaired by the fsck repair thread (which is now enabled on our instance)? The affected files are not newly created; indeed, one of them was blocking a process for several days because the process would always hang on that file, which is how we noticed the issue.
In addition, since the file is perfectly readable (its remaining replica is healthy), we were thinking that modifying it could simply update the existing replica and carry on.

This doesn’t concern many files (only 16 distinct ones since the beginning of the year, with a total of 79 operations according to the logs), but it is blocking for anyone who uses one of them. And we potentially have other single-replica files that would fail in the same way if someone tried to modify them.
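
(For anyone who wants to hunt for other candidates, the fsck error categories are probably the easiest entry point; something like the following lists files with an unexpected replica count, though the exact flags can differ between versions:

$ eos fsck report
$ eos fsck report -a --error rep_diff_n

rep_diff_n / rep_missing_n are the categories where such single-replica files would normally show up.)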

Has anyone else seen such a case on their instance?

Hi Franck,

Thanks for the report. This is indeed the current behavior and can easily be reproduced by dropping a replica of a 2-replica file and then trying to update it via the fuse client. Fuse can properly repair in place a file with one of its two replicas unavailable, but the way the code is currently written, the scheduling part requires two replicas attached to the file for this to work.
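
For reference, a minimal reproducer looks something like this (path and fsid are placeholders), with the second command issued on a fuse mount:

# eos file drop /eos/dev/test/file.txt 3103
$ echo test | tee /eos/dev/test/file.txt

The tee then fails with the same "Read-only file system" error until the replica count is restored.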

Normally, such files should be picked up and fixed relatively quickly by fsck, since the attached replica is apparently available and in the correct state. Do you have fsck running on this instance?
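
A quick way to verify is something like the following, which shows whether the collection and repair threads are enabled and how the last collection went (option names may differ slightly between versions):

# eos fsck stat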

It would also be interesting to understand why the file ended up with only one replica in the first place. Do you have any logs from the FST side from when the file was initially written? Maybe also from the fuse daemon that was doing the writing?
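
Grepping the FST and fuse logs for the fxid of the file around the time of the initial write should already tell a lot; the paths below assume a default installation:

# grep <fxid> /var/log/eos/fst/xrdlog.fst
$ grep <fxid> /var/log/eos/fusex/fuse.*.log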

Thanks,
Elvin

Dear Elvin,

Thank you for having a look and for your answer. OK, so the Read-only file system error is the expected behavior, but it is indeed strange that the file is not repaired by fsck. Both collect and repair are enabled on our instance (though they are currently busy trying to repair our leftover files, which cannot be fixed; we still have to go through them and decide what to do with them!)

It appears that the same thing just happened again these days on a file. The good news is that this case is easy to follow, because only one host accesses the file, once a day from a scheduled cron job. What the cron job does (it is some user’s code, in Python, using standard libraries I’d say) is simply download some CSV data and process it. The CSV files are opened in update mode, i.e. the file id never changes, but the content is fully rewritten with the newly downloaded data. It is at that download step that our case occurs.
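
(The access pattern of that download step boils down to something like the following, with a placeholder URL and target path; the existing file is truncated and rewritten in place through the fuse mount, so the file id never changes:

$ curl -s 'https://example.org/data.csv' > /eos/jeodpp/some/project/timeseries.csv

It is this open-for-truncate and rewrite on the fuse mount that hits the error.)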

I have gathered all the available logs for this case, and I can send you the full set if you need it.

As a summary, the update of this file two days ago caused a synchronisation error on one FST: replication for fxid=9c4f645b resulted in a different file size on fsid=1000 - 1417514 vs 16384 - rejecting replica. The fuse client also has logs corresponding to this file at that time:

240319 10:22:48 t=1710840168.104114 f=HandleResponse   l=CRIT  tid=00007ff455ff6700 s=xrdclproxy:914           write error '[ERROR] Error response: no such device'
240319 10:22:48 t=1710840168.214246 f=TryRecovery      l=CRIT  ino:800000009c4f645b s=data:1103                triggering write recovery state = 3
240319 10:22:48 t=1710840168.220821 f=try_ropen        l=WARN  ino:800000009c4f645b s=data:1339                recover read-open succesfull
240319 10:22:48 t=1710840168.220843 f=recover_write    l=CRIT  ino:800000009c4f645b s=data:2011                no local cache data for recovery
240319 10:22:48 t=1710840168.221020 f=ioflush          l=WARN  tid=00007ff464bf3700 s=data:3323                re-issuing OpenAsync request after timeout - ino:800000009c4f6477 err-code:206
240319 10:22:48 t=1710840168.349872 f=ioflush          l=WARN  tid=00007ff464bf3700 s=data:3334                OpenAsync failed - trying recovery - ino:800000009c4f645b err-code:0
240319 10:22:48 t=1710840168.349890 f=ioflush          l=WARN  tid=00007ff464bf3700 s=data:3360                giving up OpenAsync request - ino:800000009c4f645b err-code:0
240319 10:22:48 t=1710840168.350058 f=ioflush          l=CRIT  tid=00007ff464bf3700 s=data:3389                ino:800000009c4f645b msg=file writing failed file-recovery= journal-recovery=/var/cache/eos/fusex/cache/jeodpp/009/800000009C4F645B.jc.recover
#      -------------------
#      - recovery record -
#      -------------------
#        path := '/eos/jeodpp/.../..._timeseries.csv'
#        fid  := 2622448731
#        -[ 000 ] 240319 10:22:48 t=1710840168.214278 f=recover_write    l= ino:800000009c4f645b s=data:1671                hint='recover write'
#        -[ 001 ] 240319 10:22:48 t=1710840168.214297 f=recover_write    l= ino:800000009c4f645b s=data:1728                hint='recover from remote file'
#        -[ 002 ] 240319 10:22:48 t=1710840168.214430 f=try_ropen        l= ino:800000009c4f645b s=data:1325                hint='try read-open'
#        -[ 003 ] 240319 10:22:48 t=1710840168.220857 f=recover_write    l= ino:800000009c4f645b s=data:2016                hint='no local cache data for recovery'

So the file was left with only one replica. Note that 1417514 is the current size of the existing single replica. The file itself is valid and seems complete. The metadata match between the FST and the MGM, and a simple eos file adjustreplica command fixes it.
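
(For the record, a cross-check of this kind can be done with something like the following, the path being a placeholder; eos file info shows the layout and the attached replica, and eos file check asks the FST for the replica’s size and checksum and compares them with the namespace entry:

$ eos file info /eos/jeodpp/some/project/timeseries.csv
# eos file check /eos/jeodpp/some/project/timeseries.csv
)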

The next runs of the process (yesterday and today) failed on this file with the same Read-only file system error. However, the modification timestamp was today’s. Some logs are also available on the client:

240321 10:22:18 t=1711012938.179949 f=HandleResponseWithHosts l=ERROR tid=00007ff457ffa700 s=xrdclproxy:559           state=failed async open returned errmsg=[ERROR] Error response: read-only file system
240321 10:22:18 t=1711012938.180113 f=flush_nolock     l=ERROR ino:800000009c4f645b s=data:363                 file not open
240321 10:22:18 t=1711012938.180155 f=flush_nolock     l=ERROR ino:800000009c4f645b s=data:373                 write error error=[ERROR] Server responded with an error: [3025] Unable to access quota space /eos/jeodpp/data/projects/ARCTIC/crottil/ArcticPS/automatic_procedure/EEA/Data_meteo/SE_5_31124_2022_timeseries.csv; Read-only file system
240321 10:22:18 t=1711012938.180179 f=TryRecovery      l=ERROR ino:800000009c4f645b s=data:1090                unrecoverable error - code=400 errNo=3025
240321 10:22:18 t=1711012938.180197 f=flush_nolock     l=ERROR ino:800000009c4f645b s=data:405                 journal-flushing recovery failed rc=30
240321 10:22:18 t=1711012938.181996 f=pwrite           l=ERROR ino:800000009c4f645b s=data:2356                unrecoverable error - code=400 errNo=3025
240321 10:22:18 t=1711012938.182053 f=write            l=ERROR tid=00007ff4263ff700 s=eosfuse:5015             io-error: inode=-9223372034232327077 size=8192 off=0 buf=140687089623120 errno=30
240321 10:22:18 t=1711012938.278968 f=ioflush          l=WARN  tid=00007ff464bf3700 s=data:3334                OpenAsync failed - trying recovery - ino:800000009c4f645b err-code:400
240321 10:22:18 t=1711012938.278987 f=ioflush          l=WARN  tid=00007ff464bf3700 s=data:3360                giving up OpenAsync request - ino:800000009c4f645b err-code:400
240321 10:22:18 t=1711012938.279223 f=ioflush          l=CRIT  tid=00007ff464bf3700 s=data:3389                ino:800000009c4f645b msg=file open failed file-recovery= journal-recovery=/var/cache/eos/fusex/cache/jeodpp/009/800000009C4F645B.jc.recover
#      -------------------
#      - recovery record -
#      -------------------
#        path := '/eos/jeodpp/.../..._timeseries.csv'
#        fid  := 2622448731
#        -[ 000 ] 240321 10:22:18 t=1711012938.180139 f=flush_nolock     l= ino:800000009c4f645b s=data:371                 status='[ERROR] Error response: read-only file system' hint='will TryRecovery'
#        -[ 001 ] 240321 10:22:18 t=1711012938.180190 f=flush_nolock     l= ino:800000009c4f645b s=data:404                 errno='30' hint='failed TryRecovery'

But the file id doesn’t appear in any fsck category. Maybe the reason is that our scaninterval is high and won’t trigger within one day? However, I understood that modifying a file resets the timer, so it should be checked at the next pass (which now runs every 2 hours).
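
(For reference, the scan settings can be inspected and changed with something like the following; values are in seconds and the exact keys may differ between versions:

# eos fs status 1000 | grep -i scan
# eos space config default space.scaninterval=604800

The second command broadcasts the new interval to all filesystems in the space.)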

Versions: MGM 5.1.22, FST 5.1.20, EOSX 5.1.30