Cannot repair RAIN files with fsck repair

Hello, everyone.

After upgrading to EOS 5.3, fsck repair no longer seems to do the conversion when I repair RAIN files.

Even for files that have all their stripes (nrep=16), when I run fsck repair the result is not reported as normal or successful; it is reported as failed.

When I tried fsck repair on a file with 15 stripes, it sometimes reported success, but when I checked the file afterwards the number of stripes was still the same, and existing stripes had been moved to a new FST.

I was wondering if you could help me with this?

EOS Console [root://localhost] |/eos/gsdc/proc/conversion/f5/> file check fxid:0088e14b                                                                                                                           
path="/eos/gsdc/grid/03/25446/ad59e1c1-db70-11ec-b821-3cecef03e998" fxid="0088e14b" size="1967736" nrep="15" checksumtype="adler" checksum="ace9596000000000000000000000000000000000000000000000000000000000"     
nrep="00" fsid="1357" host="jbod-mgmt-09.sdfarm.kr:1095" fstpath="/jbod/box_17_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                   
nrep="01" fsid="517" host="jbod-mgmt-04.sdfarm.kr:1095" fstpath="/jbod/box_07_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                    
nrep="02" fsid="13" host="jbod-mgmt-01.sdfarm.kr:1095" fstpath="/jbod/box_01_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                     
nrep="03" fsid="601" host="jbod-mgmt-04.sdfarm.kr:1096" fstpath="/jbod/box_08_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                    
nrep="04" fsid="349" host="jbod-mgmt-03.sdfarm.kr:1095" fstpath="/jbod/box_05_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                    
nrep="05" fsid="853" host="jbod-mgmt-06.sdfarm.kr:1095" fstpath="/jbod/box_11_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                    
nrep="06" fsid="1189" host="jbod-mgmt-08.sdfarm.kr:1095" fstpath="/jbod/box_15_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                   
nrep="07" fsid="1021" host="jbod-mgmt-07.sdfarm.kr:1095" fstpath="/jbod/box_13_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                   
nrep="08" fsid="769" host="jbod-mgmt-05.sdfarm.kr:1096" fstpath="/jbod/box_10_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                    
nrep="09" fsid="97" host="jbod-mgmt-01.sdfarm.kr:1096" fstpath="/jbod/box_02_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                     
nrep="10" fsid="1273" host="jbod-mgmt-08.sdfarm.kr:1096" fstpath="/jbod/box_16_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                   
nrep="11" fsid="685" host="jbod-mgmt-05.sdfarm.kr:1095" fstpath="/jbod/box_09_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                    
nrep="12" fsid="22012" host="jbod-mgmt-11.sdfarm.kr:1096" fstpath="/jbod/box_22_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                  
nrep="13" fsid="265" host="jbod-mgmt-02.sdfarm.kr:1096" fstpath="/jbod/box_04_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"                                                    
nrep="14" fsid="433" host="jbod-mgmt-03.sdfarm.kr:1096" fstpath="/jbod/box_06_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"

EOS Console [root://localhost] |/eos/gsdc/proc/conversion/f5/> fsck repair --fxid 0088e14b                
msg="repair successful"

EOS Console [root://localhost] |/eos/gsdc/proc/conversion/f5/> file check fxid:0088e14b
path="/eos/gsdc/grid/03/25446/ad59e1c1-db70-11ec-b821-3cecef03e998" fxid="0088e14b" size="1967736" nrep="15" checksumtype="adler" checksum="ace9596000000000000000000000000000000000000000000000000000000000"
nrep="00" fsid="517" host="jbod-mgmt-04.sdfarm.kr:1095" fstpath="/jbod/box_07_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="01" fsid="13" host="jbod-mgmt-01.sdfarm.kr:1095" fstpath="/jbod/box_01_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="02" fsid="601" host="jbod-mgmt-04.sdfarm.kr:1096" fstpath="/jbod/box_08_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="03" fsid="349" host="jbod-mgmt-03.sdfarm.kr:1095" fstpath="/jbod/box_05_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="04" fsid="853" host="jbod-mgmt-06.sdfarm.kr:1095" fstpath="/jbod/box_11_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="05" fsid="1189" host="jbod-mgmt-08.sdfarm.kr:1095" fstpath="/jbod/box_15_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="06" fsid="1021" host="jbod-mgmt-07.sdfarm.kr:1095" fstpath="/jbod/box_13_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="07" fsid="769" host="jbod-mgmt-05.sdfarm.kr:1096" fstpath="/jbod/box_10_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="08" fsid="97" host="jbod-mgmt-01.sdfarm.kr:1096" fstpath="/jbod/box_02_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="09" fsid="1273" host="jbod-mgmt-08.sdfarm.kr:1096" fstpath="/jbod/box_16_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="10" fsid="685" host="jbod-mgmt-05.sdfarm.kr:1095" fstpath="/jbod/box_09_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="11" fsid="22012" host="jbod-mgmt-11.sdfarm.kr:1096" fstpath="/jbod/box_22_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="12" fsid="265" host="jbod-mgmt-02.sdfarm.kr:1096" fstpath="/jbod/box_04_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="13" fsid="433" host="jbod-mgmt-03.sdfarm.kr:1096" fstpath="/jbod/box_06_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"
nrep="14" fsid="937" host="jbod-mgmt-06.sdfarm.kr:1096" fstpath="/jbod/box_12_disk_012/00000381/0088e14b" size="1967736" statsize="1052672" error_label="none"

Hi Geonmo,

Could you tell me exactly which version you are running in your cluster (the output of eos version)?
Do all the FSTs run the same version?

In general, it’s always good to specify the type of error that the FSCK mechanism needs to fix; otherwise, it might have a hard time deciding which repair procedure to apply.

For this particular case, can you also print the following information:
eos fileinfo fxid:0088e14b
Double-check how many file systems are available in the group that the stripes of this file belong to. If there are enough for a repair to happen, then issue the following command:
eos fsck repair --fxid 0088e14b --fsid 433 --error rep_diff_n

If you still get an error, or a success message without any modification to the file, please attach the MGM log from that period and I will check it out.
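
One way to tell whether a repair actually modified the file is to compare the fsid of each stripe in the file check output taken before and after the repair. The following is a minimal Python sketch, not an EOS tool: it assumes both outputs have been saved as plain text in the format shown in this thread, and the names stripe_fsids and compare_layouts are hypothetical helpers introduced only for illustration.

```python
import re

# Matches per-stripe lines such as: nrep="00" fsid="1357" host="..."
# The header line starts with path=... and has no fsid field, so it is skipped.
STRIPE_RE = re.compile(r'^nrep="(\d+)"\s+fsid="(\d+)"', re.MULTILINE)


def stripe_fsids(file_check_output):
    """Return the fsid of every stripe line, in listing order."""
    return [int(fsid) for _, fsid in STRIPE_RE.findall(file_check_output)]


def compare_layouts(before, after):
    """Summarise how stripe placement changed between two 'file check' outputs."""
    old = stripe_fsids(before)
    new = stripe_fsids(after)
    return {
        "count_before": len(old),
        "count_after": len(new),
        # File systems that held a stripe before the repair but not after.
        "removed": sorted(set(old) - set(new)),
        # File systems that hold a stripe after the repair but not before.
        "added": sorted(set(new) - set(old)),
    }
```

Applied to the two file check listings posted in this thread, this reports 15 stripes both before and after, with fsid 1357 removed and fsid 937 added. In other words, one stripe changed file system while the stripe count stayed at 15.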

Thanks,
Elvin