Xsmap files still created for replica 2 files, when balancing

I thought I would open a discussion about the xsmap files. I understood that they have been deprecated for replica files, and indeed we had a period running aquamarine during which they completely disappeared. However, they started to come back at some point, and after our citrine upgrade they are still being created. They don’t really disturb anything, except that they are not updated when the files are modified, and in that situation balancing or draining operations fail, because they still check the new content of the file against the xsmap file corresponding to the old content.

Does anyone else observe that they come back when balancing files? What would be the reason? Could it be linked to a misconfiguration? Is there a way to get rid of them, for replica files only, not for RAIN files?

Hi Franck,

What is the exact version of eos that you are currently running?
Can you tell me whether you still get the block-xs files for a newly written 2-replica file?
Now, can you pick an existing file which you know has a blockxs file and paste the output of the following commands?

eos file info <file_path> --fullpath
# Go to the first fst and list
ls -lrt <first_fst_physical_path>*
eos file convert --rewrite <file_path>
eos file info <file_path> --fullpath
# Go to the new first fst and list
ls -lrt <first_fst_physical_path>*

Thanks!

Hi Elvin,

Thank you for your answer. We are running eos 4.2.12 on all servers (MGM and FSTs).

Newly created files do not get any xsmap file.

I can pick any random file that appears in the MGM log as being balanced; for instance, the following one was just balanced:

root# eos file info fxid:8c9d020 --fullpath
  File: '/eos/jeodpp/data/SRS/Copernicus/S2/scenes/source/.../MSK_NODATA_B11.gml'  Flags: 0644  Clock: 624a360c8
  Size: 1823
Modify: Fri Dec 30 16:36:19 2016 Timestamp: 1483112179.0
Change: Fri Dec 30 16:36:19 2016 Timestamp: 1483112179.0
  CUid: 47000 CGid: 40500  Fxid: 08c9d020 Fid: 147443744    Pid: 25519936   Pxid: 01856740
XStype: adler    XS: 79 8b e4 78     ETAG: 39579128654987264:798be478
replica Stripes: 2 Blocksize: 4k LayoutId: 00600112
  #Rep: 2
┌───┬──────┬─────────────────────────────┬────────────────┬────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┬─────────────────────────┐
│no.│ fs-id│                         host│      schedgroup│            path│      boot│  configstatus│ drainstatus│  active│                  geotag│        physical location│
└───┴──────┴─────────────────────────────┴────────────────┴────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┴─────────────────────────┘
 0       87 ....136p.....                       default.14          /data15     booted             rw      nodrain   online                 JRC::DC1 /data15/00003998/08c9d020 
 1      407 ....223p.....                       default.14          /data15     booted             rw      nodrain   online                 JRC::DC1 /data15/00003998/08c9d020 

(undeleted) $ 135
*******

[root@....136p ~]# ls -l  /data15/00003998/08c9d020*
-rw-r--r--. 1 daemon daemon 1823 Jul 26  2017 /data15/00003998/08c9d020

[root@....223p ~]# ls -l  /data15/00003998/08c9d020*
-rw-r----- 1 daemon daemon  1823 Mar 20 18:25 /data15/00003998/08c9d020
-rw-r--r-- 1 daemon daemon 65540 Mar 20 18:25 /data15/00003998/08c9d020.xsmap

After conversion, the xsmap file is not there any more (strange… a first conversion attempt failed with the message [tpc]: [FATAL] Socket error: Connection reset by peer 0 in the xrdlog.mgm):

# eos file info  /eos/jeodpp/data/SRS/Copernicus/S2/scenes/source/.../MSK_NODATA_B11.gml --fullpath
  File: '/eos/jeodpp/data/SRS/Copernicus/S2/scenes/source/.../MSK_NODATA_B11.gml'  Flags: 0644
  Size: 1823
Modify: Fri Dec 30 16:36:19 2016 Timestamp: 1483112179.0
Change: Fri Dec 30 16:36:19 2016 Timestamp: 1483112179.0
  CUid: 47000 CGid: 40500  Fxid: 0f21be96 Fid: 253869718    Pid: 25519936   Pxid: 01856740
XStype: adler    XS: 79 8b e4 78     ETAG: 68147633515921408:798be478
replica Stripes: 2 Blocksize: 4k LayoutId: 00100112
  #Rep: 2
┌───┬──────┬─────────────────────────────┬────────────────┬────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┬─────────────────────────┐
│no.│ fs-id│                         host│      schedgroup│            path│      boot│  configstatus│ drainstatus│  active│                  geotag│        physical location│
└───┴──────┴─────────────────────────────┴────────────────┴────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┴─────────────────────────┘
 0       18 ...133p...                          default.17          /data18     booted             rw      nodrain   online                 JRC::DC1 /data18/0000632a/0f21be96 
 1      351 ...220p...                          default.17          /data18     booted             rw      nodrain   online                 JRC::DC1 /data18/0000632a/0f21be96 
   
*******
[root@...133p ~]# ls -l /data18/0000632a/0f21be96*
-rw------- 1 daemon daemon 1823 Mar 20 18:35 /data18/0000632a/0f21be96
[root@...220p ~]# ls -l /data18/0000632a/0f21be96*
-rw------- 1 daemon daemon 1823 Mar 20 18:35 /data18/0000632a/0f21be96

Oh, this seems to have caused a crash of the FST that had the xsmap file:

pure virtual method called
terminate called without an active exception
error: received signal 6:
/lib64/libXrdEosFst.so(_ZN3eos3fst9XrdFstOfs20xrdfstofs_stacktraceEi+0x49)[0x7fec411e55a9]
/lib64/libc.so.6(+0x35270)[0x7fec45258270]
/lib64/libc.so.6(gsignal+0x37)[0x7fec452581f7]
/lib64/libc.so.6(abort+0x148)[0x7fec452598e8]
/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165)[0x7fec45b5eac5]
/lib64/libstdc++.so.6(+0x5ea36)[0x7fec45b5ca36]
/lib64/libstdc++.so.6(+0x5ea63)[0x7fec45b5ca63]
/lib64/libstdc++.so.6(+0x5f5cf)[0x7fec45b5d5cf]
/lib64/libXrdServer.so.2(_ZN17XrdXrootdProtocol7fsErrorEicR13XrdOucErrInfoPKcPc+0x2e8)[0x7fec46714c08]
/lib64/libXrdUtils.so.2(_ZN7XrdLink4DoItEv+0x19)[0x7fec46496149]

I saw this “pure virtual function call” mentioned in the changelog of version 4.2.18, so it is probably the same thing?

The FST server kept running for 30 seconds, produced a stacktrace, then stopped and restarted by itself, but all its filesystems went through a full boot.

The crash related to “pure virtual method called” is a side effect of a TPC transfer (i.e. the conversion) crashing. This has been recently fixed and can affect any type of TPC transfer.

The fix is available in 4.2.18. Sorry for the crash! The booting of the FSes is normal, since the leveldb database was probably not shut down properly.
I will have another look at the balancing in 4.2.12, but could you tell me whether the replica which was balanced, in this particular case the “undeleted” one on FS 135, also has a blockxs file or not? Maybe pick another balancing job and look at the dropped replica …
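If it helps, something along these lines should do it (fs-id 135 is taken from your output above; the fields printed by fs status and the exact physical path layout may vary with the version, so treat this only as a sketch):

# Find which node and mount point host fs-id 135
eos fs status 135 | grep -E 'host|path'
# Then, on that FST, list the physical replica and any leftover blockxs file next to it
ls -l <mount_point>/00003998/08c9d020*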

Thanks!

Hi Franck,

No need to do anything else, I’ve figured out where the problem comes from. I will soon push a fix for this.
Thanks again for your time in helping me reproduce this.

Cheers,
Elvin

Hi Elvin,

Thank you for diagnosing this; I wasn’t sure whether it was a problem or not, which is why I preferred asking before filing an issue. So could it be that you also have this problem on your instances, or is it linked to some specificity of ours?

OK about the crash, we will avoid using TPC until we upgrade. Is it only necessary to upgrade the FSTs for this, or also the MGM?

About the full booting of the FSTs, we have observed that LevelDB is more prone to not being shut down properly than SQLite was. Could that be it?

Hi Franck,

Yes, the problem is generic to any EOS instance. I’ve committed a fix for this and this behaviour should go away in 4.2.19. The corresponding commit:
https://gitlab.cern.ch/dss/eos/commit/d562fd5c40395ba43762ebfdaf8c64edbaf3f051

The fix for the TPC requires an update of the FSTs.

Whenever the FST crashes with a segmentation fault and the proper shut-down procedure is not followed, this will result in a resync at start-up. It’s just that we’re now more careful than before. It doesn’t mean that the leveldb is actually corrupted; we simply want to be on the safe side.

Cheers,
Elvin

Good, thank you! This fix requires only an update of the MGM, correct?
But with this, will we still have the remaining xsmap files that were previously created? Would they be ignored even if present (that would be ideal)? Otherwise, is there any way to remove them? Since we have to keep the ones for rain6 files, a mere find over the filesystems doesn’t seem an option. Maybe they could be removed on demand during some verify or resync command? Unless this doesn’t make sense…
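For instance, I imagine something along these lines (only a sketch of the idea, untested; it assumes the eos CLI can reach the MGM from the node where it runs and that the data partitions are mounted under /data*):

# For each .xsmap file on an FST, ask the MGM for the file's layout and keep only
# the maps belonging to RAIN files; the replica ones would be candidates for removal.
for xsmap in /data*/*/*.xsmap; do
  fxid=$(basename "$xsmap" .xsmap)
  layout=$(eos file info "fxid:$fxid" | awk '/Stripes:/ {print $1; exit}')
  if [ "$layout" = "replica" ]; then
    echo "candidate for removal: $xsmap"   # would become an rm only after double-checking
  fi
done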

OK for the TPC part, and for the full boot. Indeed, the full resync now seems almost automatic if the FST wasn’t shut down correctly. Previously, we could avoid a full resync most of the time, even in our cases of sudden system reboot. But I get your point, better to be on the safe side, this sounds right.

Yes, this requires an update of the MGM. The old xsmap files are ignored. They are still used and created for RAIN files.

OK, thank you. So after the upgrade we would not need to bother about the xsmap files any more, and we should see no more errors while draining or balancing, nor scandisk reports of block checksum corruption, right?

I realized that on a separate test instance, running a mix of 4.2.12 and 4.2.17, balancing files doesn’t create the xsmap files. How could that be? Is there some specific condition triggering them? Some historical parameter of the files?

What makes the difference is that setting the default "replica" layout in recent versions no longer sets the sys.blockchecksum attribute. That’s why when you test it on a new instance, or in a newly created directory where you do "attr set default=replica", you won’t see the blockxs files being created even when files are drained or balanced.

In your (initial) case, those directories were probably created (a long time ago, in the beryl_aquamarine version) when the command also enforced the sys.blockchecksum extended attribute.
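You can check this on one of the affected directories, for example (the exact key name may differ between versions; on recent ones the enforced attribute typically shows up as sys.forced.blockchecksum):

# List the extended attributes of an old directory and look for a block checksum key
eos attr ls <affected_directory>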

Ah OK, so we could mitigate the problem immediately by changing this attribute on the most used folders, to avoid the creation of block checksums for new files?

Yes, to avoid creating blockxs files for drained/balanced files from those dirs.
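For example, something like this (assuming the key shows up as sys.forced.blockchecksum in the attr ls output; adapt it to whatever key the listing actually shows):

# Drop the enforced block checksum attribute so that replica files written, drained
# or balanced under this directory no longer get a blockxs file
eos attr rm sys.forced.blockchecksum <directory>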