Hello, Elvin.
During the ALICE test, an ‘ADD’ related error occurred, while “GET” was successful. After tracking the ‘ADD’ related content, we found the following log in FST. (At MGM, only INFO logs are displayed for the same logid.)
As you suggested, if timeout is the biggest factor, the first error message is estimated to be the most important factor. Among the individual time factors, ‘layout::opened’ seems to take the most time.
In our RAIN system with many disk arrays (21 FSTs in QRAIN), we assumed that layout-related operations would naturally take a long time, but we did not expect the error message to appear in just 1 second. If you could provide any suggestions on how to optimise this, we would greatly appreciate it.
Regards,
– Geonmo
250825 12:52:51 time=1756126371.773699 func=open level=ERROR logid=67cfd6a8-81b2-11f0-b4dc-c81f66db3ddd unit=fst@jbod-mgmt-02.sdfarm.kr:1096 tid=00007f6cf4b3a640 source=XrdFstOfsFile:813 tident=monalisa.512032:193@pcalimonitor.cern.ch sec=(null) uid=10367 gid=1395 name=nobody geo=“” xt=“” ob=“” msg=“slow open operation” open-duration=1022.122ms fxid=06202535 path=“/02/12000/0b014523-81b2-11f0-bd8e-00620ba19d1e” path::print=5.659ms creation::barrier=20.021ms layout::exists=0.029ms clone::fst=0.079ms layout::open=0.018ms layout::opened=995.678ms get::localfmd=0.007ms resync::localfmd=0.368ms layout::stat=0.014ms full::mutex=0.003ms layout::fallocate=0.007ms layout::fallocated=0.133ms fileio::object=0.080ms open::accounting=0.013ms end=0.013ms open=1022.122ms
250825 12:56:25 time=1756126585.024610 func=Close level=ERROR logid=67cfd6a8-81b2-11f0-b4dc-c81f66db3ddd unit=fst@jbod-mgmt-02.sdfarm.kr:1096 tid=00007f6de2753640 source=RainMetaLayout:1841 tident=monalisa.512032:193@pcalimonitor.cern.ch sec=unix uid=0 gid=0 name=monalisa geo=“” xt=“” ob=“” msg=“failed remote file close” stripe_id=1
….
source=RainMetaLayout:1860 tident=monalisa.512032:193@pcalimonitor.cern.ch sec=unix uid=0 gid=0 name=monalisa geo=“” xt=“” ob=“” msg=“error setting stripe checksum”
250825 12:58:31 time=1756126711.690312 func=Close level=ERROR logid=67cfd6a8-81b2-11f0-b4dc-c81f66db3ddd unit=fst@jbod-mgmt-02.sdfarm.kr:1096 tid=00007f6de2753640 source=RainMetaLayout:1867 tident=monalisa.512032:193@pcalimonitor.cern.ch sec=unix uid=0 gid=0 name=monalisa geo=“” xt=“” ob=“” msg=“failed to close local file”
250825 12:58:31 time=1756126711.690436 func=_close_wr level=ERROR logid=67cfd6a8-81b2-11f0-b4dc-c81f66db3ddd unit=fst@jbod-mgmt-02.sdfarm.kr:1096 tid=00007f6de2753640 source=XrdFstOfsFile:1927 tident=monalisa.512032:193@pcalimonitor.cern.ch sec= uid=10367 gid=1395 name=nobody geo=“” xt=“” ob=“” msg=“delete on close” fxid=06202535 ns_path=“/02/12000/0b014523-81b2-11f0-bd8e-00620ba19d1e”
250825 12:58:31 time=1756126711.690611 func=Remove level=ERROR logid=67cfd6a8-81b2-11f0-b4dc-c81f66db3ddd unit=fst@jbod-mgmt-02.sdfarm.kr:1096 tid=00007f6de2753640 source=RainMetaLayout:1556 tident=monalisa.512032:193@pcalimonitor.cern.ch sec=unix uid=0 gid=0 name=monalisa geo=“” xt=“” ob=“” msg=“failed to remove remote stripe” stripe_id=1
…
250825 12:58:31 time=1756126711.697138 func=_close_wr level=CRIT logid=67cfd6a8-81b2-11f0-b4dc-c81f66db3ddd unit=fst@jbod-mgmt-02.sdfarm.kr:1096 tid=00007f6de2753640 source=XrdFstOfsFile:2031 tident=monalisa.512032:193@pcalimonitor.cern.ch sec= uid=10367 gid=1395 name=nobody geo=“” xt=“” ob=“” info=“deleting on close” fn=/02/12000/0b014523-81b2-11f0-bd8e-00620ba19d1e fstpath=/jbod/box_04_disk_052/00002824/06202535 reason=“failed layout close”