Dear EOS Experts and Administrators,
Three days ago we restarted Kolkata::EOS2 after migrating the namespace from in-memory to QuarkDB.
At first we found permission errors in xrdlog.mgm on both the master and the slave: we were able to read files from EOS, but unable to write. Before the migration, both reads and writes worked.
Today we enabled "eos fsck" and then checked the xrdlog.mgm log, and found many replication errors, e.g. "could not place new replica" and "[3009] Unable to schedule stripes for reconstruction".
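For reference, fsck was switched on via the usual CLI, roughly like this on our release (the sub-command names differ between EOS versions, e.g. older releases use "eos fsck enable", so please read this as a sketch rather than the verbatim commands):
[root@eos-mgm ~]# eos fsck toggle-collect   # start collecting inconsistencies (newer fsck interface)
[root@eos-mgm ~]# eos fsck toggle-repair    # start the repair thread
[root@eos-mgm ~]# eos fsck stat             # check that collection/repair are running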
======================
[root@eos-mgm ~]# tail -25 /var/log/eos/mgm/xrdlog.mgm
210514 13:15:05 time=1620978305.070644 func=Repair level=INFO logid=static… unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ef7f6700 source=FsckEntry:794 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="fsck repair" fxid=02cf6484 err_type=6
210514 13:15:05 time=1620978305.070660 func=RepairReplicaInconsistencies level=INFO logid=4bfd2332-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ef7f6700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf6484 fsid=91
210514 13:15:05 time=1620978305.070667 func=open level=INFO logid=4c20ca80-b488-11eb-9eab-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc915bfc700 source=XrdMgmOfsFile:2490 tident=daemon.5023:568@eos-mgm sec=sss uid=2 gid=2 name=daemon geo="" msg="nominal stripes:7 reconstructed stripes=2 group_idx=12"
210514 13:15:05 time=1620978305.070677 func=open level=INFO logid=4c20ca80-b488-11eb-9eab-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc915bfc700 source=XrdMgmOfsFile:2499 tident=daemon.5023:568@eos-mgm sec=sss uid=2 gid=2 name=daemon geo="" msg="plain booking size is 5246976"
210514 13:15:05 time=1620978305.070675 func=RepairReplicaInconsistencies level=INFO logid=4bfd2332-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ef7f6700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf6484 fsid=88
210514 13:15:05 time=1620978305.070687 func=RepairReplicaInconsistencies level=INFO logid=4bfd2332-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ef7f6700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf6484 fsid=86
210514 13:15:05 time=1620978305.070696 func=RepairReplicaInconsistencies level=INFO logid=4bfd2332-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ef7f6700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf6484 fsid=89
210514 13:15:05 time=1620978305.070705 func=RepairReplicaInconsistencies level=INFO logid=4bfd2332-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ef7f6700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf6484 fsid=90
210514 13:15:05 time=1620978305.070713 func=RepairReplicaInconsistencies level=INFO logid=4bfd2332-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ef7f6700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf6484 fsid=85
210514 13:15:05 time=1620978305.070718 func=Emsg level=ERROR logid=4c20ca80-b488-11eb-9eab-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc915bfc700 source=XrdMgmOfsFile:3231 tident=daemon.5023:568@eos-mgm sec=sss uid=2 gid=2 name=daemon geo="" Unable to schedule stripes for reconstruction /eos/alicekolkata/grid/15/40831/7d17b8b4-5a75-11eb-9b25-c36162373fca; No space left on device
210514 13:15:05 time=1620978305.070794 func=IdMap level=INFO logid=static… unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc915bfc700 source=Mapping:993 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=sss sec.name="daemon" sec.host="eos-mgm.tier2-kol.res.in" sec.vorg="" sec.grps="daemon" sec.role="" sec.info="" sec.app="" sec.tident="daemon.5023:568@eos-mgm" vid.uid=2 vid.gid=2
210514 13:15:05 time=1620978305.070825 func=open level=INFO logid=4c20e2e0-b488-11eb-9eab-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc915bfc700 source=XrdMgmOfsFile:500 tident=daemon.5023:568@eos-mgm sec=sss uid=2 gid=2 name=daemon geo="" op=read path=/eos/alicekolkata/grid/11/14854/93066332-5a75-11eb-af56-37d7d0319c85 info=cap.msg=<…>&cap.sym=<…>&eos.encodepath=curl&eos.pio.action=reconstruct&eos.pio.recfs=91&mgm.logid=4c1fc2f2-b488-11eb-a3ff-e4434b664554&tpc.stage=placement
210514 13:15:05 time=1620978305.070831 func=DoIt level=ERROR logid=4c1fb3ca-b488-11eb-8caf-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ecff1700 source=DrainTransferJob:154 tident= sec= uid=0 gid=0 name= geo="" src=root://eoskolkata.tier2-kol.res.in:1094//#curl#/eos/alicekolkata/grid/15/40831/7d17b8b4-5a75-11eb-9b25-c36162373fca dst=root://eos06.tier2-kol.res.in:1095//replicate:0 logid=4c1fbffa-b488-11eb-8caf-e4434b664554 tpc_err=[ERROR] Server responded with an error: [3009] Unable to schedule stripes for reconstruction /eos/alicekolkata/grid/15/40831/7d17b8b4-5a75-11eb-9b25-c36162373fca; No space left on device
210514 13:15:05 time=1620978305.070982 func=DoIt level=INFO logid=4c20d1e2-b488-11eb-b3d1-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5f17fa700 source=DrainTransferJob:145 tident= sec= uid=0 gid=0 name= geo="" [tpc]: app=fsck logid=4c20dbce-b488-11eb-b3d1-e4434b664554 src_url=root://eoskolkata.tier2-kol.res.in:1094//#curl#/eos/alicekolkata/grid/00/19018/f2fc4c7a-5a75-11eb-b976-87f2ec6129f1 => dst_url=root://eos10.tier2-kol.res.in:1095//replicate:0 prepare_msg=[SUCCESS]
210514 13:15:05 time=1620978305.070983 func=open level=INFO logid=4c20e2e0-b488-11eb-9eab-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc915bfc700 source=XrdMgmOfsFile:1037 tident=daemon.5023:568@eos-mgm sec=sss uid=2 gid=2 name=daemon geo="" acl=0 r=0 w=0 wo=0 egroup=0 shared=0 mutable=1 facl=0
210514 13:15:05 time=1620978305.071061 func=SelectDstFs level=ERROR logid=4c1fb3ca-b488-11eb-8caf-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ecff1700 source=DrainTransferJob:546 tident= sec= uid=0 gid=0 name= geo="" msg="fxid=02cf6164 could not place new replica"
210514 13:15:05 time=1620978305.071082 func=ReportError level=ERROR logid=4c1fb3ca-b488-11eb-8caf-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ecff1700 source=DrainTransferJob:45 tident= sec= uid=0 gid=0 name= geo="" msg="failed to select destination file system" fxid=02cf6164
210514 13:15:05 time=1620978305.071100 func=RepairReplicaInconsistencies level=ERROR logid=4bfd193c-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5ecff1700 source=FsckEntry:666 tident= sec= uid=0 gid=0 name= geo="" msg="replica inconsistency repair failed fxid=02cf6164 src_fsid=87"
210514 13:15:05 time=1620978305.071177 func=Repair level=INFO logid=static… unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5eb7ee700 source=FsckEntry:794 tident= sec=(null) uid=99 gid=99 name=- geo="" msg="fsck repair" fxid=02cf64c0 err_type=6
210514 13:15:05 time=1620978305.071204 func=RepairReplicaInconsistencies level=INFO logid=4bfd23c8-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5eb7ee700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf64c0 fsid=91
210514 13:15:05 time=1620978305.071221 func=RepairReplicaInconsistencies level=INFO logid=4bfd23c8-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5eb7ee700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf64c0 fsid=90
210514 13:15:05 time=1620978305.071232 func=RepairReplicaInconsistencies level=INFO logid=4bfd23c8-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5eb7ee700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf64c0 fsid=87
210514 13:15:05 time=1620978305.071243 func=RepairReplicaInconsistencies level=INFO logid=4bfd23c8-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5eb7ee700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf64c0 fsid=88
210514 13:15:05 time=1620978305.071253 func=open level=INFO logid=4c20e2e0-b488-11eb-9eab-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc915bfc700 source=XrdMgmOfsFile:2490 tident=daemon.5023:568@eos-mgm sec=sss uid=2 gid=2 name=daemon geo="" msg="nominal stripes:7 reconstructed stripes=2 group_idx=12"
210514 13:15:05 time=1620978305.071257 func=RepairReplicaInconsistencies level=INFO logid=4bfd23c8-b488-11eb-8294-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5eb7ee700 source=FsckEntry:507 tident= sec= uid=0 gid=0 name= geo="" fxid=02cf64c0 fsid=89
[root@eos-mgm ~]#
We checked "eos group ls" and it seems that balancing is going on in four groups (a per-group filesystem listing is sketched after the table below). We also grepped xrdlog.mgm for "Unable to schedule stripes for reconstruction" for deeper investigation.
[root@eos-mgm ~]# eos group ls | sort -k6h
type       name         status  N(fs)  dev(filled)  avg(filled)  sig(filled)  balancing  bal-shd
groupview default.5 on 7 11.60 14.65 4.95 balancing 11
groupview default.12 on 7 1.19 34.70 0.53 balancing 8
groupview default.3 on 7 1.17 34.72 0.51 balancing 14
groupview default.14 on 7 0.27 35.48 0.16 idle 0
groupview default.7 on 7 13.98 35.75 5.73 balancing 7
groupview default.13 on 7 0.85 35.97 0.40 idle 0
groupview default.11 on 7 0.84 36.07 0.39 idle 0
groupview default.15 on 7 0.97 36.15 0.44 idle 0
groupview default.10 on 7 0.80 36.18 0.36 idle 0
groupview default.1 on 7 0.79 36.23 0.37 idle 0
groupview default.4 on 7 0.59 36.24 0.28 idle 0
groupview default.8 on 7 0.78 36.26 0.36 idle 0
groupview default.2 on 7 0.62 36.33 0.31 idle 0
groupview default.9 on 7 0.92 36.34 0.42 idle 0
groupview default.6 on 7 0.75 36.36 0.34 idle 0
groupview default.0 on 7 0.75 36.67 0.35 idle 0
[root@eos-mgm ~]#
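To see which individual filesystems inside a balancing group drive the fill spread, the listing can simply be filtered per group; a trivial example for default.5:
[root@eos-mgm ~]# eos -b fs ls | grep default.5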
[root@eos-mgm ~]# tail -5 /var/log/eos/mgm/xrdlog.mgm | grep "Unable to schedule stripes for reconstruction"
210514 12:39:42 time=1620976182.125798 func=DoIt level=ERROR logid=5abf4170-b483-11eb-b3d1-e4434b664554 unit=mgm@eos-mgm.tier2-kol.res.in:1094 tid=00007fc5f17fa700 source=DrainTransferJob:154 tident= sec= uid=0 gid=0 name= geo="" src=root://eoskolkata.tier2-kol.res.in:1094//#curl#/eos/alicekolkata/grid/08/25677/29079d00-579e-11e5-96ba-732cceb30bbe dst=root://eos05.tier2-kol.res.in:1095//replicate:0 logid=5abf5610-b483-11eb-b3d1-e4434b664554 tpc_err=[ERROR] Server responded with an error: [3009] Unable to schedule stripes for reconstruction /eos/alicekolkata/grid/08/25677/29079d00-579e-11e5-96ba-732cceb30bbe; No space left on device
[root@eos-mgm ~]# eos file info /eos/alicekolkata/grid/08/25677/29079d00-579e-11e5-96ba-732cceb30bbe
File: '/eos/alicekolkata/grid/08/25677/29079d00-579e-11e5-96ba-732cceb30bbe' Flags: 0600
Size: 101415327
Modify: Fri Mar 29 20:36:52 2019 Timestamp: 1553872012.447777000
Change: Fri Mar 29 20:36:47 2019 Timestamp: 1553872007.671813569
Birth: Thu Jan 1 05:30:00 1970 Timestamp: 0.000000000
CUid: 10367 CGid: 1395 Fxid: 0028751f Fid: 2651423 Pid: 22691 Pxid: 000058a3
XStype: adler XS: 69 91 75 39 ETAGs: "711735942053888:69917539"
Layout: raid6 Stripes: 7 Blocksize: 1M LayoutId: 20640642 Redundancy: d2::t0
#Rep: 6
no.  fs-id  host                     schedgroup  path     boot    configstatus  drain    active  geotag
0 91 eos09.tier2-kol.res.in default.12 /xdata6 booted rw nodrain online Kolkata::EOS2
1 86 eos08.tier2-kol.res.in default.12 /xdata6 booted rw nodrain online Kolkata::EOS2
2 89 eos06.tier2-kol.res.in default.12 /xdata6 booted rw nodrain online Kolkata::EOS2
3 88 eos07.tier2-kol.res.in default.12 /xdata6 booted rw nodrain online Kolkata::EOS2
4 90 eos10.tier2-kol.res.in default.12 /xdata6 booted rw nodrain online Kolkata::EOS2
5 85 eos04.tier2-kol.res.in default.12 /xdata6 booted rw nodrain online Kolkata::EOS2
[root@eos-mgm ~]#
In the above output of "eos file info", one replica, the one on eos05, is missing. There are a great many files in Kolkata::EOS2 with one or two replicas missing, all in groups 5, 12, 3 and 7, the groups where balancing is going on.
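To cross-check what the FSTs actually report for such a file, the stripe-level information can be queried with "eos file check" (shown as an example on the first affected path above):
[root@eos-mgm ~]# eos file check /eos/alicekolkata/grid/08/25677/29079d00-579e-11e5-96ba-732cceb30bbe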
[root@eos-mgm ~]# eos file info /eos/alicekolkata/grid/12/34178/92cf7dd2-3f30-11eb-997d-4bf1c2cd6a34
File: '/eos/alicekolkata/grid/12/34178/92cf7dd2-3f30-11eb-997d-4bf1c2cd6a34' Flags: 0664
Size: 11998913
Modify: Wed Dec 16 05:22:22 2020 Timestamp: 1608076342.468447000
Change: Wed Dec 16 05:22:22 2020 Timestamp: 1608076342.179614875
Birth: Wed Dec 16 05:22:22 2020 Timestamp: 1608076342.179614875
CUid: 10367 CGid: 1395 Fxid: 02904406 Fid: 43009030 Pid: 7597 Pxid: 00001dad
XStype: adler XS: 47 a3 39 79 ETAGs: "11545148580167680:47a33979"
Layout: raid6 Stripes: 7 Blocksize: 1M LayoutId: 20640642 Redundancy: d2::t0
#Rep: 6
no.  fs-id  host                     schedgroup  path      boot    configstatus  drain    active  geotag
0 55 eos10.tier2-kol.res.in default.7 /xdata15 booted rw nodrain online Kolkata::EOS2
1 50 eos04.tier2-kol.res.in default.7 /xdata15 booted rw nodrain online Kolkata::EOS2
2 53 eos07.tier2-kol.res.in default.7 /xdata15 booted rw nodrain online Kolkata::EOS2
3 56 eos09.tier2-kol.res.in default.7 /xdata15 booted rw nodrain online Kolkata::EOS2
4 52 eos05.tier2-kol.res.in default.7 /xdata15 booted rw nodrain online Kolkata::EOS2
5 54 eos06.tier2-kol.res.in default.7 /xdata15 booted rw nodrain online Kolkata::EOS2
[root@eos-mgm ~]#
[root@eos-mgm ~]# eos file info /eos/alicekolkata/grid/12/23882/13ddb16a-2927-11e7-8d56-8fb51abc3cf0
File: '/eos/alicekolkata/grid/12/23882/13ddb16a-2927-11e7-8d56-8fb51abc3cf0' Flags: 0644
Size: 276749879
Modify: Tue Mar 26 11:05:58 2019 Timestamp: 1553578558.297332000
Change: Tue Mar 26 11:05:19 2019 Timestamp: 1553578519.076744161
Birth: Thu Jan 1 05:30:00 1970 Timestamp: 0.000000000
CUid: 10367 CGid: 1395 Fxid: 00033d22 Fid: 212258 Pid: 3326 Pxid: 00000cfe
XStype: adler XS: 57 d2 af ce ETAGs: "56977573019648:57d2afce"
Layout: raid6 Stripes: 7 Blocksize: 1M LayoutId: 20640642 Redundancy: d1::t0
#Rep: 5
no.  fs-id  host                     schedgroup  path      boot    configstatus  drain    active  geotag
0 36 eos04.tier2-kol.res.in default.5 /xdata13 booted rw nodrain online Kolkata::EOS2
1 38 eos05.tier2-kol.res.in default.5 /xdata13 booted rw nodrain online Kolkata::EOS2
2 41 eos10.tier2-kol.res.in default.5 /xdata13 booted rw nodrain online Kolkata::EOS2
3 37 eos08.tier2-kol.res.in default.5 /xdata13 booted rw nodrain online Kolkata::EOS2
4 39 eos07.tier2-kol.res.in default.5 /xdata13 booted rw nodrain online Kolkata::EOS2
[root@eos-mgm ~]#
In the above output of "eos file info", two replicas, those on eos06 and eos09, are missing. But when we check that mount point, xdata13, in "eos fs ls", it exists (and it also exists on the FSTs):
[root@eos-mgm ~]# eos -b fs ls | grep xdata13
eos04.tier2-kol.res.in 1095 36 /xdata13 default.5 Kolkata::EOS2 booted rw nodrain online N/A
eos08.tier2-kol.res.in 1095 37 /xdata13 default.5 Kolkata::EOS2 booted rw nodrain online N/A
eos05.tier2-kol.res.in 1095 38 /xdata13 default.5 Kolkata::EOS2 booted rw nodrain online N/A
eos07.tier2-kol.res.in 1095 39 /xdata13 default.5 Kolkata::EOS2 booted rw nodrain online N/A
eos06.tier2-kol.res.in 1095 40 /xdata13 default.5 Kolkata::EOS2 booted rw nodrain online N/A
eos10.tier2-kol.res.in 1095 41 /xdata13 default.5 Kolkata::EOS2 booted rw nodrain online N/A
eos09.tier2-kol.res.in 1095 42 /xdata13 default.5 Kolkata::EOS2 booted rw nodrain online N/A
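For the two filesystems that should hold the missing replicas, the detailed status can also be queried with "eos fs status", using the fsids from the listing above:
[root@eos-mgm ~]# eos fs status 40   # eos06:/xdata13
[root@eos-mgm ~]# eos fs status 42   # eos09:/xdata13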
[root@eos-mgm ~]# ssh eos06 df -kh |grep xdata13
/dev/sdo1 9.0T 279G 8.7T 4% /xdata13
[root@eos-mgm ~]# ssh eos05 df -kh |grep xdata13
/dev/sdo1 9.0T 1.7T 7.3T 19% /xdata13
[root@eos-mgm ~]# ssh eos04 df -kh |grep xdata13
/dev/sdo1 9.0T 1.4T 7.6T 16% /xdata13
[root@eos-mgm ~]#
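To compare the usage of xdata13 across all FSTs in one go, a small shell loop (hostnames as in our cluster) shows the imbalance at a glance:
[root@eos-mgm ~]# for h in eos04 eos05 eos06 eos07 eos08 eos09 eos10; do echo -n "$h: "; ssh $h df -kh | grep xdata13; done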
However, the space occupied on xdata13 differs from FST to FST.
The fsck report is below for information:
[root@eos-mgm ~]# eos fsck report
timestamp=1620979518 tag="blockxs_err" count=576
timestamp=1620979518 tag="d_mem_sz_diff" count=2113
timestamp=1620979518 tag="orphans_n" count=432097
timestamp=1620979518 tag="rep_diff_n" count=1692234
timestamp=1620979518 tag="rep_missing_n" count=3529158
timestamp=1620979518 tag="unreg_n" count=160310
[root@eos-mgm ~]#
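To drill into a single error class, the report can be restricted per tag, e.g. for the missing replicas (flag names as printed by "eos fsck report -h" on our version; they may differ on other releases):
[root@eos-mgm ~]# eos fsck report -a --error rep_missing_n | head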
So, kindly suggest what we should do. How can we repair those missing and faulty replicas safely?
Regards
Prasun