I’ve been setting up EOS in standalone mode (i.e. only one instance of each service) for testing purposes, although in the long run I plan to add more FSTs. To try things out, I decided to copy a bunch of files to it, and I chose my entire ~/git folder for this.
I started seeing weird errors from eosxd3 about no space being left on device. At first I thought this was caused by the cache partition not being big enough, but that was not it:
240118 07:03:39 t=1705561419.778166 f=HandleResponse l=CRIT tid=00007f5d7dbfa700 s=xrdclproxy:922 write error '[ERROR] Operation expired'
---- high rate error messages suppressed ----
240118 07:03:39 t=1705561419.788031 f=flush_nolock l=ERROR ino:00004f6d10000000 s=data:373 write error error=[ERROR] Operation expired
240118 07:03:39 t=1705561419.788050 f=TryRecovery l=CRIT ino:00004f6d10000000 s=data:1103 triggering write recovery state = 2
240118 07:03:39 t=1705561419.788556 f=lookupNonLocalJail l=ALERT tid=00007f5d81bff700 s=SecurityChecker:212 Failed to openat file
---- high rate error messages suppressed ----
240118 07:03:39 t=1705561419.788702 f=lookupNonLocalJail l=ALERT tid=00007f5d81bff700 s=SecurityChecker:198 Failed to openat next child
240118 07:03:39 t=1705561419.949411 f=recover_write l=WARN ino:00004f6d10000000 s=data:1887 re-opening with repair flag for recovery root://AAAAAAAE@mgm.services-eos.svc.c.k3s.fsn.lama.tel:1094//fusex-open?eos.app=fuse&eos.bookingsize=0&eos.lfn=ino:4f6d10000000&fuse.exe=rsync&fuse.gid=1000&fuse.pid=3944602&fuse.uid=1000&fuse.ver=5.2.4&mgm.fusex=1&mgm.mtime=0&xrd.k5ccname=KEYRING:persistent:1000&xrd.wantprot=krb5,unix&xrdcl.secgid=1000&xrdcl.secuid=1000&eos.repair=1&eos.bookingsize=0
240118 07:03:40 t=1705561420.007886 f=HandleResponseWithHosts l=ERROR tid=00007f5d7b3f5700 s=xrdclproxy:565 state=failed async open returned errmsg=[ERROR] Error response: no space left on device
After further investigation, I’m getting the following in the MGM logs:
240117 11:59:25 time=1705492765.562852 func=IdMap level=INFO logid=static.............................. unit=mgm@mgm-0.mgm.services-eos.svc.c.k3s.fsn.lama.tel:1094 tid=00007fac5c3ff700 source=Mapping:1001 tident= sec=(null) uid=99 gid=99 name=- geo="" sec.prot=krb5 sec.name="risson" sec.host="[2a01:cb10:944:3200:f40:85f0:9fec:fa6a]" sec.vorg="" sec.grps="" sec.role="" sec.info="" sec.app="fuse" sec.tident="AAAAAAAE.685048:455@[2a01:cb10:944:3200:f40:85f0:9fec:fa6a]" vid.uid=2004 vid.gid=2004 sudo=0 gateway=0
240117 11:59:25 time=1705492765.562968 func=open level=INFO logid=dc16ad9e-b52f-11ee-b4be-fe8bf25c6639 unit=mgm@mgm-0.mgm.services-eos.svc.c.k3s.fsn.lama.tel:1094 tid=00007fac5c3ff700 source=XrdMgmOfsFile:587 tident=AAAAAAAE.685048:455@[2a01:cb10:944:3200:f40:85f0:9fec:fa6a] sec=krb5 uid=2004 gid=2004 name=risson geo="" msg="access by inode" ino=ino:3560000000 path=/eos/user/risson/bench/cern/dss/eos/.git/objects/pack/.pack-4aaab988483f984e09b8b83c8fded69bd59d6859.pack.iyyhWs
240117 11:59:25 time=1705492765.563031 func=open level=INFO logid=dc16ad9e-b52f-11ee-b4be-fe8bf25c6639 unit=mgm@mgm-0.mgm.services-eos.svc.c.k3s.fsn.lama.tel:1094 tid=00007fac5c3ff700 source=XrdMgmOfsFile:1108 tident=AAAAAAAE.685048:455@[2a01:cb10:944:3200:f40:85f0:9fec:fa6a] sec=krb5 uid=2004 gid=2004 name=risson geo="" acl=1 r=1 w=1 wo=0 egroup=0 shared=0 mutable=1 facl=0
240117 11:59:25 time=1705492765.563039 func=open level=INFO logid=dc16ad9e-b52f-11ee-b4be-fe8bf25c6639 unit=mgm@mgm-0.mgm.services-eos.svc.c.k3s.fsn.lama.tel:1094 tid=00007fac5c3ff700 source=XrdMgmOfsFile:1186 tident=AAAAAAAE.685048:455@[2a01:cb10:944:3200:f40:85f0:9fec:fa6a] sec=krb5 uid=2004 gid=2004 name=risson geo="" msg="client acting as directory owner" path="/eos/user/risson/bench/cern/dss/eos/.git/objects/pack/.pack-4aaab988483f984e09b8b83c8fded69bd59d6859.pack.iyyhWs" uid="2004=>2004" gid="2004=>2004"
240117 11:59:25 time=1705492765.563161 func=open level=INFO logid=dc16ad9e-b52f-11ee-b4be-fe8bf25c6639 unit=mgm@mgm-0.mgm.services-eos.svc.c.k3s.fsn.lama.tel:1094 tid=00007fac5c3ff700 source=XrdMgmOfsFile:2247 tident=AAAAAAAE.685048:455@[2a01:cb10:944:3200:f40:85f0:9fec:fa6a] sec=krb5 uid=2004 gid=2004 name=risson geo="" msg="file-recreation due to offline/full locations" path=/eos/user/risson/bench/cern/dss/eos/.git/objects/pack/.pack-4aaab988483f984e09b8b83c8fded69bd59d6859.pack.iyyhWs retc=28
240117 11:59:25 time=1705492765.563216 func=Emsg level=ERROR logid=dc16ad9e-b52f-11ee-b4be-fe8bf25c6639 unit=mgm@mgm-0.mgm.services-eos.svc.c.k3s.fsn.lama.tel:1094 tid=00007fac5c3ff700 source=XrdMgmOfsFile:3527 tident=AAAAAAAE.685048:455@[2a01:cb10:944:3200:f40:85f0:9fec:fa6a] sec=krb5 uid=2004 gid=2004 name=risson geo="" Unable to get free physical space /eos/user/risson/bench/cern/dss/eos/.git/objects/pack/.pack-4aaab988483f984e09b8b83c8fded69bd59d6859.pack.iyyhWs; No space left on device
Trying to reproduce further, I found that running the copy with rsync --max-size 100M did not trigger the issue. It turns out that copying any file bigger than 127M fails:
# Works
size=127; dd if=/dev/urandom of=/tmp/test bs=1M count=$size; rsync -avP /tmp/test /eos/user/risson/test
# Fails with the error above
size=128; dd if=/dev/urandom of=/tmp/test bs=1M count=$size; rsync -avP /tmp/test /eos/user/risson/test
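For reference, here is a sketch of how such a size boundary could be bisected automatically (not literally what I ran; it reuses the /tmp/test and /eos/user/risson/test paths from above):
lo=1; hi=128  # 1M known good, 128M known bad (from the tests above)
while [ $((hi - lo)) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  dd if=/dev/urandom of=/tmp/test bs=1M count=$mid status=none
  if rsync -aP /tmp/test /eos/user/risson/test; then
    lo=$mid  # copy succeeded, so the threshold is higher
  else
    hi=$mid  # copy failed, so the threshold is at or below this size
  fi
done
echo "largest working size: ${lo}M, smallest failing size: ${hi}M"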
Now, this only happens when using eosxd3 (or eosxd, for that matter). Running eos cp directly on the MGM node works, but I don’t see anything weird in the FUSE config I’m using (details below).
# Works fine
[root@mgm-0]# size=200; dd if=/dev/urandom of=/tmp/test bs=1M count=$size; eos cp /tmp/test /eos/user/risson/bench/test
I checked a lot of topics on this forum, and from all the troubleshooting steps I’ve found, everything looks fine:
# eos space ls
│type │ name│ groupsize│ groupmod│ N(fs)│ N(fs-rw)│ sum(usedbytes)│ sum(capacity)│ capacity(rw)│ nom.capacity│sched.capacity│ usage│ quota│ balancing│ threshold│ converter│ ntx│ active│ wfe│ ntx│ active│ intergroup│
spaceview default 0 24 1 1 20.09 GB 52.52 GB 52.52 GB 1.00 TB 32.43 GB 2.01 off off 20 off 2 0 off 1 0 off
# eos fs ls
│host │port│ id│ path│ schedgroup│ geotag│ boot│ configstatus│ drain│ usage│ active│ health│
fst-0.fst.services-eos.svc.c.k3s.fsn.lama.tel 1095 1 /fst_storage default.0 fsn::k3s booted rw nodrain 38.07 online no smartctl
# eos fusex ls
client : eosxd horse 5.2.4 online Wed, 17 Jan 2024 10:38:09 GMT 4.41 7.57 8178302a-b524-11ee-91c3-50ebf6559d17 p=685048 caps=0 fds=0 static [vacant] act mount=/eos prot=4 app=fuse
client : eosxd k3s-1.fsn.lama.tel 5.2.4 online Wed, 17 Jan 2024 08:35:40 GMT 9.23 0.70 659d0300-b513-11ee-9b54-525400f55a2a p=1212146 caps=0 fds=1 static [vacant] act mount=/eos prot=4 app=fuse
I don’t have any specific quota or recycle configuration (yet).
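Since the MGM complains about “Unable to get free physical space”, I can also share more detailed space and filesystem status if that helps; something like the following (fsid 1 taken from the eos fs ls output above) should show the scheduling-related settings such as headroom:
# Detailed status, in case a scheduling setting (e.g. headroom) makes the
# single FST look full to the scheduler; fsid 1 is from the eos fs ls output above
eos space status default
eos fs status 1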
Fuse config:
{
  "name": "eos",
  "hostport": "mgm.services-eos.svc.c.k3s.fsn.lama.tel",
  "remotemountdir": "/eos",
  "localmountdir": "/eos",
  "auth": {
    "shared-mount": 1,
    "sss": 0,
    "gsi-first": 0,
    "krb5": 1,
    "oauth2": 0
  },
  "options": {
    "hide-versions": 0
  }
}
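(The config above is valid JSON; a quick way to double-check on the mount host, python3 being just one option:)
# sanity-check that the config file is well-formed JSON
python3 -m json.tool /etc/eos/fuse.eos.conf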
Mounted with /usr/bin/eosxd3 -f -ofsname=eos.
Now, all of this is running via either Docker or Kubernetes, using gitlab-registry.cern.ch/dss/eos/eos-ci:5.2.4.
The eosxd3 instance on which this is happening is run with:
docker run --rm -it --name eos-fuse \
  -e EOS_FUSE_KERNELCACHE=0 \
  -e EOS_FUSE_CACHE=0 \
  --entrypoint="" \
  -v /etc/krb5.conf:/etc/krb5.conf:ro \
  -v /persist/var/cache/eos:/var/cache/eos \
  --mount=type=bind,source=/eos,target=/eos,bind-propagation=shared \
  -v $PWD/fuse.eos.conf:/etc/eos/fuse.eos.conf \
  -v /dev/fuse:/dev/fuse \
  -e EOS_MGM_URL=root://mgm.services-eos.svc.c.k3s.fsn.lama.tel \
  --privileged --network=host --pid=host --uts=host --ipc=host \
  gitlab-registry.cern.ch/dss/eos/eos-ci:5.2.4 \
  /usr/bin/eosxd3 -f -ofsname=eos
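As a quick liveness check of the mount, since the bind mount is shared it should also be visible on the host; something like:
# basic liveness checks for the FUSE mount, from the host
stat /eos
df -h /eos
# ... or directly inside the container (name eos-fuse from the command above)
docker exec eos-fuse df -h /eos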
If you need to look at a specific MGM/FST/other configuration, it’s all public at k8s/services/eos/app · 20049b69682a0765a70bda06d06e1f6da5199aa8 · lama-corp / Lama Corp. Infrastructure / Infrastructure · GitLab
Thanks in advance for your help!