Hello Andreas,

I am returning to the issue with some findings.
On EOS we have a mismatch between the used space as calculated directly on the FSTs via statvfs() calls and the quota accounting for the default space:
sum.stat.statfs.usedbytes = 4901489687896064
whereas the quota report gives "usedsize" : 4026197504675803
so 875,292,183,220,261 bytes are missing!
We noticed that the apparent sizes of some files are smaller than their disk (allocated) sizes.
(In the metadata DB and the Rucio DB the file size corresponds to the apparent size, and the same value is what the EOS quota system measures.)
This is due to speculative preallocation, which allocates blocks past EOF.
This can be verified as follows:
connect to any FST,
go to any partition,
find files with large sizes and a difference between du and du --apparent-size,
and run:
xfs_bmap -pvv /fspool/disk06/0000083c/0141cc0d
/fspool/disk06/0000083c/0141cc0d:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 41432754000..41432754007 25 (15653200..15653207) 8 001111
1: [8..603999]: 43036269056..43036873047 25 (1619168256..1619772247) 603992 000111
2: [604000..1026047]: 43036873048..43037295095 25 (1619772248..1620194295) 422048 011111
FLAG Values:
0100000 Shared extent
0010000 Unwritten preallocated extent
0001000 Doesn't begin on stripe unit
0000100 Doesn't end on stripe unit
0000010 Doesn't begin on stripe width
0000001 Doesn't end on stripe width
The last extent, 422048 blocks * 512 bytes = 216,088,576 bytes, corresponds to an unwritten preallocated extent.
For each affected file this space can be recovered only if the system reclaims the inode (e.g. the file is deleted) or if the defragmentation process is run on the file.
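The du vs du --apparent-size check above can be scripted. A minimal sketch (the mount point is just the example path from the xfs_bmap output above, and the 64 MiB threshold is an arbitrary assumption):

```shell
#!/bin/sh
# Sketch: list regular files whose allocated size (512-byte blocks, as
# reported by GNU `find -printf '%b'`) exceeds their apparent size (%s)
# by more than a threshold -- candidates for holding preallocated extents.
find_preallocated() {
    dir=$1
    threshold=${2:-$((64 * 1024 * 1024))}   # 64 MiB default, arbitrary
    find "$dir" -xdev -type f -printf '%b %s %p\n' |
    while read -r blocks bytes path; do
        excess=$(( blocks * 512 - bytes ))
        if [ "$excess" -gt "$threshold" ]; then
            echo "$path excess=$excess"
        fi
    done
}

# Example, using the FST partition from the xfs_bmap example above:
# find_preallocated /fspool/disk06
```

Candidates it reports can then be inspected individually with xfs_bmap -pvv as above.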
The root cause: within a 5-minute timeout window the inodes of some files appear "dirty" in the VFS cache, so the kernel cannot remove the preallocated extents, and they stay there forever after the timeout expires.
We are running CentOS Stream release 8 with kernel 4.18.0-383.el8.x86_64.
We did not see anything similar on DPM with CentOS Linux release 7.9.2009 (Core) and kernel 3.10.0-1160.45.1.el7.x86_64
(see section 3.10, "Migrating from ext4 to XFS", Red Hat Enterprise Linux 7, on the Red Hat Customer Portal; it also applies to CentOS 7, e.g. on the DPM XFS partitions).
Further comments:
a) I do not think the parameters from the XFS FAQ (xfs.org) for aligning the XFS filesystem with the underlying RAID volume could help; in some preliminary tests we did not see any difference in sizes, but we could redo those tests.
b) With static allocation, i.e. switching off dynamic speculative preallocation in favour of a fixed allocation size via the 'allocsize=' mount option (e.g. 256k), we do not have the size problem, but some files appear highly fragmented (up to 40 extents).
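For reference, a hedged example of what option (b) looks like; the device and mount point are placeholders, and 256k is just the example value above:

```shell
# Placeholder device/mount point. allocsize=256k fixes the allocation
# size and disables dynamic speculative preallocation on this filesystem.
mount -o allocsize=256k /dev/sdxx /fspool/disk06

# or persistently in /etc/fstab:
# /dev/sdxx  /fspool/disk06  xfs  defaults,allocsize=256k  0 0
```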
c) With the command /usr/sbin/xfs_db -r -c "frag" /dev/sdxx
we get the fragmentation level of a partition.
The command can run with the partition mounted;
the -r flag ensures read-only operation.
d) We can run the defragmentation process on a device, directory or file
(e.g. xfs_fsr /dev/sdxx). The good news is that this can run with the partition mounted; the bad news is that for a 25 TB partition with ~100K files it took 12 hours.
As I wrote these lines, I realized that we could run xfs_fsr only on files whose allocated size exceeds the apparent size by some threshold, to recover the lost space. Our partitions do not really suffer from fragmentation: 2-3 extents per file on average appears to be normal. We just need a way to reclaim the inodes without deleting the files.
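A sketch of that selective defrag idea (the 100 MiB threshold and the path are assumptions, not tested settings; the third argument is a hypothetical dry-run hook I added for illustration):

```shell
#!/bin/sh
# Sketch: run xfs_fsr only on files whose allocated size exceeds the
# apparent size by more than a threshold, instead of defragmenting the
# whole partition. The 100 MiB threshold is an arbitrary example value.
selective_fsr() {
    dir=$1
    threshold=${2:-$((100 * 1024 * 1024))}
    cmd=${3:-xfs_fsr}            # pass 'echo' as third arg for a dry run
    find "$dir" -xdev -type f -printf '%b %s %p\n' |
    while read -r blocks bytes path; do
        if [ $(( blocks * 512 - bytes )) -gt "$threshold" ]; then
            "$cmd" "$path"       # xfs_fsr rewrites the file's extents
        fi
    done
}

# selective_fsr /fspool/disk06                           # real run (root)
# selective_fsr /fspool/disk06 $((100*1024*1024)) echo   # dry run
```

A real run needs root and the xfs_fsr binary (xfsdump/xfsprogs packages), and it rewrites each matching file, which also drops the unwritten preallocated extents.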
We have to find and eliminate the root cause related to the VFS cache,
otherwise the issue will return after the defrag campaign (especially when we will add new nodes).
Using a static allocation size might be an option, but it could cause serious performance issues due to heavy file fragmentation.
Any comment on the above analysis would be very helpful.
best
e.v.