During the EOS workshop this year, an issue was mentioned involving XFS kernel crashes. I mentioned that we had been experiencing them quite frequently but that draining had resolved the issue. Shortly after the workshop we decided to stop the drain and flipped the FSIDs back to ‘rw’ mode. In the last couple of days we have begun experiencing the issue anew. I wonder if we could get some help finding the underlying file in question. While it’s clear which FSIDs (multiple) are at issue, it would be great to be able to figure out the files/dirs that are triggering this. The crash dump has a daddr, but I’m not quite sure how to translate that into anything useful. Any help would be appreciated:
Apr 6 07:49:27 alicefst01.lbl.gov kernel: sd 10:0:18:0: [sdu] tag#1705 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
Apr 6 07:49:27 alicefst01.lbl.gov kernel: sd 10:0:18:0: [sdu] tag#1705 Sense Key : Medium Error [current] [descriptor]
Apr 6 07:49:27 alicefst01.lbl.gov kernel: sd 10:0:18:0: [sdu] tag#1705 Add. Sense: Unrecovered read error
Apr 6 07:49:27 alicefst01.lbl.gov kernel: sd 10:0:18:0: [sdu] tag#1705 CDB: Read(16) 88 00 00 00 00 04 80 31 df a8 00 00 00 08 00 00
Apr 6 07:49:27 alicefst01.lbl.gov kernel: blk_update_request: critical medium error, dev sdu, sector 19330621352 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Apr 6 07:49:27 alicefst01.lbl.gov kernel: XFS (sdu): metadata I/O error in "xfs_da_read_buf+0xd3/0x120 [xfs]" at daddr 0x48031dfa8 len 8 error 61
[…]
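For what it's worth, here is a rough sketch of how one might map that daddr back to an inode and path with xfs_db. This assumes a 4 KiB filesystem block size (worth confirming with `xfs_info` first) — daddr is in 512-byte sectors, so fsblock = daddr / (blocksize / 512), i.e. daddr / 8 here. Note the daddr 0x48031dfa8 is exactly the failing sector 19330621352 from the log. The device path and mountpoint below are taken from the log / hypothetical:

```shell
#!/bin/sh
# Map the daddr from the XFS metadata I/O error back to a file.
# Assumption: 4 KiB block size => 8 sectors (512 B each) per fs block.

DADDR=0x48031dfa8              # from "metadata I/O error ... at daddr"
FSBLOCK=$(( DADDR / 8 ))       # sector -> filesystem block number
echo "daddr $DADDR = sector $(( DADDR )) = fsblock $FSBLOCK"

DEV=/dev/sdu                   # failing device from the log
# Run against an unmounted (or read-only) filesystem. `blockget -n`
# builds the block-ownership map and can take a long time on a big FS;
# `blockuse -n` then reports the owning inode (and name, where known).
if [ -b "$DEV" ]; then
    xfs_db -r "$DEV" <<EOF
convert daddr $DADDR fsblock
blockget -n
fsblock $FSBLOCK
blockuse -n
EOF
fi

# With the filesystem mounted again, resolve the reported inode to a path:
#   find /mountpoint -inum <inode> 2>/dev/null
```

No idea how practical `blockget` is at your filesystem sizes, but it is the documented route from a raw block number to an owner inode.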