peby
(Pete Eby)
January 17, 2025, 7:29pm
I'm seeing odd behavior, which I believe started after adding two new FSTs.
eos fs ls --io shows that 90% of all writes are being directed to just 4 of the 404 online, writable fsids.
Almost all writes are going to two disks in each of two FSTs, with nominal (or zero) writes to all the others.
Naturally, this causes those two FSTs to show sky-high, artificially induced iowait of up to 80%, as IO bottlenecks on 2 of the 84 drives in each FST. The other drives on the same FST show zero diskw-MB/s; it just hammers two at a time.
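To put a number on a skew like "90% of writes on 4 of 404 fsids", a small helper can report how many of the busiest fsids are needed to reach a given share of the total write rate. This is a minimal sketch in bash/awk, assuming the per-fsid write rates have already been extracted by hand into "fsid MB/s" lines (for example from the diskw-MB/s column of eos fs ls --io); the helper name fsids_for_share and its input format are hypothetical, not part of EOS.

```shell
#!/usr/bin/env bash
# Hypothetical input: one "fsid write_MBps" pair per line (not raw EOS output).
# Prints how many of the busiest fsids are needed to reach PCT % of all writes.
fsids_for_share() {
  local pct=$1
  sort -k2,2gr | awk -v pct="$pct" '
    { rate[NR] = $2; total += $2 }          # rates arrive sorted, busiest first
    END {
      if (total == 0) { print 0; exit }
      for (i = 1; i <= NR; i++) {
        acc += rate[i]
        if (100 * acc / total >= pct) { print i; exit }
      }
    }'
}

# Made-up rates: four hot drives and two nearly idle ones.
printf '12011 400\n12042 380\n13005 390\n13077 410\n14000 10\n15000 5\n' |
  fsids_for_share 90
# prints 4
```

With writes spread evenly, the answer would be expected to be a large fraction of the writable fsids; a result of 4 out of 404 is the symptom described in this post.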
The FSTs use single-disk fsids. The three prior FSTs are architecturally identical and have historically ingested 20 Gbit/s of inbound writes without breaking a sweat, distributing IO across the available fsids.
eos 5.2.28, rhel8, kernel 4.18.0-553.27.1.el8_10
Any suggestions?
peby
(Pete Eby)
January 17, 2025, 9:29pm
Updated all FSTs and MGMs to 5.3.0-1; still seeing the same behavior.
peby
(Pete Eby)
January 17, 2025, 9:33pm
Maybe it (writing to only a few fsids at a time) is a product of how the balancer works and I never noticed before? (Seems odd though.)
[root@ornl-eos-01]-diopside-~# eos io stat -x
┏━> Sum of bytes transferred in last 1m/5m/1h/24h and total sum:
┌─────┬────────────────────────┬────────┬────────┬────────┬────────┬────────┐
│io   │             application│    1min│    5min│      1h│     24h│     sum│
└─────┴────────────────────────┴────────┴────────┴────────┴────────┴────────┘
 out                 alimonitor        0        0  48.70 M  48.70 M  48.70 M
 out                 dataAccess   5.10 G  23.80 G  96.24 G  96.24 G  96.24 G
 out              SEFileCrawler        0        0 885.48 M 885.48 M 885.48 M
 out                 JobWrapper  60.10 K 150.24 K 863.88 K 863.88 K 863.88 K
 out                eos/balance  27.71 G 203.74 G 863.23 G 863.23 G 863.23 G
 out                  eos/drain        0        0  31.22 M  31.22 M  31.22 M
 in               SEFileCrawler  34.35 K 114.82 K   1.42 M   1.42 M   1.42 M
 in                  JobWrapper   4.73 G  24.23 G  94.28 G  94.28 G  94.28 G
 in                 eos/balance  26.90 G 203.04 G 861.63 G 861.63 G 861.63 G
 in                   eos/drain        0        0  30.41 M  30.41 M  30.41 M
 in                transfer-3rd 349.18 K 679.94 K   1.15 M   1.15 M   1.15 M
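One way to weigh the balancer question is to compute what fraction of the inbound bytes in the 1h column of this table come from eos/balance. A minimal sketch in bash/awk, assuming the "in" rows are copied by hand into "application value unit" lines and that K/M/G are powers of 1000 (the K and M rows are negligible either way); share_of is a hypothetical helper, not an EOS command.

```shell
#!/usr/bin/env bash
# Percentage of the total contributed by one application, from "app value unit" lines.
share_of() {
  local app=$1
  awk -v app="$app" '
    {
      f = ($3 == "K") ? 1e-6 : (($3 == "M") ? 1e-3 : 1)   # scale to G
      gb = $2 * f
      total += gb
      if ($1 == app) part += gb
    }
    END { printf "%.1f\n", 100 * part / total }'
}

# "in" rows, 1h column, copied from the eos io stat -x output.
share_of eos/balance <<'EOF'
SEFileCrawler 1.42 M
JobWrapper 94.28 G
eos/balance 861.63 G
eos/drain 30.41 M
transfer-3rd 1.15 M
EOF
# prints 90.1
```

On these numbers, balancer transfers make up roughly 90% of the inbound bytes over the last hour, against about 94 G of client writes from JobWrapper.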
apeters
(Andreas Joachim Peters)
January 20, 2025, 9:44am
Hi Pete,
you should have a look at
eos geosched show param
and
eos geosched show tree
This should explain why only two disks are being used.
In any case, post the output here!
peby
(Pete Eby)
January 21, 2025, 3:01pm
Hi Andreas,
Thanks for the help.
[root@ornl-eos-01]-diopside-~# eos geosched show param
### GeoTreeEngine parameters :
skipSaturatedAccess = 0
skipSaturatedDrnAccess = 0
skipSaturatedBlcAccess = 0
proxyCloseToFs = 1
penaltyUpdateRate = 1
plctDlScorePenalty = 1.97574(default) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
plctUlScorePenalty = 1.97574(defaUlt) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
accessDlScorePenalty = 1.97574(default) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
accessUlScorePenalty = 1.97574(defaUlt) | 10(1Gbps) | 10(10Gbps) | 10(100Gbps) | 10(1000Gbps)
fillRatioLimit = 95
fillRatioCompTol = 100
saturationThres = 10
timeFrameDurationMs = 1000
For post length, I've removed fsids 11000 through 11083 from the output below; they were the same as the others.
[root@ornl-eos-01]-diopside-~# eos geosched show tree
┌─────────┬────────┬─────┬───────────────────┬────────┬─────┬───┬──────────┐
│group │geotag │ fsid│ node│branches│leavs│sum│ status│
└─────────┴────────┴─────┴───────────────────┴────────┴─────┴───┴──────────┘
default.0 2 412 414
└───▶ e204ah75 1 412 413
├──▶ 12000 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12001 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12002 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12003 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12004 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12005 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12006 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12007 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12008 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12009 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12010 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12011 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12012 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12013 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12014 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12015 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12016 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12017 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12018 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12019 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12020 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12021 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12022 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12023 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12024 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12025 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12026 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12027 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12028 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12029 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12030 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12031 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12032 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12033 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12034 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12035 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12036 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12037 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12038 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12039 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12040 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12041 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12042 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12043 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12044 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12045 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12046 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12047 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12048 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12049 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12050 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12051 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12052 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12053 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12054 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12055 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12056 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12057 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12058 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12059 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12060 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12061 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12062 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12063 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12064 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12065 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12066 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12067 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12068 eos-fst-12.ornl.gov 0 1 1 UnvDinRO
├──▶ 12069 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12070 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12071 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12072 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12073 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12074 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12075 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12076 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12077 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12078 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12079 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12080 eos-fst-12.ornl.gov 0 1 1 UnvDinRW
├──▶ 12081 eos-fst-12.ornl.gov 0 1 1 UnvDinnoIO
├──▶ 12082 eos-fst-12.ornl.gov 0 1 1 UnvDinRO
├──▶ 12083 eos-fst-12.ornl.gov 0 1 1 UnvDinnoIO
├──▶ 13000 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13001 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13002 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13003 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13004 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13005 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13006 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13007 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13008 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13009 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13010 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13011 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13012 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13013 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13014 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13015 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13016 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13017 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13018 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13019 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13020 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13021 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13022 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13023 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13024 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13025 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13026 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13027 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13028 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13029 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13030 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13031 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13032 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13033 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13034 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13035 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13036 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13037 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13038 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13039 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13040 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13041 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13042 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13043 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13044 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13045 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13046 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13047 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13048 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13049 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13050 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13051 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13052 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13053 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13054 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13055 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13056 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13057 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13058 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13059 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13060 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13061 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13062 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13063 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13064 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13065 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13066 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13067 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13068 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13069 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13070 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13071 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13072 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13073 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13074 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13075 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13076 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13077 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13078 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13079 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 13080 eos-fst-13.ornl.gov 0 1 1 UnvDinRW
├──▶ 14000 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14001 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14002 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14003 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14004 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14005 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14006 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14007 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14008 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14009 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14010 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14011 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14012 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14013 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14014 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14015 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14016 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14017 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14018 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14019 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14020 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14021 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14022 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14023 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14024 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14025 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14026 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14027 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14028 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14029 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14030 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14031 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14032 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14033 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14034 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14035 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14036 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14037 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14038 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14039 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14040 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14041 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14042 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14043 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14044 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14045 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14046 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14047 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14048 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14049 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14050 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14051 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14052 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14053 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14054 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14055 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14056 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14057 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14058 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14059 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14060 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14061 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14062 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14063 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14064 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14065 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14066 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14067 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14068 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14069 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14070 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14071 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14072 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14073 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14074 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14075 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14076 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14077 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14078 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14079 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14080 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 14081 eos-fst-14.ornl.gov 0 1 1 UnvDinRW
├──▶ 15000 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15001 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15002 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15003 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15004 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15005 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15006 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15007 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15008 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15009 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15010 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15011 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15012 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15013 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15014 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15015 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15016 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15017 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15018 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15019 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15020 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15021 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15022 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15023 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15024 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15025 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15026 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15027 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15028 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15029 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15030 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15031 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15032 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15033 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15034 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15035 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15036 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15037 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15038 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15039 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15040 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15041 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15042 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15043 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15044 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15045 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15046 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15047 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15048 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15049 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15050 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15051 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15052 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15053 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15054 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15055 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15056 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15057 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15058 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15059 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15060 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15061 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15062 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15063 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15064 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15065 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15066 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15067 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15068 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15069 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15070 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15071 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15072 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15073 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15074 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15075 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15076 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15077 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15078 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15079 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
├──▶ 15080 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
└──▶ 15081 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
--------------------------------------------------------------------------
spare 2 7 9
└─────▶ e204ah75 1 7 8
├──▶ 13081 eos-fst-13.ornl.gov 0 1 1 UnvDinnoIO
├──▶ 13082 eos-fst-13.ornl.gov 0 1 1 UnvDinnoIO
├──▶ 13083 eos-fst-13.ornl.gov 0 1 1 UnvDinnoIO
├──▶ 14082 eos-fst-14.ornl.gov 0 1 1 UnvDinnoIO
├──▶ 14083 eos-fst-14.ornl.gov 0 1 1 UnvDinnoIO
├──▶ 15082 eos-fst-15.ornl.gov 0 1 1 UnvDinRW
└──▶ 15083 eos-fst-15.ornl.gov 0 1 1 UnvDinnoIO
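Given the explanation that eventually resolved this thread (scheduling groups should not hold more than 255 file systems), the group summary lines of this output are enough to spot the problem. A minimal sketch, assuming the flattened "<group> <branches> <leaves> <sum>" layout seen in this paste; real eos geosched show tree output with its box-drawing columns may need different parsing, and oversized_groups is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Reads group summary lines in the flattened form shown above,
#   <group> <branches> <leaves> <sum>
# and prints each group whose leaf (file system) count exceeds LIMIT (default 255).
# Geotag and fsid lines have more than four fields, so they are skipped.
oversized_groups() {
  local limit=${1:-255}
  awk -v limit="$limit" 'NF == 4 && $3 ~ /^[0-9]+$/ && $3 + 0 > limit + 0 { print $1, $3 }'
}

printf 'default.0 2 412 414\nspare 2 7 9\n' | oversized_groups
# prints: default.0 412
```

Here it flags default.0 with 412 file systems, while the spare group (7) is well under the limit.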
peby
(Pete Eby)
January 23, 2025, 1:55pm
Hi @apeters, is there any further information we can provide?
This behavior continues, with writes hammering only a very few drives at any given time.
Thank you for the assistance.
peby
(Pete Eby)
January 31, 2025, 8:45pm
This issue was resolved with the following information provided by @esindril (thanks Elvin!)
“Indeed, I think this is a side-effect of the scheduling groups having more than the max 255 file systems. There are some small data structures used in the geotree engine and there is some overflow/wrapping of values when this is exceeded.”
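To make the wrapping concrete, here is a toy illustration. It assumes, purely for illustration, that some per-group index is kept in an 8-bit unsigned field; the thread does not say which structure overflows, and slot8 and shared_slots are hypothetical helpers. An 8-bit field can distinguish only 256 values, in line with the roughly 255-file-system ceiling quoted above, so with the 412 file systems of default.0, indices 256 to 411 wrap onto slots 0 to 155.

```shell
#!/usr/bin/env bash
# Illustration only: ASSUMES an index into a per-group structure is kept in an
# 8-bit unsigned integer. The real GeoTreeEngine layout is not shown here.
slot8() { echo $(( $1 & 0xFF )); }   # what a uint8_t would hold for index $1

# Number of 8-bit slots that end up shared by more than one of n indices.
shared_slots() {
  local n=$1 i s shared=0
  local -a hits=()
  for (( i = 0; i < n; i++ )); do
    s=$(( i & 0xFF ))
    hits[s]=$(( ${hits[s]:-0} + 1 ))
  done
  for s in "${!hits[@]}"; do
    if (( hits[s] > 1 )); then shared=$(( shared + 1 )); fi
  done
  echo "$shared"
}

slot8 300          # prints 44: index 300 lands on the same slot as index 44
shared_slots 255   # prints 0
shared_slots 412   # prints 156: the size of default.0 in this thread
```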
In our case, we redistributed the fsids among five (new) scheduling groups via
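for fsid in $(seq START END); do eos fs mv --force "$fsid" "default.$((fsid % 5))"; done
(START and END are placeholders for the fsid range; this moves each fsid into one of default.0 through default.4, and the groups are created if they do not exist.)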
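Before running a bulk move like the loop above, a dry run can print the commands and the resulting per-group counts so that each target group can be checked against the 255 limit. A minimal sketch, assuming the fsids to move are fed on stdin (which also copes with gaps between FST ranges); plan_moves and group_counts are hypothetical helpers, not EOS commands.

```shell
#!/usr/bin/env bash
# Dry run of the fsid % N redistribution: reads fsids on stdin and prints the
# eos fs mv commands instead of running them. N defaults to 5.
plan_moves() {
  local groups=${1:-5} fsid
  while read -r fsid; do
    echo "eos fs mv --force $fsid default.$(( fsid % groups ))"
  done
}

# Reads planned commands, prints "<group> <number of fsids>" per target group.
group_counts() {
  awk '{ n[$NF]++ } END { for (g in n) print g, n[g] }' | sort
}

seq 12000 12083 | plan_moves | head -n 2
# eos fs mv --force 12000 default.0
# eos fs mv --force 12001 default.1
seq 12000 12083 | plan_moves | group_counts
# default.0 through default.3 get 17 fsids each, default.4 gets 16
```

Keeping the plan as text makes it easy to review (or diff) before piping it to a shell for execution.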
eos geosched show tree # showed the five scheduling groups and fsid distributions
eos fs ls --io # showed writes being more evenly distributed across active fsid
After the above, writes started being distributed over the available drives, iowait plummeted from 80% to <5%, and things appear happy.
Re: Replicated files
Elvin: Ideally, all the file systems holding replicas of these files should be in the same group, so that draining, balancing and other internal workflows continue to work as expected. If you now spread replicas of the same file across different groups, the built-in assumptions won't hold anymore.
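A modulo-based move can split a file's replicas: two replicas on fsids with different remainders mod 5 land in different groups. A minimal sketch to check this for one file, assuming the fsid % 5 mapping from the loop above and that the file's replica fsids have been looked up separately and are passed by hand; group_of and same_group are hypothetical helpers.

```shell
#!/usr/bin/env bash
# ASSUMES the default.$(( fsid % 5 )) mapping used for the redistribution.
group_of() { echo "default.$(( $1 % 5 ))"; }

# usage: same_group FSID [FSID...]; returns 0 if all replicas share one group
same_group() {
  local first g fsid
  first=$(group_of "$1")
  for fsid in "$@"; do
    g=$(group_of "$fsid")
    if [[ $g != "$first" ]]; then
      echo "split: $1 -> $first, $fsid -> $g"
      return 1
    fi
  done
  echo "ok: all in $first"
}

same_group 12000 13005   # prints: ok: all in default.0
same_group 12000 13001 || echo "replicas of this file would span two groups"
```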
Pete: Is there any way to specify which fsids an EOS directory is allowed to use?
Elvin: No, the algorithm picks a group from the space that the directory belongs to, and then inside that group it picks 2, 3, 4, etc. file systems for the replicas.
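The two-step placement Elvin describes can be pictured with a toy model: first choose a scheduling group in the space, then choose N distinct file systems inside that group, so all replicas of a file start out in one group. This is a sketch under stated assumptions, not the GeoTreeEngine: the group membership table is made up, and the uniform random group choice is a stand-in because the real selection policy is not described here.

```shell
#!/usr/bin/env bash
# Toy model of two-step placement, NOT EOS's actual scheduler.
declare -A GROUP_FSIDS=(        # hypothetical group membership
  [default.0]="12000 12005 13000 13005 14000"
  [default.1]="12001 12006 13001 13006 14001"
)

# usage: place_replicas N  -> prints "<group> <fsid> ..." with N distinct fsids
place_replicas() {
  local n=$1 group
  local -a groups=("${!GROUP_FSIDS[@]}")
  group=${groups[RANDOM % ${#groups[@]}]}       # step 1: pick a group
  # step 2: shuf -n samples without replacement, so the N fsids are distinct
  echo "$group" $(tr ' ' '\n' <<< "${GROUP_FSIDS[$group]}" | shuf -n "$n")
}

place_replicas 2   # random, e.g. a group name followed by two of its fsids
```

Because every replica is drawn from a single group's list, this model also shows why a directory cannot be pinned to particular fsids: only the space (and through it, the candidate groups) is chosen per directory.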