EOS friends,
The alice::ornl::tmp SE consists of one MGM+FST, currently with 60 x 10TB drives (single disk fsids) attached.
Data is (slowly) being populated; this FST currently serves ~130 TB.
Load on this SE is very low, with typical averages like the following (sampled roughly as sketched after this list):
~300 established XRootD TCP connections
15-minute system load average of ~4
<2% I/O wait
Outbound/inbound TCP throughput in the MB/s range (or less)
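For context, these figures come from simple sampling of /proc, along the lines of the sketch below (Python; the interface name "eth0" and the default xrootd port 1095 are assumptions, and the I/O-wait figure comes from top/iostat rather than this script):

```python
#!/usr/bin/env python3
# Rough sketch of how the figures above are sampled (not our exact tooling).
# Assumptions: the 10G interface is "eth0" and xrootd listens on port 1095.
import time

IFACE = "eth0"
XROOTD_PORT = 1095

def net_bytes(iface):
    """Return (rx_bytes, tx_bytes) for an interface from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                return int(fields[0]), int(fields[8])
    raise RuntimeError(f"interface {iface} not found")

def established_on_port(port):
    """Count ESTABLISHED TCP connections (state 01) with a matching local port."""
    count = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)  # skip header line
                for line in f:
                    fields = line.split()
                    local, state = fields[1], fields[3]
                    if state == "01" and int(local.rsplit(":", 1)[1], 16) == port:
                        count += 1
        except FileNotFoundError:
            pass
    return count

rx0, tx0 = net_bytes(IFACE)
time.sleep(10)
rx1, tx1 = net_bytes(IFACE)
load15 = open("/proc/loadavg").read().split()[2]

print(f"established xrootd connections : {established_on_port(XROOTD_PORT)}")
print(f"15 min load average            : {load15}")
print(f"rx / tx (MB/s, 10 s average)   : {(rx1 - rx0) / 10 / 1e6:.1f} / {(tx1 - tx0) / 10 / 1e6:.1f}")
```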
However, despite this near-idle EOS environment from a performance and resource-usage standpoint, what we observe is:
Immediately after the EOS services complete startup, iperf3 throughput to/from the node plummets from 10 Gbit/s to less than half that, with extreme variation, hundreds of TCP retransmits, the OS interface dropped counter incrementing, and the switch port showing discards.
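The counter behaviour is easy to see with a simple watcher run alongside iperf3, roughly like the sketch below (Python; the interface name "eth0" is an assumption):

```python
#!/usr/bin/env python3
# Minimal watcher (a sketch, not our exact tooling) run alongside iperf3 to
# correlate the throughput collapse with kernel counters at EOS startup.
# The interface name "eth0" is an assumption.
import time

IFACE = "eth0"

def nic_counter(name):
    """Read one counter from /sys/class/net/<iface>/statistics/."""
    with open(f"/sys/class/net/{IFACE}/statistics/{name}") as f:
        return int(f.read())

def tcp_retrans():
    """Return the cumulative RetransSegs counter from /proc/net/snmp."""
    lines = open("/proc/net/snmp").read().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Tcp:") and "RetransSegs" in line:
            hdr, vals = line.split(), lines[i + 1].split()
            return int(vals[hdr.index("RetransSegs")])
    raise RuntimeError("Tcp counters not found")

prev = (nic_counter("rx_dropped"), nic_counter("tx_dropped"), tcp_retrans())
while True:
    time.sleep(5)
    cur = (nic_counter("rx_dropped"), nic_counter("tx_dropped"), tcp_retrans())
    d = [c - p for c, p in zip(cur, prev)]
    print(f"{time.strftime('%H:%M:%S')}  rx_dropped +{d[0]}  tx_dropped +{d[1]}  tcp_retrans +{d[2]}")
    prev = cur
```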
Essentially, starting EOS (while under no appreciable demand) completely kills network performance to the node. There are no indications of a networking, memory, CPU, or other resource problem - everything shows an essentially idle environment.
This issue affects the speed at which third-party transfers move data; they consistently top out at 20 MB/s, though rsync achieves much faster speeds (when EOS is stopped on ornl::tmp and networking returns to normal).
ALICE::ORNL::EOS (a separate SE) does not see the same issues. The same version of EOS (4.3.12-1) is on both SEs.
The same 10G interfaces (Intel X520-2), same kernel versions, same OS, and same in-kernel ixgbe driver are used by all FSTs in both SEs, yet this issue is present only on the alice::ornl::tmp SE and FST.
The same recommended 10G sysctl tunings are in place on all nodes (and we’ve adjusted them with no impact on this issue).
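In case a direct comparison is useful, a sysctl parity check roughly like this sketch can be run on an FST from each SE and the outputs diffed (the key list below is illustrative, not our full tuning set):

```python
#!/usr/bin/env python3
# Sketch of a parity check to diff network tuning between an FST in the tmp SE
# and one in the EOS SE. The sysctl list is illustrative, not a statement of
# the recommended 10G tuning set.
KEYS = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.core.netdev_max_backlog",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_congestion_control",
    "net.ipv4.tcp_mtu_probing",
]

for key in KEYS:
    path = "/proc/sys/" + key.replace(".", "/")
    try:
        with open(path) as f:
            print(f"{key} = {f.read().strip()}")
    except OSError:
        print(f"{key} = <unavailable>")
```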
Another issue on this node, which may be related: FST Service Silently Fails
Any suggestions for debugging this further would be appreciated. We intend to deploy additional nodes in this same configuration, but this is a bit of a showstopper, both for migrating data and for the stability of the new nodes.
Cheers,
Pete