Bug 76 - Large amount of TCP retransmission received at the tap side
Summary: Large amount of TCP retransmission received at the tap side
Status: CONFIRMED
Alias: None
Product: passt
Classification: Unclassified
Component: TCP
Version: unspecified
Hardware: All Linux
Importance: Normal quite bad
Assignee: nobody
URL:
Depends on:
Blocks:
 
Reported: 2023-09-20 15:30 UTC by Nelson Palmer
Modified: 2024-01-17 03:12 UTC
CC List: 2 users

See Also:


Attachments

Description Nelson Palmer 2023-09-20 15:30:09 UTC
I configured a Windows VM to use passt for networking; inside the VM, virtio-net is used. When sending or receiving data over TCP, once the traffic speed exceeds a certain level for a few seconds, I suddenly receive ~5Gbps of TCP retransmission packets in the VM, which strains the whole VM's network traffic for some time. So if a large file download is in progress, I get a few seconds of ~200Mbps of normal traffic, then a few seconds of ~5Gbps of error packets, and the cycle repeats. While the VM is receiving the error packets, I see almost no packets sent or received on the host NIC (a few Kbps to Mbps), but passt uses up a whole CPU core.

The case is easily reproducible: iperf3 with default arguments triggers this issue 99% of the time.
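
For reference, the reproduction boils down to something like the following (just a sketch of what I mean by "default args"; <host> is a placeholder for whatever address the VM uses to reach the iperf3 server):

  # on the host (or any machine reachable from the VM): start an iperf3 server
  iperf3 -s

  # inside the Windows VM: run the client with default arguments;
  # either direction triggers the bursts, -R simply reverses it
  iperf3 -c <host>
  iperf3 -c <host> -R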

libvirt NIC config:
    <interface type='user'>
      <portForward proto='tcp'>
        <range start='5901' to='5900'/>
      </portForward>
      <portForward proto='udp'>
        <range start='10000' to='10000'/>
      </portForward>
      <model type='virtio'/>
      <driver name='vhost' txmode='iothread' iommu='on' ats='on' packed='on'/>
      <backend type='passt'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>

Environment:
- passt: compiled with the AUR passt-git packaging script (https://aur.archlinux.org/packages/passt-git), on the master branch
- OS: Arch Linux (latest), Linux kernel 6.5.3.arch1-1, libvirt 1:9.7.0-1, qemu-system-x86_64 8.1.0-2
- Hardware: dual-socket Intel Xeon Platinum (Cascade Lake-SP) server, BCM58802 NIC with Linux open source driver

Things I've tried:
- Using either the avx2 variant or the normal one does not help (although the non-avx2 version generates these strange retransmission packets much more slowly)
- Setting the MTU on passt (`-m 32000` or `-m 1500`) and in the VM's NIC configuration (set to 1500; the default is 1514)
- Enabling the log (without trace): nothing important is printed out
- Downloading / uploading large files: either direction triggers the issue

Misc:
- I haven't tested whether UDP traffic can trigger this issue
- About the ~5Gbps limit: it seems passt can forward at most 5Gbps on a single core of my CPU. Normal traffic tops out at this value too.
- I have a TightVNC server running in the Windows VM, with its port forwarded to the host. Most VNC clients cannot connect to this server (including telnet, as the server sends no RFB handshake when the TCP connection is established), but gvncviewer can. This does not happen when I use the kernel networking stack and nftables to forward the port instead; the VM configuration is untouched in that case. This may or may not be related to the issue.
Comment 1 Stefano Brivio 2023-09-20 15:41:13 UTC
Hi Nelson,

I'm really not sure if this is related at the moment, but I have a vague feeling that you might be hitting the same issue as https://bugs.passt.top/show_bug.cgi?id=74, for which we're still working on a solution. It seems to be very visible with recent kernels.

Test (and workaround):

  sysctl -w net.core.rmem_max=$((16 * 1024 * 1024))
  sysctl -w net.core.wmem_max=$((16 * 1024 * 1024))

if this helps (on the host, as root), let us know. And also if it doesn't. :)
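
To double-check that the new limits are actually in effect, they can be read back, e.g.:

  sysctl net.core.rmem_max net.core.wmem_max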
Comment 2 Nelson Palmer 2023-09-20 16:06:50 UTC
Hi Stefano,

I've seen the bug you mentioned and tried that value, but it does not help.

Plus I have a sysctl config that I copy-paste around as my server defaults, which already increases the memory limits. The full config, for your reference:
net.core.netdev_max_backlog = 262144
net.ipv4.tcp_max_syn_backlog = 8192
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_base_mss = 1024
net.ipv4.icmp_errors_use_inbound_ifaddr = 1
net.ipv4.icmp_ratelimit = 0
net.ipv6.icmp.ratelimit = 0
net.core.somaxconn = 8192
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_tw_buckets = 4096
net.core.default_qdisc = fq_codel
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_notsent_lowat = 16384
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_ecn = 1
net.ipv4.tcp_ecn_fallback = 1
net.ipv4.fib_multipath_hash_policy = 1
net.ipv4.fib_multipath_use_neigh = 1
net.ipv4.igmp_max_memberships = 512
net.ipv6.route.max_size = 2147483647
net.ipv4.ip_local_port_range = 1024 65001
Comment 3 Nelson Palmer 2023-09-21 02:04:29 UTC
BTW I have `-march=x86-64-v3` set in makepkg.conf; other flags are Arch Linux defaults. Not sure if it is related...
Comment 4 Stefano Brivio 2023-10-10 12:00:54 UTC
Nelson, while fixing https://bugs.passt.top/show_bug.cgi?id=74 we solved another fundamental issue (https://passt.top/passt/commit/?id=a469fc393fa1dfadc7c51c2729550597ee171a8e) that might play a role here.

Could you please retest with the current HEAD, or with the new version (2023_10_04.f851084-1) of the Arch Linux package (https://archlinux.org/packages/extra/x86_64/passt/)? Thanks.
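
For reference, building and checking the current HEAD should be roughly the following (a sketch, assuming the usual make-based build of the passt tree):

  git clone https://passt.top/passt
  cd passt
  make
  ./passt --version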
Comment 5 David Gibson 2024-01-17 03:12:38 UTC
Nelson, we also realized that our initial fix for bug 74 didn't work - or at least only worked in very limited circumstances.  We've since located and merged a much more robust fix, so it would be great to retest.
