Created attachment 22 [details]
log produced by passt

Using iperf3 inside qemu connected through passt, TCP networking can sometimes slow down to 0 bytes/sec. The slowdown is specific to the connection and seems to be related to TCP window scaling. Once a connection slows down to 0 bytes/sec it never recovers, but a newly created connection may work fine.

# Info:
I am using the latest passt from the master branch (commit a7e4bfb857cb). The host and guest are both Fedora Workstation 38. I am using qemu from the Fedora repository. Note that I ran into this issue while adding support for virtio-net using passt to libkrun (https://www.github.com/containers/libkrun), so the issue should not be specific to qemu; just in case, the qemu version is qemu-7.2.4-2.fc38.

# Steps to reproduce:
1. Run passt as: `./passt -f --stderr --trace -4 -t 5201 -u 5201`
2. Run qemu as: `qemu-kvm -cdrom ~/Downloads/Fedora-Workstation-Live-x86_64-38-1.6.iso -smp 4 -m 4096 -device virtio-net-pci,netdev=s -netdev stream,id=s,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket`
   (get the iso here: https://download.fedoraproject.org/pub/fedora/linux/releases/38/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-38-1.6.iso)
3. Install iperf3 inside the guest: `sudo dnf install iperf3`
   Note that the download itself sometimes slows down to 0 B/s because of this bug - you can retry until it succeeds or use the workaround.
4. Run `iperf3 -s` in the guest.
5. Run `iperf3 -c 127.0.0.1` on the host.

If the speed stays around 150-170 Mbit/s for at least 2 seconds, the connection will probably keep working for a long time before slowing down. While the connection is working fine you can cancel (^C) the test and try again multiple times, until you get a connection that slows down in the first few seconds. Once the connection slows down it doesn't recover.

# Workaround:
Disabling TCP window scaling inside the guest avoids the issue. It also sometimes massively improves TCP performance, but only for the next iperf test; subsequent iperf tests are slow - around 10.5 Mbits/sec, sometimes dropping to 0 bytes/sec, but recovering rather than staying at 0. To disable it, you can use:
`sudo sysctl -w net.ipv4.tcp_window_scaling=0`

# Description of the attached passt log:
All tests are with net.ipv4.tcp_window_scaling enabled.
passt log timestamps before 161.8822: installing iperf3 using dnf

timestamps before 279.7843: an iperf test:

$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 60000 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  16.2 MBytes   136 Mbits/sec    4   1.75 MBytes
[  5]   1.00-2.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   8.00-9.00   sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]   9.00-10.00  sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]  10.00-11.00  sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
[  5]  11.00-12.00  sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
^C[  5]  12.00-12.81  sec  0.00 Bytes   0.00 bits/sec    0   1.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-12.81  sec  16.2 MBytes  10.6 Mbits/sec    4             sender
[  5]   0.00-12.81  sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

timestamps before 332.2857: an iperf test:

$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 48098 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.50 MBytes  20.9 Mbits/sec    0    320 KBytes
[  5]   1.00-2.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
^C[  5]   7.00-7.49   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-7.49   sec  2.50 MBytes  2.80 Mbits/sec    0             sender
[  5]   0.00-7.49   sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

timestamps before 420.9430: an iperf test:

$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 56828 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  27.5 MBytes   231 Mbits/sec    0   2.06 MBytes
[  5]   1.00-2.00   sec  8.75 MBytes  73.4 Mbits/sec    3   2.06 MBytes
[  5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec    0   2.06 MBytes
[  5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec    0   2.06 MBytes
[  5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec    0   2.06 MBytes
[  5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec    0   2.06 MBytes
[  5]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec    0   2.06 MBytes
[  5]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec    0   2.06 MBytes
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-8.06   sec  36.2 MBytes  37.7 Mbits/sec    3             sender
[  5]   0.00-8.06   sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
Hi Matej,

(In reply to Matej Hrica from comment #0)
> # Info:
> I am using the latest passt from the master branch (commit a7e4bfb857cb).
> The host and guest are both Fedora Workstation 38. I am using qemu from the
> Fedora repository. Note that I ran into this issue while adding support for
> virtio-net using passt to libkrun (https://www.github.com/containers/libkrun),
> so the issue should not be specific to qemu; just in case, the qemu version
> is qemu-7.2.4-2.fc38.

No known issues with that version -- there was an issue which would be sort of compatible with what you're describing, but it was fixed here:

  $ git describe 7550a82259fcf9ce5f1f6443ced779d0eb8afdca
  v7.1.0-1255-g7550a82259

and the guest kernel should also be recent enough; the last fix I'm aware of for an issue of that sort is:

  $ git describe d71ebe8114b4bf622804b810f5e274069060a174
  v6.2-rc3-223-gd71ebe8114b4

> # Steps to reproduce:
> 1. Run passt as: `./passt -f --stderr --trace -4 -t 5201 -u 5201`

I guess you tried already, but... does this also happen without --trace? That's very verbose, so a very low throughput with occasional interruptions is actually expected. Could you try to take a capture (instead) with --pcap? That should show a bit more descriptively what happens just before the connection stalls.

> # Workaround:
> Disabling TCP window scaling inside the guest avoids the issue.

Well, then we'll get a 64k window and a "safely" low throughput.

It looks like we are hitting some issue with bursts again... I haven't tried to reproduce this yet; I haven't observed anything similar on Fedora 38 so far, but I had different parameters for QEMU.

Thanks for providing the steps! I plan to have a look.
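For reference, a capture like that can be taken by pointing passt at a pcap file and then inspecting it with tshark or wireshark. A minimal sketch, assuming the same port forwards as in comment 0 and /tmp/passt.pcap as an arbitrary output path:

  # on the host, instead of --trace
  ./passt -f -4 -t 5201 -u 5201 --pcap /tmp/passt.pcap

  # after reproducing the stall, pull out the segments wireshark flags
  # as retransmissions, zero windows, lost segments, etc.
  tshark -r /tmp/passt.pcap -Y "tcp.analysis.flags"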
Created attachment 23 [details]
passt1.pcap

on host:
$ ./passt -f -4 -t 5201 -u 5201

in guest:
$ iperf3 -s

on host:
$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 53402 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.50 MBytes  20.9 Mbits/sec    0    320 KBytes
[  5]   1.00-2.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
^C[  5]   7.00-7.39   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-7.39   sec  2.50 MBytes  2.84 Mbits/sec    0             sender
[  5]   0.00-7.39   sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

on host:
$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 43318 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.50 MBytes  21.0 Mbits/sec    0    320 KBytes
[  5]   1.00-2.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
[  5]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
^C[  5]   8.00-8.78   sec  0.00 Bytes   0.00 bits/sec    0    320 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-8.78   sec  2.50 MBytes  2.39 Mbits/sec    0             sender
[  5]   0.00-8.78   sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
Yes, it does happen without `--trace`, possibly less frequently (not sure); it still happens on most iperf tests. Giving fewer cores to the guest may slightly lower the chance of it happening, but it still happens even with `-smp 1`.

I am adding pcaps. I ran them on a freshly installed Fedora 38 Workstation guest, with just iperf3 installed, and with "metered connection" enabled in the GUI network settings to minimize unrelated traffic.

guest kernel version:
$ uname -a
Linux hostname 6.2.9-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Mar 30 22:32:58 UTC 2023 x86_64 GNU/Linux
(this is Fedora 38 Workstation freshly installed from the iso without updates, but I tried updating it and I still have the same issue)

host kernel version:
$ uname -a
Linux m-rh-lap 6.4.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 23 17:46:49 UTC 2023 x86_64 GNU/Linux
Another pcap, but this one is too big to upload here:
https://drive.google.com/file/d/1CuZrsTEn2uJSXWCWpZYsMeik_XEc3wV5/view?usp=drive_link

Once again iperf3 -s is running in the guest; on the host I am doing the following tests:

$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 33028 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  21.2 MBytes   178 Mbits/sec    0   1.81 MBytes
[  5]   1.00-2.00   sec  20.0 MBytes   168 Mbits/sec    0   1.81 MBytes
[  5]   2.00-3.00   sec  22.5 MBytes   189 Mbits/sec    0   1.81 MBytes
[  5]   3.00-4.00   sec  15.0 MBytes   126 Mbits/sec    0   1.81 MBytes
[  5]   4.00-5.00   sec  21.2 MBytes   178 Mbits/sec    0   1.81 MBytes
[  5]   5.00-6.00   sec  21.2 MBytes   178 Mbits/sec    0   1.81 MBytes
[  5]   6.00-7.00   sec  18.8 MBytes   157 Mbits/sec    0   1.81 MBytes
[  5]   7.00-8.00   sec  17.5 MBytes   147 Mbits/sec    0   1.81 MBytes
[  5]   8.00-9.00   sec  20.0 MBytes   168 Mbits/sec    0   1.81 MBytes
^C[  5]   9.00-9.24   sec  5.00 MBytes   178 Mbits/sec    0   1.81 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-9.24   sec   182 MBytes   166 Mbits/sec    0             sender
[  5]   0.00-9.24   sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 37022 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  21.2 MBytes   178 Mbits/sec    0   1.75 MBytes
[  5]   1.00-2.00   sec  11.2 MBytes  94.3 Mbits/sec    3   1.81 MBytes
[  5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]   8.00-9.00   sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]   9.00-10.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  10.00-11.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  11.00-12.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  12.00-13.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  13.00-14.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  14.00-15.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  15.00-16.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  16.00-17.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  17.00-18.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  18.00-19.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  19.00-20.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  20.00-21.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  21.00-22.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  22.00-23.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  23.00-24.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  24.00-25.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  25.00-26.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  26.00-27.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  27.00-28.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  28.00-29.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  29.00-30.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
[  5]  30.00-31.00  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
^C[  5]  31.00-31.78  sec  0.00 Bytes   0.00 bits/sec    0   1.81 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-31.78  sec  32.5 MBytes  8.58 Mbits/sec    3             sender
[  5]   0.00-31.78  sec  0.00 Bytes   0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
There's a lot going on in those packet captures, and I certainly haven't deciphered what's happening yet, but one thing that stands out to me is that wireshark shows a bunch of "TCP Previous segment not captured" errors. Basically it seems like passt is jumping forward in the stream, either simply failing to send some frames or incorrectly advancing the sequence number. It could also be that passt is sending them but omitting them from the pcap file, though I think that's less likely. Any of those options definitely seems like a bug, although I don't yet have much idea why it would do that, or what would trigger it.
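Not from the original report, but for anyone replaying the attached capture, those events can be pulled out on the command line. A small sketch, assuming the attached passt1.pcap:

  # list the segments wireshark marks as "previous segment not captured"
  tshark -r passt1.pcap -Y "tcp.analysis.lost_segment"

  # rough count of how often it happens
  tshark -r passt1.pcap -Y "tcp.analysis.lost_segment" | wc -l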
I've now reproduced this on my system (thanks for the detailed instructions). I've hit it both with the Fedora live image you suggest and with an mbuto image based on my host configuration (also Fedora 38). Why this setup shows the problem, while the very similar iperf3 tests in our standard testsuite don't, I haven't figured out yet.
Ok, found at least one reason why the testsuite isn't hitting this problem: in order to test for maximum possible throughput the testsuite is designed to work with very large socket buffers, net.core.rmem_max and net.core.wmem_max set to 16MiB. When I do that I can no longer reproduce this bug.
More precisely, I've been able to reproduce this bug very easily with wmem and rmem of 1MiB and below, and haven't managed to reproduce it at all with rmem/wmem 2MiB and up.
Matej, while this is under investigation, you can also give passt a bit more memory:

  sysctl -w net.core.rmem_max=$((16 * 1024 * 1024))
  sysctl -w net.core.wmem_max=$((16 * 1024 * 1024))

...especially on the host. Even 4 or 2 megs should "fix" your issue. Fedora uses a 208 KiB default for both.
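If you want the workaround to survive reboots while testing, one option is a sysctl drop-in file; a sketch (the file name is arbitrary):

  # /etc/sysctl.d/90-passt-bufsize.conf
  net.core.rmem_max = 16777216
  net.core.wmem_max = 16777216

then apply it without rebooting:

  sudo sysctl --system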
Ok, I've done a bunch of investigation and have a number of observations, although no final conclusion:

* If I force the low_rmem and low_wmem values inside passt to 0, I can't reproduce the problem. This strongly suggests the problem is with our handling of small socket buffers rather than the fact that the socket buffers are small(ish) per se.

* Forcing just low_rmem to 0 or just low_wmem to 0 is insufficient; it has to be both.

* More specifically, I can't reproduce it if I remove exactly one effect each of low_rmem and low_wmem. I can't reproduce it if I remove just:
  - the test on !low_rmem when setting SO_RCVBUF in tcp_sock_set_bufsize() AND
  - the test on !low_wmem when setting SO_SNDBUF in tap_listen_handler()

* In the original case, the logs (both from Matej and ones I reproduced myself) contain a bunch of messages like (see the snippet after this list for a quick way to count them):
    236.0487: tap: dropped 35 frames of 39 due to short send
  These indicate that we're dropping frames over the link to qemu. We don't presently handle that very elegantly: we simply let the frames drop and assume that TCP will figure it out. The fact we're dropping frames here explains the "packet not captured" errors in the pcap files. I initially thought that the dropping of frames was a crucial part of the puzzle, but...

* If I remove the !low_rmem test on the TCP SO_RCVBUF, but *not* the !low_wmem test on the tap (Unix) SO_SNDBUF, then I can still reproduce the problem (or one very like it), but I *don't* get the dropped frames. So the stalling mechanism doesn't require frames to be dropped.

Still looking...
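For anyone checking their own logs for the same symptom, the dropped-frame messages are easy to count; a sketch assuming passt's stderr is redirected to a file (the file name is just an example):

  ./passt -f --stderr -4 -t 5201 -u 5201 2> passt.log
  grep -c 'due to short send' passt.log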
Forgot some additional observations:

* I can't reproduce it if I limit the window used by iperf3 to ~256kiB with the -w option (see the example below).
* I *can* still reproduce it if I clamp the window in tcp_clamp_window to 256kiB or 128kiB.
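For reference, the client-side clamp from the first point looks like this (a sketch; 256K is the value referenced above, the duration is arbitrary):

  # host side: limit the sender's window/buffer request to ~256 KiB
  iperf3 -c 127.0.0.1 -t 30 -w 256K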
I drilled down into which TCP SO_RCVBUF sizes seem to trigger the problem. It seems to kick in when the RCVBUF size drops below 256kiB, or pretty close to that.

* With SO_RCVBUF comfortably above 256kiB I get consistently high throughput, ~2-3 GBytes/s.
* When SO_RCVBUF gets close to 256kiB it sometimes works well, sometimes starts slow (MiB/s rather than GiB/s) then speeds up to 2-3 GiB/s, sometimes stays slow, and sometimes stalls completely.
* As SO_RCVBUF drops below 256kiB the chances of a good run seem to drop very rapidly.

[All values quoted above are as reported by *getsockopt*, which means setting half the value with setsockopt, because... reasons.]

At least on my distro/configuration, not setting SO_RCVBUF at all (as we do with low_rmem == 1) results in SO_RCVBUF == ~128K, well into the problem zone.
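The setsockopt/getsockopt asymmetry mentioned above is the usual Linux behaviour of doubling the requested value to account for bookkeeping overhead. A minimal standalone C sketch (not passt code) that demonstrates it:

  /* rcvbuf_demo.c - show that Linux reports back roughly twice the
   * SO_RCVBUF value requested with setsockopt(). Standalone sketch. */
  #include <stdio.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
  	int s = socket(AF_INET, SOCK_STREAM, 0);
  	int req = 128 * 1024;		/* ask for 128 KiB */
  	int got = 0;
  	socklen_t len = sizeof(got);

  	if (s < 0)
  		return 1;

  	setsockopt(s, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
  	getsockopt(s, SOL_SOCKET, SO_RCVBUF, &got, &len);

  	/* Typically prints ~262144: the kernel doubles the requested value,
  	 * so getting ~256 KiB back means having asked for ~128 KiB. */
  	printf("requested %d, kernel reports %d\n", req, got);

  	close(s);
  	return 0;
  }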
Yet more experiments and observations:

* I took a packet trace of a stall from the *host* side. There the window advertised from the server (that is, from passt acting for the server) abruptly drops from ~64kiB to 112 bytes, then stays there.
* If I use the -m option to clamp the MTU to 64000 bytes, I can no longer reproduce the problem. I really don't know how this fits in with everything else.
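For reference, the MTU clamp from the second point can be applied on the command line; a sketch based on the reproduction command from comment 0 (minus --trace):

  ./passt -f -4 -t 5201 -u 5201 -m 64000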
Unsurprisingly given the MTU observation from comment 13 - though pretty surprising in isolation - clamping the (tap side) MSS for connections also seems to stop the problem from reproducing. How much it needs to be clamped seems to be a bit fuzzy. In intermediate ranges it seems to increase the incidence of a case where the transfer starts slow(ish) - Mbps rather than Gbps - but recovers to the expected full speed after 1-2 s.
Expanding on comment 14:

* Clamping the MSS to 63960 (equivalent to an MTU of 64000), I rarely see a full stall or a consistently slow transfer, although slow starts are moderately common.
* With the unclamped MSS of 65436, I get slow transfers on most attempts and full stalls maybe 30-50% of the time.
* With the MSS clamped to 32000, stalls, slow transfers and slow starts are all rare, although I have still seen at least one stall.
Sorry, I was working on bug 68 for a while, but have now gotten back to investigating this. New observations:

* I've confirmed that when "stalled" a trickle of data still seems to be getting through:
  - While the iperf3 client shows 0 bytes/s most of the time, the iperf3 server shows a few hundred bytes being transferred.
  - If left running long enough, every so often the client shows a few hundred kB transferred in an interval, then returns to 0 transfer.
* ss shows a substantial amount of data in the Send-Q of the iperf3 client's sending socket, which seems to reduce slowly over time (see the ss example after this list).
* It looks like some buffer is filling up (though I'm not sure exactly which buffer) between iperf3 -c and passt, then draining very slowly until it has enough room to accept another chunk of data (this suggests some kind of hysteresis in the management of this buffer, wherever it is).
* On a host-side packet trace of a long, eventually stalling transfer I noticed these patterns, not sure if they're significant:
  - Initially, window full events seem to be quite rare and irregular.
  - Some time in (but before the full stall), window full events become regular, around every 0.25 s / 44 frames; then at the time of the stall they become super frequent, around every 0.2 s / 2 frames.
  - All these "regular" window full events seem to come a significant delay (~0.2 s) after the previous packet. I'm not sure what the cause of this delay is.
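One way to watch the Send-Q behaviour on the host is something like the following sketch, assuming the iperf3 server is still on port 5201 (ss option/filter details may vary between versions):

  # -t TCP, -n numeric, -i internal TCP info; re-run to watch Send-Q drain
  watch -n 1 "ss -tni 'dport = :5201'"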
Matej, an updated package for Fedora 38 (https://bodhi.fedoraproject.org/updates/FEDORA-2023-b1e79e591e) is available. Can you check if that fixes the issue for you, before we close this? By the way, the series (now merged) that should fix this is: https://archives.passt.top/passt-dev/20230929150446.2671959-1-sbrivio@redhat.com/
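If the update hasn't reached stable yet, it should be installable from updates-testing by advisory ID; roughly (a sketch, exact repo state may differ):

  sudo dnf upgrade --refresh --enablerepo=updates-testing --advisory=FEDORA-2023-b1e79e591e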
It definitely works, though the throughput jumps around quite a bit - is this expected behavior?

$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 34144 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   328 MBytes  2.75 Gbits/sec    0   1.62 MBytes
[  5]   1.00-2.00   sec  82.5 MBytes   692 Mbits/sec    0   2.00 MBytes
[  5]   2.00-3.00   sec   289 MBytes  2.42 Gbits/sec    0   2.31 MBytes
[  5]   3.00-4.00   sec   206 MBytes  1.73 Gbits/sec    1   2.44 MBytes
[  5]   4.00-5.00   sec   440 MBytes  3.69 Gbits/sec    1   2.44 MBytes
[  5]   5.00-6.00   sec   275 MBytes  2.31 Gbits/sec    0   2.44 MBytes
[  5]   6.00-7.00   sec   525 MBytes  4.40 Gbits/sec    0   2.44 MBytes
[  5]   7.00-8.00   sec   302 MBytes  2.54 Gbits/sec    0   2.44 MBytes
[  5]   8.00-9.00   sec   180 MBytes  1.51 Gbits/sec    0   2.44 MBytes
[  5]   9.00-10.00  sec   102 MBytes   860 Mbits/sec    0   2.44 MBytes
[  5]  10.00-11.00  sec  52.5 MBytes   440 Mbits/sec    0   2.44 MBytes
[  5]  11.00-12.00  sec   256 MBytes  2.15 Gbits/sec    0   2.44 MBytes
[  5]  12.00-13.00  sec   172 MBytes  1.45 Gbits/sec    0   2.44 MBytes
[  5]  13.00-14.00  sec   170 MBytes  1.43 Gbits/sec    0   2.44 MBytes
[  5]  14.00-15.00  sec   189 MBytes  1.58 Gbits/sec    0   2.44 MBytes
[  5]  15.00-16.00  sec   316 MBytes  2.65 Gbits/sec    0   2.44 MBytes
[  5]  16.00-17.00  sec   114 MBytes   954 Mbits/sec    0   2.44 MBytes

Capping the sending speed of iperf improves the throughput:

$ iperf3 -t 1000 -c 127.0.0.1 -b 5.5G
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 38560 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   655 MBytes  5.50 Gbits/sec    0   1.12 MBytes
[  5]   1.00-2.00   sec   656 MBytes  5.50 Gbits/sec    0   1.12 MBytes
[  5]   2.00-3.00   sec   656 MBytes  5.50 Gbits/sec    0   1.50 MBytes
[  5]   3.00-4.00   sec   656 MBytes  5.50 Gbits/sec    0   1.50 MBytes
[  5]   4.00-5.00   sec   656 MBytes  5.50 Gbits/sec    0   1.50 MBytes
[  5]   5.00-6.00   sec   390 MBytes  3.27 Gbits/sec    7   1.19 MBytes
[  5]   6.00-7.00   sec   139 MBytes  1.16 Gbits/sec    0   1.50 MBytes
[  5]   7.00-8.00   sec  97.8 MBytes   820 Mbits/sec    9   1.06 MBytes
[  5]   8.00-9.00   sec   505 MBytes  4.24 Gbits/sec    0   1.06 MBytes
[  5]   9.00-10.00  sec   589 MBytes  4.94 Gbits/sec    0   1.06 MBytes
[  5]  10.00-11.00  sec   163 MBytes  1.37 Gbits/sec    0   1.44 MBytes
[  5]  11.00-12.00  sec   564 MBytes  4.73 Gbits/sec    0   1.44 MBytes
[  5]  12.00-13.00  sec   502 MBytes  4.21 Gbits/sec    0   1.56 MBytes

These tests are with default socket buffer sizes; both guest and host are Fedora 38.

$ sysctl net.core.rmem_max net.core.rmem_default net.core.wmem_max net.core.wmem_default
net.core.rmem_max = 212992
net.core.rmem_default = 212992
net.core.wmem_max = 212992
net.core.wmem_default = 212992
Never mind, it seems like a QEMU performance issue. With my implementation in libkrun, it works very nicely:

$ iperf3 -t 1000 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 52838 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.72 GBytes  14.8 Gbits/sec    0   1.25 MBytes
[  5]   1.00-2.00   sec  1.75 GBytes  15.0 Gbits/sec    0   1.25 MBytes
[  5]   2.00-3.00   sec  1.64 GBytes  14.1 Gbits/sec    0   1.25 MBytes
[  5]   3.00-4.00   sec  1.72 GBytes  14.8 Gbits/sec    0   1.37 MBytes
[  5]   4.00-5.00   sec  1.57 GBytes  13.5 Gbits/sec    0   1.37 MBytes
[  5]   5.00-6.00   sec  1.73 GBytes  14.9 Gbits/sec    0   1.37 MBytes
[  5]   6.00-7.00   sec  1.68 GBytes  14.5 Gbits/sec    0   1.37 MBytes
[  5]   7.00-8.00   sec  1.09 GBytes  9.35 Gbits/sec    0   1.37 MBytes
[  5]   8.00-9.00   sec  1.31 GBytes  11.3 Gbits/sec    0   1.37 MBytes
[  5]   9.00-10.00  sec  1.68 GBytes  14.5 Gbits/sec    0   1.37 MBytes
[  5]  10.00-11.00  sec  1.68 GBytes  14.4 Gbits/sec    0   1.37 MBytes
[  5]  11.00-12.00  sec  1.66 GBytes  14.3 Gbits/sec    0   1.37 MBytes
[  5]  12.00-13.00  sec  1.67 GBytes  14.4 Gbits/sec    0   1.37 MBytes
[  5]  13.00-14.00  sec  1.70 GBytes  14.6 Gbits/sec    0   1.37 MBytes
[  5]  14.00-15.00  sec  1.72 GBytes  14.8 Gbits/sec    0   1.37 MBytes
(In reply to Matej Hrica from comment #18)
> It definitely works, though the throughput jumps around quite a bit - is
> this expected behavior?

Yes and no. Although the changes we've made stop the specific mechanism of the stall you were seeing originally, we noticed during testing that there are still some performance oddities related to the buffer sizes. We're still looking at those and thinking about what we can do - it's pretty hard to pin down because there are rather a lot of variables which affect what's going on in complex ways. So it's not intended behaviour, but it's kind of expected for now.

(In reply to Matej Hrica from comment #19)
> Never mind, it seems like a QEMU performance issue.

Huh... that's interesting, I never even thought to check that.

In any case, it seems like the original bug is resolved, so I'm closing this ticket.
Hi Matej,

We've been debugging a bug very similar to this one over at:
https://github.com/containers/podman/issues/20170

The conclusion to that seems to indicate that the fix we gave you for this bug wasn't really a fix, but just masked the problem in some cases. We now have a fix that should be better. If you want to try it out, it's here:
https://gitlab.com/dgibson/passt/-/tree/noclamp?ref_type=heads
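For completeness, a rough way to build and run that branch (a sketch, assuming the usual passt build dependencies are installed and the same port forwards as before):

  git clone https://gitlab.com/dgibson/passt.git
  cd passt
  git checkout noclamp
  make
  ./passt -f -4 -t 5201 -u 5201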