If passt is configured to forward all the host ports to the guest (-t all -u all, typical KubeVirt configuration), it will open ~2^18 sockets if both IPv4 and IPv6 are available, and ~2^17 otherwise.

While passt itself doesn't dynamically allocate memory, and uses approximately 25 MiB for buffers plus connection tracking, all those open sockets cause the kernel to allocate a considerable amount of memory.

Evaluate how much memory that really is, possibly adding a test with an isolated VM image created by mbuto, where the init script itself starts and stops passt with different options and checks free memory before and after. From some quick tests, I observed something in excess of 200 MiB on a 5.15 kernel on x86_64, with all UDP and TCP ports forwarded for both IPv4 and IPv6.

Evaluate how we can decrease the memory overhead. Two ideas:

- TCP minisocks (listening sockets) used to be really mini, about 700 bytes each, in 2008: https://lwn.net/Articles/277146/. Check if they still are (probably not), and if not, fix that in the kernel

- with both IPv4 and IPv6 available, we could use a single listening socket for each port, to forward both IPv4 and IPv6. This works on Linux, not on BSDs, so we should keep the code path "configurable" in some sense, as we don't want to make BSD ports more difficult than they already are
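For reference, the socket counts above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes one socket per port, per protocol (TCP, UDP) and per address family, over ports 1 to 65535, and the function names are made up for illustration. The 200 MiB figure is the rough observation quoted above, not a precise measurement.

```python
# Back-of-the-envelope estimate. All constants are assumptions taken from
# the figures quoted in this ticket, not measurements.

PORTS = 65535  # assumed: ports 1..65535, one socket per port


def socket_count(protocols: int = 2, families: int = 2) -> int:
    """Upper bound on sockets opened when forwarding all ports.

    protocols: 2 for TCP and UDP ('-t all -u all')
    families:  2 if both IPv4 and IPv6 are available, 1 otherwise
    """
    return PORTS * protocols * families


def bytes_per_socket(total_mib: float, sockets: int) -> float:
    """Average kernel memory per socket for a given total overhead."""
    return total_mib * 1024 * 1024 / sockets


if __name__ == "__main__":
    both = socket_count(families=2)   # close to 2^18
    single = socket_count(families=1)  # close to 2^17
    print(f"IPv4 and IPv6: {both} sockets (2^18 = {2 ** 18})")
    print(f"one family:    {single} sockets (2^17 = {2 ** 17})")
    print(f"average at 200 MiB: {bytes_per_socket(200, both):.0f} bytes/socket")
```

Under these assumptions, halving the number of address families per port (the second idea above) would halve the socket count, which is why it looks like the cheapest first step.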
(In reply to Stefano Brivio from comment #0)
> - TCP minisocks (listening sockets) used to be really mini, about 700 bytes
> each, in 2008: https://lwn.net/Articles/277146/. Check if they still are
> (probably not), and if not, fix that in the kernel

Sorry, they used to be 96 bytes. What I'm estimating now is ~700 bytes each.
Test cases to report memory usage in userspace and kernel:

  https://archives.passt.top/passt-dev/20221031112059.170269-1-sbrivio@redhat.com/

...those run passt with different port forwarding options in a rather isolated environment, reporting a summary, plus details about kernel slab objects and sizes of passt's own symbols.

As expected, non-slab (userspace) memory usage is essentially constant. We have some numbers from /proc/slabinfo that are rather unexpected to me.

Note that these numbers aren't necessarily representative of real-world usage in absolute terms. This is using the kernel from the test machine I'm using for general testing, with a very limited set of modules and the usual hardcoded qemu options (passt runs in a VM here). I don't see these crazy numbers on my laptop.

On the other hand, it should be quite useful as a starting point, or for relative, quantitative evaluation:

> Memory usage: all TCP and UDP ports forwarded, IPv4 and IPv6

Memory usage summary

          type: |  MiB |
           used |737.0 |
      userspace | 41.5 |
         kernel |692.3 |
                '------'

Userspace memory detail

            symbol: |  MiB |
    tcp_buf_discard | 16.8 |
        tcp6_l2_buf |  8.4 |
        tcp4_l2_buf |  8.4 |
                 tc |  8.4 |
            pkt_buf |  8.4 |
     udp_splice_map |  5.2 |
        udp6_l2_buf |  2.1 |
        udp4_l2_buf |  2.1 |
        udp_tap_map |  2.1 |
        icmp_id_map |  2.1 |
     udp_splice_buf |  2.1 |
            tc_hash |  1.5 |
  pool_tap6_storage |  1.3 |
  pool_tap4_storage |  1.3 |
            tap6_l4 |  1.1 |
            tap4_l4 |  1.1 |
                    '------'

Kernel memory detail

          objects: | count |  size |   MiB |
               pid |240640 |   128 |  30.8 |
            dentry |192570 |   192 |  37.0 |
        Acpi-Parse |206079 |    56 |  11.5 |
        kmalloc-64 |196608 |    64 |  12.6 |
        kmalloc-32 |192512 |    32 |   6.2 |
    lsm_file_cache |192512 |    16 |   3.1 |
              filp |205248 |   256 |  52.5 |
    anon_vma_chain |192512 |    64 |  12.3 |
           ep_head |192768 |    16 |   3.1 |
  sock_inode_cache |205181 |   832 | 170.7 |
      signal_cache | 48132 |  1152 |  55.4 |
             TCPv6 | 48165 |  2432 | 117.1 |
               TCP | 48174 |  2240 | 107.9 |
             UDPv6 | 48144 |  1344 |  64.7 |
                   '-------'-------'-------'
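For anyone wanting to cross-check the kernel detail by hand, here is a minimal sketch of how per-cache usage can be derived from /proc/slabinfo. It is not the code from the test patch: it assumes the slabinfo 2.1 column layout (name, active_objs, num_objs, objsize, ...), uses num_objs × objsize as the estimate, and the function name and sample rows are made up for illustration. The MiB column in the table above appears consistent with count × size / 10^6 (for example, 48174 × 2240 = 107909760 for TCP), so the sketch prints values in the same unit.

```python
def slab_usage(slabinfo_text: str) -> dict:
    """Map each slab cache name to num_objs * objsize, in bytes.

    Assumes the slabinfo 2.1 layout:
      name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version line, the column header and blank lines
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


if __name__ == "__main__":
    # Illustrative sample only: counts and sizes are taken from the table
    # above, the remaining columns are placeholders. On a real system, read
    # /proc/slabinfo instead (this usually requires root).
    sample = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
TCP 48174 48174 2240 14 8 : tunables 0 0 0 : slabdata 3441 3441 0
sock_inode_cache 205181 205181 832 19 4 : tunables 0 0 0 : slabdata 10799 10799 0
"""
    ranked = sorted(slab_usage(sample).items(), key=lambda kv: -kv[1])
    for name, size in ranked:
        print(f"{name:>20} {size / 1e6:8.1f}")
```

Comparing the output of two runs (no ports forwarded, then all ports forwarded) gives the relative cost per cache, which is the more meaningful number here.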
Also probably interesting:

> Memory usage: no ports forwarded

Memory usage summary

          type: |  MiB |
           used | 41.1 |
      userspace | 39.3 |
         kernel |  0.1 |
                '------'
Update of progress so far:

- David is working on shared IPv4/IPv6 listening sockets, which can be done at least on Linux:

    https://archives.passt.top/passt-dev/20221104084333.3761760-1-david@gibson.dropbear.id.au/

  and should significantly cut down on memory usage, probably to something around 350-400 MiB in case all TCP and UDP ports are forwarded for IPv4 and IPv6

- we're exploring in parallel possible improvements in the Linux kernel, following one of these ideas:

  - implementing some minimised data structure to represent listening sockets. My note in the description of this ticket about minisocks is clearly wrong, as David pointed out: minisocks are for half-open incoming connection requests. But we could actually try to implement something like that for listening sockets too

  - implementing the possibility for a single socket to bind to multiple ports (wildcard sockets or something like that)
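To illustrate the shared-socket idea in isolation: on Linux, an AF_INET6 socket with IPV6_V6ONLY cleared and bound to the wildcard address also accepts IPv4 connections, which then show up with IPv4-mapped addresses (::ffff:a.b.c.d). The sketch below is not passt's implementation (passt is written in C, see David's series above for the real thing); it's a Python illustration of the mechanism, and dual_stack_listener() is a made-up name.

```python
import socket


def dual_stack_listener(port: int, backlog: int = 128) -> socket.socket:
    """Return one listening TCP socket serving both IPv4 and IPv6.

    Illustrative helper, not taken from passt. Relies on clearing
    IPV6_V6ONLY, which works on Linux but not on every platform.
    """
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # 0 = also accept IPv4, reported as IPv4-mapped IPv6 addresses
        s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        s.bind(("::", port))  # wildcard address, covers both families
        s.listen(backlog)
    except OSError:
        s.close()
        raise
    return s


if __name__ == "__main__":
    if socket.has_dualstack_ipv6():
        listener = dual_stack_listener(0)  # port 0: let the kernel choose
        print("listening on port", listener.getsockname()[1])
        listener.close()
    else:
        print("dual-stack sockets not supported on this platform")
```

The has_dualstack_ipv6() check is one way to keep the code path "configurable": where it returns False, a caller would fall back to one socket per address family.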
Initial results from unified IPv4/IPv6 sockets for TCP:

  https://archives.passt.top/passt-dev/20221116044212.3876516-1-david@gibson.dropbear.id.au/

On David's system, that decreases total memory usage from 677 MiB to 487 MiB. I think we can expect similar gains from the same change done for UDP sockets.
Decreasing priority as this is significantly improved now -- dual sockets still need to be implemented for UDP, and we're still experimenting with kernel-based solutions.