Bug 20

Summary: Evaluate memory overhead with different port forwarding configurations, decrease it
Product: passt
Reporter: Stefano Brivio <sbrivio>
Component: general
Assignee: Stefano Brivio <sbrivio>
Status: IN_PROGRESS
Severity: normal
CC: alkaplan, dgibson
Priority: High
Version: unspecified
Hardware: All
OS: Linux

Description Stefano Brivio 2022-10-27 16:05:34 UTC
If passt is configured to forward all the host ports to the guest (-t all
-u all, typical KubeVirt configuration) it will open ~2^18 sockets if both IPv4
and IPv6 are available, and ~2^17 otherwise.
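
As a rough check of those figures, the count can be worked out from the number of ports, transport protocols and IP versions. This is only a back-of-the-envelope sketch: it assumes one socket per port, per protocol, per IP version, over ports 1 to 65535, and the function name is made up for illustration.

```python
PORTS = 65535      # assumption: ports 1..65535, one socket each
PROTOCOLS = 2      # TCP and UDP (-t all -u all)


def forwarded_socket_count(ipv4: bool, ipv6: bool) -> int:
    """Estimate sockets opened when all TCP and UDP ports are forwarded."""
    ip_versions = int(ipv4) + int(ipv6)
    return PORTS * PROTOCOLS * ip_versions


both = forwarded_socket_count(ipv4=True, ipv6=True)
single = forwarded_socket_count(ipv4=True, ipv6=False)
print(both, 2 ** 18)    # 262140 262144
print(single, 2 ** 17)  # 131070 131072
```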

While passt itself doesn't dynamically allocate memory, and uses approximately
25 MiB for buffers plus connection tracking, all those open sockets cause the
kernel to allocate quite some memory.

Evaluate how much that memory really is, possibly adding a test with an isolated
VM image created by mbuto, where the init script itself starts and stops passt
with different options and checks free memory before and after. From some quick
tests, I observed something in excess of 200 MiB on a 5.15 kernel on x86_64, for
all UDP and TCP ports forwarded for both IPv4 and IPv6.
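
The "free memory before and after" check could look roughly like the sketch below. It is not the test itself: it assumes snapshots of /proc/meminfo are taken as text before starting passt and while it runs, that the MemFree field is the one of interest, and both helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo field names to their integer values (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def free_mib_delta(before: str, after: str, field: str = "MemFree") -> float:
    """MiB by which `field` dropped between two /proc/meminfo snapshots."""
    kib_before = parse_meminfo(before)[field]
    kib_after = parse_meminfo(after)[field]
    return (kib_before - kib_after) / 1024


# Hypothetical snapshots, not measured values
before = "MemTotal: 2048000 kB\nMemFree: 1024000 kB\n"
after = "MemTotal: 2048000 kB\nMemFree: 819200 kB\n"
print(free_mib_delta(before, after))  # 200.0
```

On a real system the two strings would come from reading /proc/meminfo, and page cache or other activity in between would add noise, which is why an isolated VM image is suggested above.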

Evaluate how we can decrease the memory overhead. Two ideas:

- TCP minisocks (listening sockets) used to be really mini, about 700 bytes
  each, in 2008: https://lwn.net/Articles/277146/. Check if they still are
  (probably not), and if not, fix that in the kernel

- with both IPv4 and IPv6 available, we could use a single listening socket for
  each port, to forward both IPv4 and IPv6. This works on Linux, not on BSDs, so
  we should keep the code path "configurable" in some sense, as we don't want to
  make BSD ports more difficult than they already are
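
For the second idea, a minimal sketch of a single listening socket covering both IP versions follows. It assumes the mechanism meant here is clearing IPV6_V6ONLY on an AF_INET6 socket bound to the wildcard address; the function name is hypothetical, and passt itself is written in C, so this only illustrates the socket options involved.

```python
import socket


def open_dual_stack_listener(port: int) -> socket.socket:
    """Open one TCP listening socket accepting both IPv6 and IPv4 on `port`."""
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        # 0 means the socket is not restricted to IPv6: IPv4 peers then
        # show up as IPv4-mapped addresses (::ffff:a.b.c.d).
        s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        s.bind(("::", port))
        s.listen()
    except OSError:
        s.close()
        raise
    return s
```

Calling open_dual_stack_listener(0) lets the kernel pick a free port. A "configurable" code path could fall back to separate AF_INET and AF_INET6 sockets where this call fails or is not supported.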
Comment 1 Stefano Brivio 2022-10-27 16:11:25 UTC
(In reply to Stefano Brivio from comment #0)
> - TCP minisocks (listening sockets) used to be really mini, about 700 bytes
>   each, in 2008: https://lwn.net/Articles/277146/. Check if they still are
>   (probably not), and if not, fix that in the kernel

Sorry, they used to be 96 bytes. What I'm estimating now is ~700 bytes each.
Comment 2 Stefano Brivio 2022-10-31 11:27:54 UTC
Test cases to report memory usage in userspace and kernel:

    https://archives.passt.top/passt-dev/20221031112059.170269-1-sbrivio@redhat.com/

...those run passt with different port forwarding options in a rather isolated environment, reporting a summary, plus details about kernel slab objects and sizes of passt's own symbols.

As expected, non-slab (userspace) memory usage is essentially constant. We have some numbers from /proc/slabinfo that are rather unexpected to me.

Note that these don't necessarily represent absolute numbers for real-world usage. This uses the kernel from the machine I use for general testing, with a very limited set of modules and the usual hardcoded qemu options (passt runs in a VM here). I don't see these crazy numbers on my laptop. On the other hand, it should be quite useful as a starting point, or for relative, quantitative evaluation:

> Memory usage: all TCP and UDP ports forwarded, IPv4 and IPv6
Memory usage summary

                                                                     type: |  MiB |
                                    used                                   |737.0 |
                                 userspace                                 | 41.5 |
                                   kernel                                  |692.3 |
                                                                           '------'
Userspace memory detail

                                                                   symbol: |  MiB |
                              tcp_buf_discard                              | 16.8 |
                                tcp6_l2_buf                                |  8.4 |
                                tcp4_l2_buf                                |  8.4 |
                                     tc                                    |  8.4 |
                                  pkt_buf                                  |  8.4 |
                               udp_splice_map                              |  5.2 |
                                udp6_l2_buf                                |  2.1 |
                                udp4_l2_buf                                |  2.1 |
                                udp_tap_map                                |  2.1 |
                                icmp_id_map                                |  2.1 |
                               udp_splice_buf                              |  2.1 |
                                  tc_hash                                  |  1.5 |
                             pool_tap6_storage                             |  1.3 |
                             pool_tap4_storage                             |  1.3 |
                                  tap6_l4                                  |  1.1 |
                                  tap4_l4                                  |  1.1 |
                                                                           '------'
Kernel memory detail

                                                 objects: | count |  size |  MiB  |
                            pid                           |240640 |   128 |  30.8 |
                          dentry                          |192570 |   192 |  37.0 |
                        Acpi-Parse                        |206079 |    56 |  11.5 |
                        kmalloc-64                        |196608 |    64 |  12.6 |
                        kmalloc-32                        |192512 |    32 |   6.2 |
                      lsm_file_cache                      |192512 |    16 |   3.1 |
                           filp                           |205248 |   256 |  52.5 |
                      anon_vma_chain                      |192512 |    64 |  12.3 |
                          ep_head                         |192768 |    16 |   3.1 |
                     sock_inode_cache                     |205181 |   832 | 170.7 |
                       signal_cache                       | 48132 |  1152 |  55.4 |
                           TCPv6                          | 48165 |  2432 | 117.1 |
                            TCP                           | 48174 |  2240 | 107.9 |
                           UDPv6                          | 48144 |  1344 |  64.7 |
                                                          '-------'-------'-------'
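
For reference, the last column of the kernel detail can be recomputed from /proc/slabinfo as object count times object size. The sketch below assumes the slabinfo 2.1 column layout (name, active_objs, num_objs, objsize, ...) and uses num_objs as the count; whether the test script uses active or total objects is not stated here. The sample line reuses the TCPv6 count and size from the table, and its remaining fields are made up.

```python
def slab_usage_bytes(slabinfo_text: str) -> dict:
    """Map each slab cache name to num_objs * objsize, in bytes."""
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version line and the column header
        if line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


# Illustrative input: only the count (48165) and size (2432) come from the table
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
TCPv6  48165  48165  2432  13  8 : tunables 0 0 0 : slabdata 3705 3705 0
"""

tcpv6 = slab_usage_bytes(SAMPLE)["TCPv6"]
print(tcpv6)                  # 117137280
print(round(tcpv6 / 1e6, 1))  # 117.1
```

Dividing by 10^6 reproduces the 117.1 shown for TCPv6 above.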
Comment 3 Stefano Brivio 2022-10-31 11:29:02 UTC
Also probably interesting:

> Memory usage: no ports forwarded
Memory usage summary

                                                                     type: |  MiB |
                                    used                                   | 41.1 |
                                 userspace                                 | 39.3 |
                                   kernel                                  |  0.1 |
                                                                           '------'
Comment 4 Stefano Brivio 2022-11-10 12:16:11 UTC
Update of progress so far:

- David is working on shared IPv4/IPv6 listening sockets, which can be done at least on Linux:
  https://archives.passt.top/passt-dev/20221104084333.3761760-1-david@gibson.dropbear.id.au/

  and should significantly cut down on memory usage, probably to something around 350-400 MiB in case all TCP and UDP ports are forwarded for IPv4 and IPv6

- we're exploring in parallel possible improvements in the Linux kernel, following one of these ideas:

  - implementing some minimised data structure to represent listening sockets. As David pointed out, my note about minisocks in the description of this ticket is clearly wrong: minisocks are for half-open incoming connection requests. But we could actually try to implement something like that for listening sockets too

  - implementing the possibility for a single socket to bind to multiple ports (wildcard sockets or something like that)
Comment 5 Stefano Brivio 2022-11-17 07:58:57 UTC
Initial results from unified IPv4/IPv6 sockets for TCP:

  https://archives.passt.top/passt-dev/20221116044212.3876516-1-david@gibson.dropbear.id.au/

on David's system, that decreases total memory usage from 677 MiB to 487 MiB. I think we can expect similar gains from the same change done for UDP sockets.
Comment 6 Stefano Brivio 2023-03-24 10:32:40 UTC
Decreasing priority as this is significantly improved now -- dual sockets still need to be implemented for UDP, and we're still experimenting with kernel-based solutions.