Bug 48 - Don’t do NAT for non-local binds
Summary: Don’t do NAT for non-local binds
Status: CONFIRMED
Alias: None
Product: passt
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All Linux
Importance: Normal feature
Assignee: nobody
Reported: 2023-03-24 17:17 UTC by Stefano Brivio
Modified: 2023-03-24 17:17 UTC

Description Stefano Brivio 2023-03-24 17:17:43 UTC
Originally reported by lemmi@nerd2nerd.org, see also: https://pad.passt.top/p/NonLocalBinds and https://oswalt.dev/2022/02/non-local-address-binds-in-linux/

With non-local binds, one can run many containers on a single host, each
with its own IP address, without the host needing to configure each of
potentially hundreds of addresses on an interface.

This already works with pasta, but will result in an unnecessary address
translation (NAT) step.

> Host setup
On the host, specify a prefix that is either routed to the host, or made
available through NDP proxy (e.g. with the help of ndppd).

    sysctl net.ipv6.ip_nonlocal_bind=1
    ip -6 r add local 2001:db8:0:0:ffff::/80 dev lo
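
Before starting containers, it can help to confirm that the sysctl above is actually in effect. The following is a minimal sketch, not part of passt or pasta: the function name and its optional path argument are hypothetical, and it assumes the sysctl is exposed at the usual procfs location.

```shell
#!/bin/sh
# Succeed if IPv6 non-local binds are enabled.
# The optional argument (a file to read instead of the real procfs
# entry) is a hypothetical convenience so the check can be tried
# without touching the host configuration.
nonlocal_bind_enabled() {
    f="${1:-/proc/sys/net/ipv6/ip_nonlocal_bind}"
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

# Example use on the host:
#   nonlocal_bind_enabled || echo "run: sysctl net.ipv6.ip_nonlocal_bind=1" >&2
#   ip -6 route show table local   # the 'local ... dev lo' route should be listed
```

The route check is left as a comment because the exact output format of `ip` is not something this sketch tries to parse.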

> Container setup
The following combinations work in the sense that a service running in
the container is reachable from the outside via the non-local IPv6
address. Within the container, however, the destination address of
incoming packets is translated to the link-local address of the
container interface.

In the following examples, the container runs a small echo server on TCP
port 3000 via netcat.

Explicitly setting listening address and port

    podman run --rm -it --network pasta:-6,-t,2001:db8:0:0:ffff::/3000 alpine:latest nc -l -p 3000 -k -e cat

This results in the container getting assigned a host address. The port
is opened with the correct listening address and multiple containers
with different addresses can be run.

Yet incoming packets are translated to the link-local address of the
container.

    ss -tlenH sport = :3000
    LISTEN 0 128 [2001:db8:0:0:ffff::]:3000 [::]:*
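
Because each container gets its own listening address in this mode, starting several of them comes down to repeating the same command with a different address. The sketch below only prints the commands (a dry run) rather than running them. The function names are hypothetical, the `::1`, `::2`, `::3` addresses are made-up examples inside the /80 prefix from the host setup, and `-d` replaces `-it` on the assumption that the containers should run in the background.

```shell
#!/bin/sh
# Build the --network argument used in the example above
# for one listening address and port.
pasta_net_arg() {
    printf 'pasta:-6,-t,%s/%s\n' "$1" "$2"
}

# Dry run: print one podman command per address instead of executing it.
# Usage: print_container_cmds PORT ADDR...
print_container_cmds() {
    port="$1"; shift
    for addr in "$@"; do
        printf 'podman run --rm -d --network %s alpine:latest nc -l -p %s -k -e cat\n' \
            "$(pasta_net_arg "$addr" "$port")" "$port"
    done
}

# Made-up example addresses inside 2001:db8:0:0:ffff::/80.
print_container_cmds 3000 \
    2001:db8:0:0:ffff::1 2001:db8:0:0:ffff::2 2001:db8:0:0:ffff::3
```

Piping the output to `sh` would execute the commands; printing them first makes it easy to review the addresses before anything is started.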

Only setting container address

    podman run --rm -it --network pasta:-6,-a,2001:db8:0:0:ffff::,-t,auto,-T,auto alpine:latest nc -l -p 3000 -k -e cat

Unsurprisingly, this listens on the wildcard address. The service is
still reachable via the non-local address, but this way it is not
possible to run multiple containers on the same port.

NAT still takes place, although the container interface now has the
non-local address configured.

    ss -HtlnO sport = :3000
    LISTEN 0 128 [::]:3000 [::]:*

Setting container address, listening address and port explicitly

    podman run --rm -it --network pasta:-6,-a,2001:db8:0:0:ffff::,-t,2001:db8:0:0:ffff::/3000 alpine:latest nc -l -p 3000 -k -e cat

The handling of incoming destination addresses is the same as in the
first case: NAT still takes place.
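
One possible way to check for the translation in any of the cases above is to look at the local address of an established connection from inside the container's network namespace. This is a sketch under assumptions: it presumes `ss` (iproute2) is available in that namespace, that `ss -tnH sport = :3000` prints the usual State, Recv-Q, Send-Q, local and peer columns, and the function names and the sample line are hypothetical.

```shell
#!/bin/sh
# Print the local address (port stripped) from one line of `ss -tnH`
# output. Assumed columns:
#   State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
ss_local_addr() {
    printf '%s\n' "$1" | awk '{ sub(/:[0-9]+$/, "", $4); print $4 }'
}

# Simplified link-local test: only matches the common fe80: form,
# with or without the brackets that ss puts around IPv6 addresses.
is_link_local() {
    case "$1" in
        "[fe80:"*|fe80:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Made-up sample line; in practice it would come from something like:
#   ss -tnH sport = :3000
line='ESTAB 0 0 [fe80::1]%eth0:3000 [2001:db8::2]:40000'
addr=$(ss_local_addr "$line")
if is_link_local "$addr"; then
    echo "destination was translated: $addr"
else
    echo "destination was not translated: $addr"
fi
```

If NAT were skipped for non-local binds, the local address reported here would be expected to be the non-local address itself rather than a link-local one.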
