Bug 57 - DNS resolving stops working after some time running pasta (via podman)
Summary: DNS resolving stops working after some time running pasta (via podman)
Status: RESOLVED FIXED
Alias: None
Product: passt
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All
OS: Linux
Importance: Normal quite bad
Assignee: David Gibson
URL:
Depends on:
Blocks:
 
Reported: 2023-06-24 17:09 UTC by KuhnChris
Modified: 2023-11-07 16:06 UTC
CC List: 3 users

See Also:


Attachments

Description KuhnChris 2023-06-24 17:09:14 UTC
This follows up on the IRC thread sbrivio and I have been having for a couple of weeks; to sum up the entire issue:

- After setting up a pasta network, everything works as expected for a while
- After some time passes, DNS requests simply stop being forwarded to the pasta net-ns
- This means that services running on the pasta network suddenly seem to lose connection to any DNS-bound outside services
- The issue recovers every couple of requests, so a workaround is to hammer out pings and DNS requests until one succeeds (see the probe example below)
- This happens on both musl- and glibc-based distributions (Alpine, Gentoo and Debian)
- The DNS request is sent outside, and the DNS reply is also received, but it is just never forwarded.

Due to the unstable nature of the issue, providing a log is very time-consuming and, due to a different bug with pasta, currently not within my power.
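
A quick way to probe name resolution from inside an affected container, for anyone trying to reproduce this, is something like the following (container name and domain are just examples, and the image is assumed to ship nslookup and ping):

podman exec mycontainer nslookup example.com
podman exec mycontainer ping -c 1 example.com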
Comment 1 David Gibson 2023-07-04 04:43:36 UTC
Hm, there's not a lot to go on here in terms of what's triggering this behaviour.

Let's try to pin down some basics to begin with.  KuhnChris,

1. Are you using any podman options to control DNS behaviour?

2. Are you using any pasta options to control DNS behaviour (e.g. --dns-match, --dns-forward)?

3. What is the name resolution setup on the *host*?  If you can include your host's /etc/hosts, that would be useful.

4. Are you using systemd-resolved on either the host, or within your container?

5. What test did you use to determine that the DNS reply is received but not forwarded by pasta?
Comment 2 KuhnChris 2023-07-04 10:14:29 UTC
Hello David,

Sure, let's see:

1.) Nothing special, just plain vanilla `podman pod`, no extra DNS things
2.) No special flags were provided, below I put the whole command I use
3.) Nothing special here either, default one here:
root@gcvps:/data/podman/logs# cat /etc/hosts
# Generated by SolusVM
127.0.0.1       localhost localhost.localdomain
::1     localhost localhost.localdomain
45.92.156.153   gcvps.kuhnchris.eu

For completeness' sake, here is the one from within a container:
kuhnchris@gcvps:~$ podman exec caddy cat /etc/hosts
127.0.0.1       localhost localhost.localdomain
::1     localhost localhost.localdomain
45.92.156.153   gcvps.kuhnchris.eu
45.92.156.153   host.containers.internal
127.0.0.1       pasta-pod mysql-podman
127.0.0.1       redis
127.0.0.1       webserver-php
127.0.0.1       webserver-web
127.0.0.1       caddy
127.0.0.1       postgresql-podman
127.0.0.1       authentik-server
127.0.0.1       authentik-worker
127.0.0.1       authentik-ldap
127.0.0.1       wg-access-server
127.0.0.1       nextcloud
127.0.0.1       libreoffice-online
127.0.0.1       code-server

4.) No, I like to avoid systemd where possible.
5.) We checked the pcap before; sadly we lost it before I could post it. As soon as I am able to reproduce it consistently, I will attach it to this ticket - it just takes a while on that GCVPS server (since it is not a production server)
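
Once a capture is available, something like this gives a first look at the DNS traffic inside a pasta pcap (the file name is just an example):

tcpdump -n -r pasta-pcap-example.pcap udp port 53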
Comment 3 KuhnChris 2023-07-04 10:17:28 UTC
The pod creation commandline:

podman pod create --name=pasta-pod --network=pasta:-t,80,-t,443,-u,51820,--trace,--pcap,/data/podman/logs/pasta-pcap-`date +%s`.pcap,--log-file,/data/podman/logs/pasta-logfile-`date +%s`.log,--log-size,1073741824

Without logging it would be:
podman pod create --name=pasta-pod --network=pasta:-t,80,-t,443,-u,51820,-t,25


My "start command" for containers

default_params="--pod=pasta-pod --restart unless-stopped"

start_container () {
  local container_name="$1"
  shift 1
  podman create --name=${container_name} ${default_params} $@ && \
  podman start ${container_name}
}

start_container "redis" \
  -v $BASEDIR/redis:/data \
  redis:alpine
Comment 4 David Gibson 2023-07-05 00:58:45 UTC
1 & 2)
Thanks for the information.  That probably rules out issues with the DNS-related special-case NATs.

3)

Thanks for the info, but apologies, I made a mistake: I meant to ask for /etc/resolv.conf rather than /etc/hosts.  Both the host and container versions would be useful.

4)

Ok, understood.  Probably not due to the general flakiness of systemd-resolved I've sometimes observed then.

5)

Ok, understood.


====

No good theories so far, I'm afraid.  I'll keep thinking about it.
Comment 5 David Gibson 2023-08-15 03:59:33 UTC
The symptoms here don't quite match, but it's not impossible this is related to bug 70.  There's a draft fix for that here: https://gitlab.com/dgibson/passt/-/tree/podman19405?ref_type=heads


If you could give that a try, it might turn something up.
Comment 6 David Gibson 2023-08-21 05:55:42 UTC
The fix for the bug noted in comment 5 is now merged upstream.
Comment 7 Jay 2023-10-20 11:58:04 UTC
After moving my monitoring container, uptime-kuma, from docker to podman and pasta, I'm experiencing the same issue.
The longer the container is up, the more DNS requests fail. After a week, only 30% succeed.

I'm running pasta 2023_10_04.f851084 on openSUSE Tumbleweed.

My host and container resolv.conf are the same:
search local.lan
nameserver 192.168.0.254

podman run \
    --rm \
    -d \
    --pull newer \
    --name uptime-kuma \
    --userns nomap \
    --network pasta:-t,3001:3001,--trace,--pcap,/home/username/podman/logs/pasta-pcap-`date +%s`.pcap,--log-file,/home/username/podman/logs/pasta-logfile-`date +%s`.log,--log-size,1073741824 \
    --cap-add NET_RAW \
    --restart no \
    --stop-timeout 30 \
    --memory 1g \
    -e PUID=1122 \
    -e PGID=1122 \
    -v ~/podman/containers/uptime-kuma/data:/app/data \
    docker.io/louislam/uptime-kuma:1
    
I uploaded 22 hours of logging to: https://drive.proton.me/urls/CWBNJEZT54#GTrTosRiFEqV

Is there something else I can do to help?
Comment 8 David Gibson 2023-10-26 02:28:02 UTC
Sorry I haven't looked at this for a while, I've been caught up with other bugs.

It's not clear at this stage if the problem described in comment 7 has the same cause as the original problem in comment 0 - there are some differences in details which might be significant.  However let's track them both here for now, until it's clearer one way or the other.

I'm going to focus on the comment 7 issue for now, since it's occurring with a very recent pasta version, and we have some detailed logs.

The logs and packet capture are interesting, but some more context would help me to understand them:

1) What is the IP address of the container?
2) What is the IP address of the host?
3) You've shown the resolv.conf, but can you verify that's not being changed by something during the lifetime of the container?  In particular I'm concerned about DHCP, NetworkManager or other tools changing the host's /etc/resolv.conf dynamically.

In the packet trace, I'm seeing some queries directed at the expected nameserver from your information, 192.168.0.254.  However, I'm also seeing a number of queries directed at 8.8.8.8, 9.9.9.9 and several other 192.168.0.xxx addresses, which is a bit surprising.

I am seeing a number of ICMP port unreachable errors, however those appear to be originating from the container, rejecting DNS responses that have been sent to it.  I suspect that's just responses coming in after the resolver has gotten an answer from somewhere else and closed its socket.
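
For reference, those ICMP errors can be isolated from the capture with a Wireshark/tshark display filter along these lines (type 3 / code 3 is destination unreachable / port unreachable; the file name is an example):

tshark -r pasta-pcap-example.pcap -Y 'icmp.type == 3 && icmp.code == 3'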
Comment 9 Jay 2023-10-26 07:15:20 UTC
No worries about the time of response. I’m grateful you are willing to look at this.

The host is 192.168.0.73/24, there is no IP specified for the container.
jay@pod01:~> podman inspect uptime-kuma | grep IP
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
jay@pod01:~> podman exec uptime-kuma hostname -I
192.168.0.73 

To diagnose the problem I started to make DNS requests to different internal and external servers to see if I could discern any pattern. But all behaved the same.

Every minute the container runs the equivalent of:
nslookup google.com 192.168.0.50
nslookup google.com 192.168.0.63
nslookup google.com 192.168.0.254
nslookup google.com 1.1.1.1
nslookup google.com 8.8.8.8
nslookup google.com 9.9.9.9

As the DNS requests are made to specific DNS servers, I think we can rule out changes to resolv.conf.

Your thoughts about the ICMP Destination unreachable errors were mine as well.
After plotting the errors over time you can see them keep increasing. That doesn't make sense to me if these were just normal late responses.

I don't want to steer you in the wrong direction, but if I interpret the pcap in Wireshark right, for every Destination unreachable response there is a DNS request without an answer. 
Is it possible that the source or destination port gets mangled?
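
If it helps, the queries that never got an answer can be listed with a Wireshark/tshark display filter along these lines, relying on Wireshark's own query/response matching (the file name is an example):

tshark -r pasta-pcap-example.pcap -Y 'dns.flags.response == 0 && !dns.response_in'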
Comment 10 David Gibson 2023-10-27 04:13:41 UTC
(In reply to Jay from comment #9)
> No worries about the time of response. I’m grateful you are willing to look
> at this.
> 
> The host is 192.168.0.73/24, there is no IP specified for the container.

Ok, which means the container will also be getting 192.168.0.73 as its address, per pasta's defaults.

> jay@pod01:~> podman inspect uptime-kuma | grep IP
> "IPAddress": "",
> "IPPrefixLen": 0,
> "IPv6Gateway": "",
> "GlobalIPv6Address": "",
> "GlobalIPv6PrefixLen": 0,
> "LinkLocalIPv6Address": "",
> "LinkLocalIPv6PrefixLen": 0,

Huh.. I don't actually know where podman gets that information.  Possibly there's something that needs improving in the integration so that podman can know the address of the container in this case.

> jay@pod01:~> podman exec uptime-kuma hostname -I
> 192.168.0.73 
> 
> To diagnose the problem I started to make DNS requests to different internal
> and external servers to see if i could discern any pattern. But all behaved
> the same.
> 
> Every minute the container runs the equivalent of:
> nslookup google.com 192.168.0.50
> nslookup google.com 192.168.0.63
> nslookup google.com 192.168.0.254
> nslookup google.com 1.1.1.1
> nslookup google.com 8.8.8.8
> nslookup google.com 9.9.9.9

Aha! Knowing this the traffic I'm seeing makes a lot more sense.

> As the DNS requests are made to specific DNS servers, I think we can rule
> out changes to resolv.conf.

Right.

> Your thoughts about the ICMP Destination unreachable errors were mine as
> well.
> After plotting the errors over time you can see them keep increasing.

Loosely speaking, yes.

> That
> doesn't make sense to me if these were just normal late responses.

Fair point.

> I don't want to steer you in the wrong direction, but if I interpret the
> pcap in Wireshark right, for every Destination unreachable response there is
> a DNS request without an answer. 

Well, sort of...

> Is it possible that the source or destination port gets mangled?

..because that was a very good catch.  It appears that replies are coming back to the wrong port, which is what triggers the ICMP message.
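
To see this mechanism in isolation: a UDP datagram arriving at a port nothing is listening on makes the kernel emit exactly this kind of ICMP port unreachable. A rough demonstration, assuming no local firewall interferes and enough privileges to capture (the port number is arbitrary):

# terminal 1: watch for ICMP destination unreachable on loopback
tcpdump -ni lo 'icmp[icmptype] == icmp-unreach'
# terminal 2: send a datagram to a UDP port with no listener
echo probe | socat STDIO UDP4:127.0.0.1:53535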
Comment 11 David Gibson 2023-10-27 05:03:48 UTC
So, I've been aware for a while that our "connection" tracking for UDP is very rudimentary and can be fooled by a number of situations where we talk to multiple hosts from the same port.  However, I haven't yet spotted a specific path which would cause the symptoms seen here.
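
One way to look at the host-side UDP sockets pasta has open, and the local ports they are bound to, is something like this, run as the same user pasta runs as (assumes ss from iproute2 is available):

ss -uanp | grep pasta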

A couple of additional questions:

 * Could you check the exact pasta command line that podman is invoking (by finding it in the ps output)?

 * What is the default gateway address? (on both host and container, though I expect it will be the same)

 * Would you be willing to run a version of pasta built from custom source, with extra instrumentation to help debug this?
Comment 12 Jay 2023-10-27 06:11:25 UTC
>  * Could you check the exact pasta command line that podman is invoking (by
> finding it in the ps output)?
/usr/bin/pasta --config-net -t 3001:3001 --trace --pcap /home/jeroen/podman/logs/pasta-pcap-1697712095.pcap --log-file /home/jeroen/podman/logs/pasta-logfile-1697712095.log --log-size 1073741824 -u none -T none -U none --no-map-gw --netns /run/user/1000/netns/netns-5dd80601-2225-3be0-714d-7ed41af3b4c5
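
For reference, an alternative to digging through the full ps output is something like:

pgrep -af pasta

which prints the PID and full command line of any running pasta process.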

>  * What is the default gateway address? (on both host and container, though
> I expect it will be the same)
192.168.0.254 for both

>  * Would you be willing to run a version of pasta built from custom source,
> with extra instrumentation to help debug this?
absolutely!
Comment 13 David Gibson 2023-10-30 08:12:11 UTC
Thanks for the extra information.  No surprises there, but it's good to be sure.

I've now prepared the first instrumented version of passt for debugging.  You can get the source here:
    https://gitlab.com/dgibson/passt/-/tree/bug57?ref_type=heads
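
For reference, building from that branch is roughly the following - the clone URL is inferred from the tree link above, and passt's plain Makefile build is assumed; the resulting ./pasta binary can then be placed ahead of the packaged one in $PATH, or pointed to explicitly:

git clone -b bug57 https://gitlab.com/dgibson/passt.git
cd passt
make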

Can you please run your containers as before with the following small tweaks:
   - Use the instrumented pasta version from the source above
   - Replace --trace with --debug (the new instrumentation is at the --debug level, and the extra things from --trace will just clutter the output)
   - Change your script to just nslookup with a single server (say 8.8.8.8) instead of several.  Nearby or overlapping requests to multiple servers will make the logs harder to interpret.

Leave it running long enough to observe DNS failures happening, then send me the new logs and packet capture.
Comment 14 Jay 2023-10-30 17:41:14 UTC
I really appreciate you taking the time to help me with this issue.

The files have been uploaded to the same place: https://drive.proton.me/urls/CWBNJEZT54#GTrTosRiFEqV

It seems to be going wrong with reused sockets:

9184.4566:          DNS from tap: 192.168.0.73:40108 -> 192.168.0.254:53
9184.4566:          DNS: reused IPv4 socket 147, src=40108
9184.4566:          DNS debug: Using socket 147, bound to 0.0.0.0:34420
9184.4737:          DNS debug: sock_handler, 1 packets for dstport=34420 ref.splice=0
9184.4737:          DNS debug: sock reply? 0/1: 192.168.0.254:53 -> 0.0.0.0->34420

And new behaviour, a crash:

11961.1428:          DNS from tap: 192.168.0.73:58650 -> 192.168.0.254:53
11961.1428:          DNS: reused IPv4 socket 149, src=58650
11961.1428: ERROR:   DNS debug: getsockname() failed: Bad file descriptor
Comment 15 David Gibson 2023-10-31 00:24:30 UTC
(In reply to Jay from comment #14)
> I really appreciate you taking the time to help me with this issue.
> 
> The files have been uploaded to the same place:
> https://drive.proton.me/urls/CWBNJEZT54#GTrTosRiFEqV

Thanks, I'll analyse those.

> It seems to be going wrong with reused sockets:

Right, I suspected that would be the case.

> 9184.4566:          DNS from tap: 192.168.0.73:40108 -> 192.168.0.254:53
> 9184.4566:          DNS: reused IPv4 socket 147, src=40108
> 9184.4566:          DNS debug: Using socket 147, bound to 0.0.0.0:34420
> 9184.4737:          DNS debug: sock_handler, 1 packets for dstport=34420
> ref.splice=0
> 9184.4737:          DNS debug: sock reply? 0/1: 192.168.0.254:53 ->
> 0.0.0.0->34420
> 
> And new behaviour, a crash:
> 
> 11961.1428:          DNS from tap: 192.168.0.73:58650 -> 192.168.0.254:53
> 11961.1428:          DNS: reused IPv4 socket 149, src=58650
> 11961.1428: ERROR:   DNS debug: getsockname() failed: Bad file descriptor

That's probably just a bug in the instrumentation code, not really relevant to the original problem, but I'll check.
Comment 16 David Gibson 2023-10-31 01:03:12 UTC
After analyzing the new log, I think I've spotted the problem.  It looks like when a stored socket was timed out and closed, we weren't actually removing it from the table.  That meant that a new request could get a stale fd.  That might be closed - leading to the EBADF you saw with the instrumented code - or the fd number might have been re-used for an unrelated socket, leading to the sockets with the wrong bound port.
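
As a side note on why a stale fd is dangerous: descriptor numbers are recycled as soon as they are freed, so a stored number can later refer to a completely unrelated object. A bash-only illustration of that point, unrelated to pasta's actual code:

exec 9<>/dev/udp/192.0.2.1/53   # fd 9 is now a UDP socket (bash's /dev/udp)
ls -l /proc/$$/fd/9             # shows socket:[...]
exec 9>&-                       # close it
exec 9</dev/null                # the same number now refers to /dev/null
ls -l /proc/$$/fd/9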

I've updated the branch at:
    https://gitlab.com/dgibson/passt/-/tree/bug57?ref_type=heads
with a draft fix (along with the instrumentation).

Can you give that a try and see how it works, please?
Comment 17 Jay 2023-10-31 07:43:17 UTC
After one DNS request all further traffic fails.

Two new sets of logging have been uploaded. 
The second and newest one is a cleaner example:

> / # nslookup google.com 192.168.0.254
> Server:         192.168.0.254
> Address:        192.168.0.254:53

> Non-authoritative answer:
> Name:   google.com
> Address: 2a00:1450:400e:803::200e

> Non-authoritative answer:
> Name:   google.com
> Address: 142.251.36.46

> / # nslookup google.com 192.168.0.254
> nslookup: can't connect to remote host (192.168.0.254): Network is unreachable

And one example without logging:
> / # ping www.google.com
> PING www.google.com (142.251.36.4): 56 data bytes
> 64 bytes from 142.251.36.4: seq=0 ttl=255 time=18.529 ms
> ping: sendto: Network is unreachable
Comment 18 David Gibson 2023-10-31 23:01:06 UTC
(In reply to Jay from comment #17)
> After one DNS request all further traffic fails.

Apologies, I made a foolish error, meaning pasta will actually crash after the first DNS request.  I've fixed that and re-pushed, please try again.
Comment 19 Jay 2023-11-01 09:55:33 UTC
Nine out of ten times pasta now crashes after the first reused socket, but I've managed to capture one successful run, which seems weird to me.
Comment 20 David Gibson 2023-11-02 04:30:48 UTC
Ugh, sorry, another careless error. I'm working on fixing it now.

The good news is that in the meantime I figured out a way to reproduce the problem locally, and (relatively) quickly, with the following script.  When it works, the output is AAAA, then after ~3 minutes BBBB, then AAAA again.  Failure generally shows up as the final AAAA going missing.

#! /bin/sh

PASTA=~/src/passt/pasta

EPORT=33333
APORT=44444
BPORT=55555
IFNAME="$(ip -j -4 route show | jq -rM '[.[] | select(.dst == "default").dev] | .[0]')"
GW="$(ip -j -4 route show | jq -rM '[.[] | select(.dst == "default").gateway] | .[0]')"
ADDR="$(ip -j -4 addr show "${IFNAME}" | jq -rM '.[0] | .addr_info[0].local')"

# Echo server on the host
socat UDP4-LISTEN:$EPORT,fork PIPE &

trap 'killall socat' EXIT

date
$PASTA --config-net -l pasta.log --debug -- sh -c "echo AAAA | socat STDIO UDP4:$GW:$EPORT,bind=$ADDR:$APORT; sleep 185; echo BBBB | socat STDIO UDP4:$GW:$EPORT,bind=$ADDR:$BPORT; echo AAAA | socat STDIO UDP4:$GW:$EPORT,bind=$ADDR:$APORT"
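
Running the reproducer needs socat and jq on the host, plus roughly three minutes of patience; assuming it was saved as bug57-repro.sh and the PASTA path at the top adjusted, an invocation would simply be:

sh bug57-repro.sh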
Comment 21 David Gibson 2023-11-03 08:39:22 UTC
Ok, that took me a bit longer than I expected to sort out.  Although the problem was conceptually simple, there were some fiddly details to figure out in fixing it.  Could you please try either this version (which has just the tentative fix):

https://gitlab.com/dgibson/passt/-/tree/bug57?ref_type=heads

Or this version (which has both the tentative fix and debugging instrumentation):

https://gitlab.com/dgibson/passt/-/tree/bug57debug?ref_type=heads
Comment 22 Jay 2023-11-05 10:02:18 UTC
It's been working beautifully the last couple of days. 
This fix makes pasta usable for me.
Thank you so very much.
I know how fiddly 'simple' problems can be.
Comment 23 Stefano Brivio 2023-11-07 16:06:12 UTC
Fixed in 2023_11_07.56d9f6d
