Bug 4 - build: Fixes to build with musl
Summary: build: Fixes to build with musl
Status: RESOLVED FIXED
Alias: None
Product: passt
Classification: Unclassified
Component: build (show other bugs)
Version: unspecified
Hardware: All Linux
: Normal enhancement
Assignee: nobody
URL:
Depends on:
Blocks:
 
Reported: 2021-10-23 09:58 UTC by Stefano Brivio
Modified: 2023-03-14 09:51 UTC (History)
1 user (show)

See Also:


Attachments
Patch to make it compile on musl (2.06 KB, patch)
2023-03-02 11:51 UTC, KuhnChris
Details | Diff
Patch to avoid truncation of rtnetlink message in nl_link() (1.74 KB, patch)
2023-03-03 08:22 UTC, Stefano Brivio
Details | Diff
Capture issue DNS (5.24 KB, application/vnd.tcpdump.pcap)
2023-03-03 13:15 UTC, KuhnChris
Details
Patch to also accept IP packets with bitswapped Ethertype (290 bytes, patch)
2023-03-03 16:03 UTC, Stefano Brivio
Details | Diff

Description Stefano Brivio 2021-10-23 09:58:57 UTC
I didn't even try yet, but I expect some adjustments are needed.
Comment 1 KuhnChris 2023-03-02 11:51:53 UTC
Created attachment 2 [details]
Patch to make it compile on musl

Tested it out on latest alpine on ARM64.
Basically, GCC complained about "stderr" being a protected keyword, so I renamed it to "_stderr" (weird, I know), and some byteness swap thing for the htons/htonl_constant. Also a couple of missing "implied" header files.

This just makes it `compile`. The real test is still ongoing and I will add more data here as soon as I get it to work with podman.
Comment 2 Stefano Brivio 2023-03-02 12:54:44 UTC
(In reply to KuhnChris from comment #1)
> Patch to make it compile on musl
Nice, thanks!

> Tested it out on latest alpine on ARM64.
> Basically, GCC complained about "stderr" being a protected keyword, so I
> renamed it to "_stderr" (weird, I know)
Uh oh, I guess that's a side effect of "#define stderr (stderr)" in musl's stdio.h.

For simplicity (or at least as a temporary workaround, in case the fix is straightforwrad) I'd suggest renaming that to force_stderr.

> and some byteness swap thing for the htons/htonl_constant.
As I was suggesting at https://github.com/void-linux/void-packages/pull/42517/commits/1a3a6a21de2ee118f98cf9730f806cd7b94a86dc#r1122778040 (I'm not sure if you're also the author of that commit!), we should eventually provide a open-coded version for C libraries that don't provide those __bswap_constant_* macros.

> Also a couple of missing "implied" header files.
> 
> This just makes it `compile`. The real test is still ongoing and I will add
> more data here as soon as I get it to work with podman.
Great, I was afraid it would be a bit harder than this.

If it's convenient for you, you can also submit patches to passt-dev@passt.top (see https://passt.top/passt/about/#contribute) -- from a git commit you just need to:

    git format-patch -1
    git send-email --to=passt-dev@passt.top PATCH

(see also https://git-scm.com/docs/git-send-email#_examples). But if it's too much of a burden I'm also happy to take patches from here.
Comment 3 KuhnChris 2023-03-02 13:50:35 UTC
I'll rename the parameter from _stderr to force_stderr, sure. 
Also, no, I did have a voidlinux install a while back, but I'm not connected to that, and yes, I agree, the bswap_constant cause a couple of packages to not be portable to musl or other non glibc platforms.

I'd first test out the pasta/passt a bit on alpine and then send in the patch if you want, or we can do it now, so it "builds" and add any separate fixes in separate bug reports, just let me know how you want to handle it. :-)

Thanks!
Comment 4 Stefano Brivio 2023-03-02 13:55:08 UTC
(In reply to KuhnChris from comment #3)
> I'll rename the parameter from _stderr to force_stderr, sure. 
> Also, no, I did have a voidlinux install a while back, but I'm not connected
> to that
Hah, funny. :)

> and yes, I agree, the bswap_constant cause a couple of packages to
> not be portable to musl or other non glibc platforms.
> 
> I'd first test out the pasta/passt a bit on alpine and then send in the
> patch if you want, or we can do it now, so it "builds" and add any separate
> fixes in separate bug reports, just let me know how you want to handle it.
> :-)
No no, please take your time to test before sending patch(es). Thanks!
Comment 5 KuhnChris 2023-03-02 23:41:07 UTC
Hmm, OK, so the "basic" test works, but only if I disable the "MAC not null" check - I tested this on a musl ARM64 install (alpine) and a musl X86_64 install (alpine, but on OVH) - both express the same issue: 

# Compile passt with debug symbols
➜  CC="gcc -g" make
gcc -g -Wall -Wextra -pedantic -std=c99 -D_XOPEN_SOURCE=700 -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -O2 -pie -fPIE -DPAGE_SIZE=4096 -DNETNS_RUN_DIR=\"/run/netns\" -DPASST_AUDIT_ARCH=AUDIT_ARCH_AARCH64 -DRLIMIT_STACK_VAL=8192 -DARCH=\"aarch64\" -DVERSION=\"2023_02_27.c538ee8-1-gfbac011\" -DDUAL_STACK_SOCKETS=1 -DHAS_SND_WND -DHAS_BYTES_ACKED -DHAS_MIN_RTT -DHAS_GETRANDOM -fstack-protector-strong   arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c icmp.c igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c tap.c tcp.c tcp_splice.c udp.c util.c -o passt
ln -sf passt pasta
➜ gdb --args ./pasta -4

start
b 241
c
step
b 1568
c
step
b 662
c

(or directly: "b conf.c:662" and "c")

and then stepping through the NETLINK stuff, I get only a "P" back for mac, which is too short, so nl_req bails.

nl_link (ns=0, ifi=ifi@entry=2, mac=mac@entry=0xfffffff349d0, up=up@entry=0, mtu=mtu@entry=0) at netlink.c:436
nl_req (ns=ns@entry=0, buf=buf@entry=0xffffffe2bb58 "P", req=req@entry=0xffffffe2bb18, len=32) at netlink.c:118
conf_ip4 (mac=0xfffffff349d0 "", ip4=0xfffffff349e0, ifi=<optimized out>) at conf.c:665

which in turn causes conf_ip4 to fail. (MAC_IS_ZERO(mac))

if I remove the MAC_IS_ZERO check, it does work, enter the namespace, lets itself be configured with dhclient and can connect internet and host.

Kind of at a loss what this could be, a errorous implementation of netlink on the musl side? (GW + IP4 are properly configured)


Please let me know if I can do any more checks here - else I'll send in the patch to make it compile for now and worry about \that\ issue in a separate bug.
Comment 6 Stefano Brivio 2023-03-03 06:34:14 UTC
(In reply to KuhnChris from comment #5)
> Please let me know if I can do any more checks here - else I'll send in the
> patch to make it compile for now and worry about \that\ issue in a separate
> bug.

Let me have a look at what happens to MAC_IS_ZERO() and I'll let you know.

Meanwhile, feel free to post a patch for review (it might be convenient to also have others look into the issue), but we have to figure this out before actually applying that patch.
Comment 7 Stefano Brivio 2023-03-03 07:52:13 UTC
By the way, -M 00:11:22:33:44:55 hides the issue.

Checked with strace:

sendto(5, [{nlmsg_len=32, nlmsg_type=RTM_GETLINK, nlmsg_flags=NLM_F_REQUEST, nlmsg_seq=7, nlmsg_pid=0}, {ifi_family=AF_UNSPEC, ifi_type=ARPHRD_NETROM, ifi_index=if_nametoindex("eth0"), ifi_flags=0, ifi_change=0}], 32, 0, NULL, 0) = 32

recvfrom(5, [{nlmsg_len=1388, ... [{nla_len=10, nla_type=IFLA_ADDRESS}, 52:54:00:92:a5:6c],  ...)

so we actually get the MAC address, but fail to parse the netlink message or store the address. Still checking.
Comment 8 Stefano Brivio 2023-03-03 08:22:29 UTC
Created attachment 3 [details]
Patch to avoid truncation of rtnetlink message in nl_link()

This fixes the MAC address issue for me -- feel free to include this in your patch.

On musl, BUFSIZ is 1024, so we'll typically truncate the response to the request we send in nl_link(). It's typically 8192 or more with glibc.

There doesn't seem to be any macro defining the rtnetlink maximum message size, and iproute2 just hardcodes 1024 * 1024 as receive buffer, but the example in netlink(7) makes somewhat sense, looking at the kernel implementation.

It's not very clean, but we're very unlikely to hit that limit, and if we do, we'll find out painlessly, because NLA_OK() will tell us right away.
Comment 9 KuhnChris 2023-03-03 12:37:51 UTC
>By the way, -M 00:11:22:33:44:55 hides the issue.
Yeah I figured that out, but I feared I may run into an issue further down the line, especially with podman, and maybe with some routing related stuff. I'm only on a very "basic" understanding of networking (aside from knowing of the existence of "OSI layers"), so I have to depend on you guys for those kind of deeper low-level network analysis stuff and checks to make sure my 'fixes' don't break some important stuff.


Sorry for having you dragged into this with my analysis, I know there are more important topics to solve, so if it is inconvenient to you just let me know!


I'll test your patch and let you know how it turns out - if it generally works, I'll send the patch + your patch upstream to the mailing list.


Again, thanks and have a good one,
Chris
Comment 10 KuhnChris 2023-03-03 13:15:41 UTC
Created attachment 4 [details]
Capture issue DNS

OK, we run into the next problem.
Compiling and running now seems fine, now we have an issue with resolving domains, and rather, connecting to anything that is not ICMP.
I tried it on the x86_64 machine, to make sure this is not due to some weird special lock-in on the ARM one (which is running on the Oracle Cloud), but ran into the same issues.
I added a PCAP for the following scenario:
* start pasta
* get ip
* ping ip (ok)
* ping domain name (nok)
* dig (nok)


insanity:~/passt% ./pasta -p cap2.pcap -4                                                                                                                                                                                           
Saving packet capture to cap2.pcap
insanity:~/passt# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:e2:e4:c5:5e:66 brd ff:ff:ff:ff:ff:ff
insanity:~/passt# ip link set eth0 up
insanity:~/passt# udhcpc -i eth0
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: broadcasting select for 37.59.36.123, server 37.59.36.254
udhcpc: lease of 37.59.36.123 obtained from 37.59.36.254, lease time 4294967295
insanity:~/passt# ping 4.2.2.1
PING 4.2.2.1 (4.2.2.1): 56 data bytes
64 bytes from 4.2.2.1: seq=0 ttl=255 time=10.865 ms
^C
--- 4.2.2.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 10.865/10.865/10.865 ms
insanity:~/passt# ping google.com
^C
insanity:~/passt# dig google.com
^C#
insanity:~/passt# dig google.com @4.2.2.1
;; communications error to 4.2.2.1#53: timed out
^C#
insanity:~/passt# ping 172.217.18.110
PING 172.217.18.110 (172.217.18.110): 56 data bytes
64 bytes from 172.217.18.110: seq=0 ttl=255 time=13.907 ms
64 bytes from 172.217.18.110: seq=1 ttl=255 time=13.778 ms
64 bytes from 172.217.18.110: seq=2 ttl=255 time=13.419 ms
^C
--- 172.217.18.110 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 13.419/13.701/13.907 ms
insanity:~/passt# ping 37.59.36.254
PING 37.59.36.254 (37.59.36.254): 56 data bytes
64 bytes from 37.59.36.254: seq=0 ttl=255 time=2.561 ms
64 bytes from 37.59.36.254: seq=1 ttl=255 time=12.305 ms
^C
--- 37.59.36.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 2.561/7.433/12.305 ms
insanity:~/passt# ping localhost
PING localhost (::1): 56 data bytes
64 bytes from ::1: seq=0 ttl=64 time=0.241 ms
64 bytes from ::1: seq=1 ttl=64 time=0.236 ms
^C
--- localhost ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.236/0.238/0.241 ms
insanity:~/passt# exit

[host]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:22:4d:7a:65:2a brd ff:ff:ff:ff:ff:ff
    inet 37.59.36.123/24 brd 37.59.36.255 scope global dynamic noprefixroute eth0
       valid_lft 66237sec preferred_lft 55437sec
    inet6 fe80::8c65:b403:1626:8328/64 scope link
       valid_lft forever preferred_lft forever

Let me know if I did maybe something wrong? I also tried to use the "host" ip as DNS server (37.59.36.254) but it also doesn't work.
For curiousity's sake I also tried curl, to no avail, same issue as with the dns, even when using the IP.

Let me know if you got any ideas, I checked the pcap file, but I cannot make sense out of the LLC stuff there...

Thanks,
Chris
Comment 11 Stefano Brivio 2023-03-03 15:28:32 UTC
Weird: if you look at offset 12 for packets 18 and 19, which is the 802.3 type (16 bits), you'll see that those bytes are swapped. For IPv4 (and for other packets in that capture) it's 0x80 0x00 (it's a big endian value), for those we receive from the namespace it's 0x00 0x80.

I'm not sure yet where the problem is, I'll try to have a more thorough look in a bit.
Comment 12 Stefano Brivio 2023-03-03 15:58:51 UTC
(In reply to Stefano Brivio from comment #11)
> For IPv4 (and for
> other packets in that capture) it's 0x80 0x00 (it's a big endian value), for
> those we receive from the namespace it's 0x00 0x80.
Oops, I meant 0x08 0x00 and 0x00 0x08.
Comment 13 Stefano Brivio 2023-03-03 16:03:10 UTC
Created attachment 5 [details]
Patch to also accept IP packets with bitswapped Ethertype

I did the same tests with an Alpine musl build in my environment, and it all works for me.

I have no idea yet where that bitswapped Ethertype might come from. Do you have the possibility to take an additional capture, using e.g. tcpdump or tshark, in the namespace, on the tap device?

The patch attached should allow passt to process those packets normally and allow you to proceed, but there's no apparent reason why it's needed, so I wouldn't apply it (as it's fundamentally wrong) before figuring out what's going on here.
Comment 14 KuhnChris 2023-03-03 19:23:11 UTC
Well, I do have root access on my servers (the AMD/X86_64 one is a OVH dedicated server, the ARM64 one is a Oracle VM), so we can do any kind of tshark/tcpdump we want, if you want me to recording something specific, let me know - I can also jump on IRC or Discord if it makes it easier for you to align.
Comment 15 Stefano Brivio 2023-03-03 19:30:03 UTC
(In reply to KuhnChris from comment #14)
> Well, I do have root access on my servers (the AMD/X86_64 one is a OVH
> dedicated server, the ARM64 one is a Oracle VM), so we can do any kind of
> tshark/tcpdump we want, if you want me to recording something specific, let
> me know

I would just need a capture from the namespace where the tap device is (no need to be root), while you try to resolve addresses, something like:

   tcpdump -ni <interface name> -s0 -w dns_query_from_namespace.pcap &
   dig google.com
   # dig times out
   killall tcpdump

> I can also jump on IRC or Discord if it makes it easier for you to align.

I'm always on #passt on IRC at libera.chat (sbrivio there... or on OFTC). I might not read right away though.
Comment 16 Stefano Brivio 2023-03-14 09:51:31 UTC
Fixed with this series:

  https://archives.passt.top/passt-dev/20230308073516.2189680-1-sbrivio@redhat.com/

in 2023_03_09.7c7625d, see also:

  https://github.com/void-linux/void-packages/pull/42517/commits/fe40c24fb76ee517dcb2a72f3ace8560aea02672

thanks a lot KuhnChris for finding issues and patching them!

Note You need to log in before you can comment on or make changes to this bug.