67 – Flaws in netlink usage

Status:	RESOLVED FIXED

Alias:	None

Product:	passt
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	All Linux

Importance:	Normal normal
Assignee:	nobody

Reported:	2023-07-06 04:36 UTC by David Gibson
Modified:	2023-08-04 08:38 UTC
CC List:	0 users

Description David Gibson 2023-07-06 04:36:21 UTC

While looking at how we might fix bug 60, I discovered a number of other problems with how we use netlink:

 1. We use 0 rather than 1 as our first sequence number.  This works, but isn't ideal, since 0 is usually used for asynchronous notifications from the kernel.

 2. When duplicating routes, we send a number of request messages in a single packet.  This may result in multiple response packets, but we only process the first one.  This means that responses can get out of sync for subsequent operations.  This is mitigated by the "flush" logic at the start of nl_req(), but that will only get rid of one stale response, and there could be several.

 3. When duplicating routes, we send the same batch of requests multiple times, since earlier attempts might fail due to route dependencies.  However, when we resubmit the requests we also reuse the sequence numbers.  This appears to work, but isn't how you're generally supposed to use netlink.

 4. In general we only process one reply datagram per request, but it appears that the response can sometimes be split across multiple datagrams: in particular, dump requests seem to have the actual responses and the NLMSG_DONE marker in separate datagrams.  The 'flush' logic in nl_req() again appears to handle this, but in a rather confusing way (we deal with extra packets on the next request, rather than as part of the request that prompted them).
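
For illustration only, here is a rough sketch of the pattern the points above suggest: allocate sequence numbers starting at 1, use a fresh one for every (re)submission, and keep reading datagrams for a request until its terminating message arrives. This is not passt code. The names nl_next_seq(), nl_scan() and enum nl_status are hypothetical, and the sketch assumes each request is sent with NLM_F_DUMP or NLM_F_ACK, so that the kernel always ends the reply with NLMSG_DONE or NLMSG_ERROR. A batch of several requests would need one terminator per request.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: return the next sequence number, starting from 1
 * when *counter is initialised to 0, and skipping 0 on wraparound, since
 * 0 is usually used for asynchronous notifications from the kernel.
 */
uint32_t nl_next_seq(uint32_t *counter)
{
	if (++(*counter) == 0)
		++(*counter);
	return *counter;
}

enum nl_status {
	NL_MORE,	/* no terminator seen yet: recv() another datagram */
	NL_DONE,	/* NLMSG_DONE or a plain ACK was seen for 'seq' */
	NL_ERROR,	/* the kernel reported an error, stored in *err */
};

/* Hypothetical helper: walk one received datagram of 'len' bytes.
 * Messages whose sequence number differs from 'seq' (stale replies,
 * notifications) are skipped.  Payload messages for 'seq' are counted
 * in *ndata; a real caller would parse them at that point.
 */
enum nl_status nl_scan(const void *buf, ssize_t len, uint32_t seq,
		       unsigned *ndata, int *err)
{
	const struct nlmsghdr *nh = buf;

	for (; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
		if (nh->nlmsg_seq != seq)
			continue;

		if (nh->nlmsg_type == NLMSG_DONE)
			return NL_DONE;

		if (nh->nlmsg_type == NLMSG_ERROR) {
			const struct nlmsgerr *e = NLMSG_DATA(nh);

			if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*e))) {
				*err = -EPROTO;	/* truncated error message */
				return NL_ERROR;
			}
			*err = e->error;	/* 0 means plain ACK */
			return e->error ? NL_ERROR : NL_DONE;
		}

		(*ndata)++;
	}

	return NL_MORE;
}
```

A caller would send a request carrying seq = nl_next_seq(&counter), then loop on recv() and nl_scan() until the result is no longer NL_MORE. That way the extra datagrams are consumed as part of the request that prompted them, and a resubmitted request simply gets a new sequence number, so late replies to the earlier attempt are skipped rather than mistaken for the new reply.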

Comment 1 David Gibson 2023-07-19 01:17:36 UTC

I'm working on a series to address these issues.

Comment 2 David Gibson 2023-08-04 08:38:01 UTC

Fixes now merged.
