LX libaudit clients cause kernel memory usage to expand until exhausted #366

The Linux `libaudit` library uses `NETLINK` sockets to send audit events. It opens a netlink socket, calls `sendto()` to write the event out (see https://github.com/linux-audit/audit-userspace/blob/master/lib/netlink.c#L239), and then checks for an ACK from the kernel.

When checking for an ACK, libaudit looks specifically for an `NLMSG_ERROR` message (see https://github.com/linux-audit/audit-userspace/blob/master/lib/netlink.c#L287). It does `recvfrom()` with `MSG_PEEK` to look for the `NLMSG_ERROR` code, and only does a real read without `MSG_PEEK` once it has seen it. If any other message arrives, the peek leaves it on the queue and libaudit never reads from the netlink socket again.

Unfortunately, in e.g. `lx_netlink_au_um`, we just call `lx_netlink_reply`, which sets up an `NLMSG_DONE` message (with `NLM_F_MULTI` in the header). This is not the kind of ACK which libaudit is expecting, and so libaudit never actually reads from its netlink socket on LX.
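To make the mismatch concrete, here is a minimal userland sketch of the peek-then-read pattern described above. It is not libaudit's code: the `pk_` names are hypothetical, the header struct is a local mirror of the Linux `nlmsghdr` layout, and an `AF_UNIX` datagram socketpair stands in for the netlink socket so it can run anywhere.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Local mirror of the Linux netlink header (hypothetical name). */
struct pk_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

#define	PK_NLMSG_ERROR	0x2	/* same value as Linux NLMSG_ERROR */
#define	PK_NLMSG_DONE	0x3	/* same value as Linux NLMSG_DONE */
#define	PK_NLM_F_MULTI	0x2	/* same value as Linux NLM_F_MULTI */

/* Queue one header-only message of the given type on fd. */
static int
pk_send(int fd, uint16_t type, uint16_t flags)
{
	struct pk_nlmsghdr h;

	memset(&h, 0, sizeof (h));
	h.nlmsg_len = sizeof (h);
	h.nlmsg_type = type;
	h.nlmsg_flags = flags;
	return (send(fd, &h, sizeof (h), 0) == (ssize_t)sizeof (h) ? 0 : -1);
}

/*
 * Peek at the next message and consume it only if it is NLMSG_ERROR.
 * Returns 1 if an ACK was consumed, 0 if some other message is at the
 * head of the queue (and was left there), -1 on error or empty queue.
 */
static int
pk_check_ack(int fd)
{
	struct pk_nlmsghdr h;
	ssize_t n;

	n = recv(fd, &h, sizeof (h), MSG_PEEK | MSG_DONTWAIT);
	if (n < (ssize_t)sizeof (h))
		return (-1);
	if (h.nlmsg_type != PK_NLMSG_ERROR)
		return (0);
	(void) recv(fd, &h, sizeof (h), MSG_DONTWAIT);
	return (1);
}

/* Count the messages still queued on fd by draining it. */
static int
pk_drain(int fd)
{
	struct pk_nlmsghdr h;
	int count = 0;

	while (recv(fd, &h, sizeof (h), MSG_DONTWAIT) > 0)
		count++;
	return (count);
}
```

With this sketch, sending `PK_NLMSG_DONE` replies and calling `pk_check_ack()` after each one leaves every reply on the queue, which is the accumulation we see on LX. Sending a `PK_NLMSG_ERROR` reply instead gets consumed on the first check.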

When libaudit stops reading from the netlink socket, replies start to queue up. Normally this backpressure is handled by the layer calling the `su_recv` callback (e.g. the TCP/IP stack) -- you're meant to watch for `ENOSPC` from that socket upcall and set a flag to stop sucking in new messages until the downcall comes to tell you things are unblocked again.

Unfortunately, in `lx_netlink_reply_sendup`, after we call `su_recv`, we have:

```
	if (error != 0)
		lx_netlink_flowctrld++;
```

And that's it. End of function. We don't set any flags, we just increment a global counter (which is never read anywhere in the code). This means that we can accumulate replies on the socket queue of a netlink socket indefinitely.
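For comparison, here is a rough userland model of what honouring that backpressure could look like. Everything in it is hypothetical (the `fc_` names, the queue limit, and the choice to refuse replies with `EAGAIN` while blocked); it assumes the upcall still accepts the message that triggers `ENOSPC`, and it is not a patch against `lx_netlink_reply_sendup`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define	FC_QUEUE_LIMIT	4	/* hypothetical receive-queue capacity */

typedef struct fc_sock {
	size_t	fs_queued;	/* replies sitting on the receive queue */
	bool	fs_flowctrl;	/* set after the upcall reports ENOSPC */
	size_t	fs_refused;	/* replies refused while flow controlled */
} fc_sock_t;

/*
 * Stand-in for the su_recv upcall: queues the message and reports
 * ENOSPC once the queue has reached its limit.
 */
static int
fc_su_recv(fc_sock_t *fs)
{
	fs->fs_queued++;
	return (fs->fs_queued >= FC_QUEUE_LIMIT ? ENOSPC : 0);
}

/*
 * Send one reply up, honouring flow control. Returns 0 if the reply
 * was queued, or EAGAIN if the socket is currently flow controlled.
 */
static int
fc_reply_sendup(fc_sock_t *fs)
{
	if (fs->fs_flowctrl) {
		fs->fs_refused++;
		return (EAGAIN);
	}
	if (fc_su_recv(fs) == ENOSPC)
		fs->fs_flowctrl = true;	/* remember it, not just count it */
	return (0);
}

/* Stand-in for the downcall made once the reader has drained the queue. */
static void
fc_clr_flowctrl(fc_sock_t *fs)
{
	fs->fs_queued = 0;
	fs->fs_flowctrl = false;
}
```

The point of the model is that the queue depth is bounded by `FC_QUEUE_LIMIT` no matter how many replies are generated. Whether a refused reply should be dropped, or the sender blocked, is a separate design decision that this sketch does not try to answer.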

Now, this might not sound like a big deal. Each netlink reply is ~20 bytes long, so you might say we can accumulate an awful lot of them before this becomes a critical issue. Alas, in `lx_netlink_reply_msg` we always call `allocb()` with `lxns_bufsize`, which is set to 4096. Because of the header on the front, this actually results in an allocation from the `kmem_alloc_8192` cache. For each one of these replies on the queue we are setting aside a bit over 8k of memory.

What's even better is that amongst libaudit's clients is the ever-wonderful `systemd`. It runs for a very long time, and it produces one of these audit events every time a unit (service) changes state.

On a machine with ~150 LX zones running, I am currently allocating a bit over 1GB per day of these buffers due to systemd alone, which will persist until the machine or the zones are rebooted. Eventually, the kernel memory usage expands and pushes out ARC, causes `kmem_reap` to kick in, and the machine grinds to a halt and never recovers.
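As a back-of-the-envelope check on those numbers, the sketch below turns the observed growth into a reply rate. It assumes "1GB" means 2^30 bytes and that each leaked reply costs exactly 8192 bytes; since the real cost is a bit over that, the counts are upper bounds. The `EST_` and `est_` names are made up for the example.

```c
#include <stdint.h>

#define	EST_BYTES_PER_DAY	(1ULL << 30)	/* ~1GB/day, taken as 1 GiB */
#define	EST_BYTES_PER_REPLY	8192ULL		/* one kmem_alloc_8192 buffer */
#define	EST_REPLY_PAYLOAD	20ULL		/* approximate useful bytes */
#define	EST_ZONES		150ULL		/* LX zones on the machine */

/* Replies leaked per day across the whole machine. */
static uint64_t
est_replies_per_day(void)
{
	return (EST_BYTES_PER_DAY / EST_BYTES_PER_REPLY);
}

/* Replies leaked per day by an average zone (rounded down). */
static uint64_t
est_replies_per_zone_per_day(void)
{
	return (est_replies_per_day() / EST_ZONES);
}

/* How many times larger the allocation is than the ~20 byte reply. */
static uint64_t
est_overhead_factor(void)
{
	return (EST_BYTES_PER_REPLY / EST_REPLY_PAYLOAD);
}
```

Under those assumptions this works out to 131072 leaked replies per day in total, or roughly 873 per zone per day, with each buffer about 409 times larger than the reply it carries.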

Netlink should be replying to audit requests with single-part `NLMSG_ERROR` responses to be compatible with real Linux, and the LX netlink code needs to correctly handle `ENOSPC` from `su_recv`.
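For reference, this is roughly the shape of the single-part ACK that Linux sends, as I understand the format: an `NLMSG_ERROR` header followed by an error code of 0 and a copy of the request header. The structs are local mirrors of the Linux `nlmsghdr` and `nlmsgerr` layouts, and the `ack_` names are hypothetical; this is a sketch of the wire format, not the LX netlink code.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the Linux netlink header (hypothetical name). */
struct ack_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/* Local mirror of the Linux nlmsgerr payload. */
struct ack_nlmsgerr {
	int32_t error;			/* 0 means "ACK" */
	struct ack_nlmsghdr msg;	/* header of the request being ACKed */
};

/* Complete ACK message: header plus error payload. */
struct ack_msg {
	struct ack_nlmsghdr hdr;
	struct ack_nlmsgerr err;
};

#define	ACK_NLMSG_ERROR	0x2	/* same value as Linux NLMSG_ERROR */

/* Fill in a single-part ACK for the request header req. */
static void
ack_build(const struct ack_nlmsghdr *req, struct ack_msg *ack)
{
	memset(ack, 0, sizeof (*ack));
	ack->hdr.nlmsg_len = sizeof (*ack);
	ack->hdr.nlmsg_type = ACK_NLMSG_ERROR;
	ack->hdr.nlmsg_flags = 0;	/* single-part: no NLM_F_MULTI */
	ack->hdr.nlmsg_seq = req->nlmsg_seq;
	ack->hdr.nlmsg_pid = req->nlmsg_pid;
	ack->err.error = 0;
	ack->err.msg = *req;
}
```

The important differences from what `lx_netlink_reply` produces today are the message type (`NLMSG_ERROR` rather than `NLMSG_DONE`) and the absence of `NLM_F_MULTI` in the flags.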
