Skip to content

Conversation

@RoundofThree
Copy link
Member

If the new base pointer obtained in snl_realloc_msg_buffer does not have the same bounds as the original base pointer, we need to re-derive all the capabilities that are relative to the base. This was found while fuzzing.

Tagging @YiChenChai.

If the new base pointer obtained in snl_realloc_msg_buffer
does not have the same bounds as the original base pointer,
we need to re-derive all the capabilities that are relative to
the base.
Comment on lines +1102 to +1104
int hdr_off = (char *)(nw->hdr) - nw->base;
nw->hdr = (struct nlmsghdr *)
(void *)((char *)nw->base + hdr_off);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int hdr_off = (char *)(nw->hdr) - nw->base;
nw->hdr = (struct nlmsghdr *)
(void *)((char *)nw->base + hdr_off);
int hdr_off = (char *)(nw->hdr) - nw->base;
nw->hdr = (struct nlmsghdr *)
(void *)((char *)nw->base + hdr_off);

to match existing formatting

}
#ifdef __CHERI_PURE_CAPABILITY__
if (cheri_getlen(new_base) != cheri_getlen(nw->base)) {
nw->base = (char *)cheri_setboundsexact(new_base, nw->size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't snl_allocz already do this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, I see now, we don't use the narrowed pointer returned from lb_allocz. Instead, this function violates the API and examines lb->base directly. However, I think this also means we can't use cheri_setboundsexact here as lb_allocz might have rounded the size up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but this also means that since we are always using ss->lb->base, the bounds shouldn't ever change? That is, the pointer should always point to the entire range of the linear buffer and never have different bounds. It might be really helpful to see more information from the reproducer if possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the panic happened on an older version before merging upstream commit 0c511ba. The previous version did use the return value of snl_allocz, but even then the return value should never have been the same address?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an alternate approach to try to fix upstream bugs in this function at https://reviews.freebsd.org/D53697

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I reproduced the crash locally and figured out the cause. In snl_init_writer, the return value of snap_allocz is honored and so nw->base has narrower bounds. Later when snl_realloc_msg_buffer is called, snl_allocz will allocate space at the end of the same lb, but snl_realloc_msg_buffer ignores the return value. My upstream fix for FreeBSD needs to apply its logic to use the entire lb in snl_init_writer as well. OTOH, that mostly defeats the point of SNL_WRITER_BUFFER_SIZE as it will always end up being SCRATCH_BUFFER_SIZE in practice. The fact that the snl_state() starts off with 1024 byte scratch buffer, and the snl_writer() starts of using the first 256 bytes of that buffer for a scratch space itself seems like a very complicated design for no justification I can see.

@gvnn3
Copy link
Contributor

gvnn3 commented Nov 26, 2025

Bump? Can this be backported from upstream or?

@jrtc27
Copy link
Member

jrtc27 commented Nov 26, 2025

I believe freebsd/freebsd-src@828df4d was an alternative fix for this that should be cherry-picked

@bsdjhb
Copy link
Collaborator

bsdjhb commented Dec 1, 2025

I still don't understand how the bug even occurs currently, though backporting my upstream fix is probably part of fixing this more properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants