@bsdjhb bsdjhb commented Oct 30, 2025

PR for CI

mhorne and others added 30 commits November 25, 2024 17:08
T-HEAD CPUs provide a spec-violating implementation of page-based memory
types, using PTE bits [63:59]. Add basic support for this "errata",
referred to in some places as an "extension".

Note that this change is not enough on its own, but a workaround is
needed for the bootstrap (locore) page tables as well.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45472
Switch the boot argument registers to the unused s3 and s4. This ensures
the values will not be clobbered by SBI or function calls; they are
consumed late in the assembly routine.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D47457
The T-HEAD custom PTE bits are defined in such a way that the
default/normal memory type is a non-zero value. This _unthoughtful_ choice
means that, unlike the Svpbmt and non-Svpbmt cases, this field cannot be
left bare in our bootstrap PTEs, or the hardware will fail to proceed
far enough in boot (cache strangeness). On the other hand, we cannot
unconditionally apply the PTE_THEAD_MA_NONE attributes, as this is not
compatible with spec-compliant RISC-V hardware, and will result in a
fatal exception.

Therefore, in order to handle this errata, we are forced to perform a
check of the CPU type at the first moment possible. Do so, and fix up
the PTEs with the correct memory attribute bits in the T-HEAD case.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D47458
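
A minimal sketch of the fix-up described in the commit message above, assuming a 64-bit PTE whose memory-attribute field occupies bits [63:59]. The names `EX_PTE_MA_MASK` and `ex_fixup_pte()`, and the attribute value the caller passes in, are hypothetical stand-ins rather than the kernel's definitions; the real check and fix-up are done in the locore assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory-attribute field in PTE bits [63:59], per the commit message. */
#define EX_PTE_MA_SHIFT	59
#define EX_PTE_MA_MASK	(UINT64_C(0x1f) << EX_PTE_MA_SHIFT)

/*
 * Return the PTE to install.  On spec-compliant hardware the field is
 * left untouched; on T-HEAD the non-zero default attribute is applied.
 * thead_default_ma is a placeholder for the real default attribute bits.
 */
static uint64_t
ex_fixup_pte(uint64_t pte, bool is_thead, uint64_t thead_default_ma)
{
	/* The caller must pass attribute bits only. */
	assert((thead_default_ma & ~EX_PTE_MA_MASK) == 0);

	if (!is_thead)
		return (pte);
	return ((pte & ~EX_PTE_MA_MASK) | thead_default_ma);
}
```

The point of the branch is that neither value works everywhere: an empty field misbehaves on T-HEAD, and the T-HEAD bits fault on compliant CPUs.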
The livedumper triggers reports from both of these sanitizers since it
necessarily accesses uninitialized or freed memory.  Add a flag to
silence reports from both sanitizers.

Reviewed by:	mhorne, khng
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D47714
PR:		282867
MFC:		stable/14
Approved by:	mhorne (via IRC)
Segment base registers are at 8-byte intervals, while the register
write helper takes a byte-aligned offset.  This fixes
DEV_TAB_HARDWARE_ERROR events and associated peripheral I/O failures
on an Epyc-based system with 8-segment device tables.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D47752
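
The offset arithmetic behind this fix can be sketched as follows. This is an illustration only: `EX_SEG_BASE` is an assumed placeholder for the offset of the first segment base register, and `ex_seg_offset()` is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define EX_SEG_BASE	0x0100u	/* assumed offset of the first segment register */
#define EX_SEG_STRIDE	8u	/* registers sit at 8-byte intervals */
#define EX_SEG_MAX	8u	/* illustrative upper bound on segments */

/* Byte offset to hand to a register-write helper for segment 'seg'. */
static uint32_t
ex_seg_offset(unsigned int seg)
{
	assert(seg < EX_SEG_MAX);
	/* Scale the index by the stride; base + seg would be wrong. */
	return (EX_SEG_BASE + seg * EX_SEG_STRIDE);
}
```

With an unscaled index, segment 1 would land one byte past the base instead of eight, which only shows up on systems that use more than one segment.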
It may be the case that we want to avoid delivering signals that are
normally blocked by the thread's signal mask, in which case the syscall
should schedule this one instead to restore the mask prior to delivery.

This will be used by pselect/ppoll to avoid delivering signals that were
supposed to be blocked after the timeout has elapsed.  The name was
chosen as this is the expected behavior of pselect/ppoll, while late
restoration of the mask is exceptional behavior for these specific
calls.

__FreeBSD_version bump as later TDA_* values have changed; third-party
modules that may be using MOD3/MOD4 need to be rebuilt.

Reviewed by:	kib
Sponsored by:	Klara, Inc.
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D47741
It's possible to take a signal after pselect/ppoll have set their return
value, but before we actually return to userland.  This results in
taking a signal without reflecting it in the return value, which weakens
the guarantees provided by these functions.

Switch both to restore the signal mask before we would deliver signals
on return to userland.  If a signal was received after the wait was
over, then we'll just have the signal queued up for the next time it
comes unblocked.  The modified signal mask is retained if we were
interrupted so that ast() actually handles the signal, at which point
the signal mask is restored.

des@ has a test case demonstrating the issue in D47738 which will
follow.

Note for MFC: TDA_PSELECT is a KBI break, we should just inline
ast_sigsuspend() in pselect/ppoll for stable branches.  It's not exactly
the same, but it will be close enough.

Reported by:	des
Reviewed by:	des (earlier version), kib
Sponsored by:	Klara, Inc.
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D47741
The current quirk is designed to discard duplicated data read from
the chip.  The problem is that it also discards real events when they
happen to be identical, which is the case with scroll wheel events:
unlike X/Y, they always move by a fixed offset.  This results
in a two-finger scroll that stops mid-way, which could be worked around
by manually setting dev.hms.0.drift_thresh to 0.

To fix that, don't discard duplicates when there's wheel movement.
For users with an actual duplicates problem this will result in scroll
suddenly becoming quite inertial, but it will stop moving at any touch,
so it shouldn't be terrible.

PR:		kern/276709
Reviewed By:	wulf
Differential Revision:	https://reviews.freebsd.org/D47640
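
The rule described above can be sketched as a small predicate. The report layout and the name `ex_is_discardable_dup()` are assumptions made for illustration; the driver's actual data structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mouse report. */
struct ex_report {
	int		dx;
	int		dy;
	int		wheel;
	unsigned int	buttons;
};

/*
 * Return true if 'cur' may be dropped as a duplicate of 'prev'.
 * Reports carrying wheel movement are never dropped, because
 * consecutive scroll events are legitimately identical.
 */
static bool
ex_is_discardable_dup(const struct ex_report *prev,
    const struct ex_report *cur)
{
	assert(prev != NULL && cur != NULL);

	if (cur->wheel != 0)
		return (false);
	return (prev->dx == cur->dx && prev->dy == cur->dy &&
	    prev->wheel == cur->wheel && prev->buttons == cur->buttons);
}
```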
D37419 corrupts VFP context store on signal delivery and D38696 corrupts PCB
because it performs a binary copy between structures with different layouts.
Revert the problematic parts of these commits to get signal delivery
working. Unfortunately, there are more problems with these revisions and
more fixes need to be developed.

Fixes: 6926e26
Fixes: 4d2427f
MFC after:	4 weeks
For TLS TX/RX, ratelimit, and IPSEC offload caps.

Reviewed by:	Ariel Ehrenberg <[email protected]>
Sponsored by:	NVidia networking
MFC after:	1 week
Reviewed by:	Ariel Ehrenberg <[email protected]>
Sponsored by:	NVidia networking
MFC after:	1 week
If a fragmented IPv6 packet hits a route-to rule we have to first prevent
the pf_test(PF_OUT) check in pf_route6() from refragmenting (and calling
ip6_output()/ip6_forward()). We then have to refragment in pf_route6() and
transmit the packets on the route-to interface.

Split pf_refragment6() into two parts, the first to perform the refragmentation,
the second to call ip6_output()/ip6_forward() and call the former from
pf_route6().

Add a test case for route-to-ing fragmented IPv6 packets to verify this works
as expected.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D47684
Use callout_init_mtx(9) to associate the callback with the driver's
lock. Also make sure the callout is stopped properly during detach.

While here, introduce a dummy_active() function to know when it's
appropriate to stop or not reschedule the callout.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 days
Reviewed by:	dev_submerge.ch, markj
Differential Revision:	https://reviews.freebsd.org/D47459
Consider the following scenario:

1. CHN currently has its trigger set to PCMTRIG_STOP.
2. Thread A locks CHN, calls CHANNEL_TRIGGER(PCMTRIG_START), sets the
   trigger to PCMTRIG_START and unlocks.
3. Thread B picks up the lock, calls CHANNEL_TRIGGER(PCMTRIG_ABORT) and
   returns a non-zero value, so it returns from chn_trigger() as well.
4. Thread A picks up the lock and adds CHN to the list, which is
   _wrong_, because the last call to CHANNEL_TRIGGER() was with
   PCMTRIG_ABORT, meaning the channel is stopped, yet we are adding it
   to the list and marking it as started.

Another problematic scenario:

1. Thread A locks CHN, sets the trigger to PCMTRIG_ABORT, and unlocks
   CHN. It then locks PCM and _removes_ CHN from the list.
2. In the meantime, since thread A unlocked CHN, thread B has locked it,
   set the trigger to PCMTRIG_START, unlocked it, and is now blocking on
   PCM held by thread A.
3. At the same time, thread C locks CHN, sets the trigger back to
   PCMTRIG_ABORT, unlocks CHN, and is also blocking on PCM. However,
   once thread A unlocks PCM, because thread C is higher-priority than
   thread B, it picks up the PCM lock instead of thread B, and because
   CHN is already removed from the list, and thread B hasn't added it
   back yet, we take a page fault in CHN_REMOVE() by trying to remove a
   non-existent element.

To fix the former scenario, set the channel trigger before the call to
CHANNEL_TRIGGER() (could also come after, doesn't really matter) and
check if anything changed once we lock CHN back.

To fix the latter scenario, use the SAFE variants of CHN_INSERT_HEAD()
and CHN_REMOVE(). A similar scenario can occur in vchan_trigger(), so do
the trigger setting after we've locked the parent channel.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 days
Reviewed by:	dev_submerge.ch
Differential Revision:	https://reviews.freebsd.org/D47461
This patch fixes multiple different panic scenarios occurring during
hot-unload:

1. The channel is unlocked in chn_read()/chn_write() for uiomove(9) and
   in the meantime we enter pcm_killchans() and free it. By the time we
   have returned from userland and try to lock it back, the channel will
   have been freed.
2. The parent channel has been freed in pcm_killchans(), but at the same
   time, some yet-unstopped vchan's chn_read()/chn_write() calls
   chn_start(), which eventually calls vchan_trigger(), which references
   the freed parent.
3. PCM_WAIT() panics because it references a freed PCM lock.

For scenarios 1 and 2, refactor pcm_killchans() to first make sure all
channels have been stopped, and then proceed to free them one by one, as
opposed to freeing the first free channel until all channels have been
freed. This change makes the code more robust, but might introduce some
performance overhead when many channels are allocated, since we
continuously loop through the channel list until all of them are
stopped, and then we loop one last time to free them.

For scenario 3, restructure the code so that we can use destroy_dev(9)
instead of destroy_dev_sched(9) in dsp_destroy_dev(). Because
destroy_dev(9) blocks until all references to the device have gone away,
we ensure that the PCM cv and lock will be freed safely.

While here, move the delete_unrhdr(9) calls to pcm_killchans() and
re-order some lines.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 days
Reviewed by:	dev_submerge.ch
Differential Revision:	https://reviews.freebsd.org/D47462
Since SD_F_REGISTERED is cleared at the same time SD_F_DETACHING and
SD_F_DYING are set, and since PCM_DETACHING() is always used in
conjunction with PCM_REGISTERED()/DSP_REGISTERED(), it is enough to just
check SD_F_REGISTERED.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 days
Reviewed by:	dev_submerge.ch, markj
Differential Revision:	https://reviews.freebsd.org/D47463
The KASSERT in chn_sleep() can be triggered if more than one thread
wants to sleep on a given channel at the same time. While this is not
really a common scenario, it can be triggered by tools such as stress2,
which use fork() so that the child process(es) inherit the parent's
FDs.

Fix this by removing CHN_F_SLEEPING altogether, which is not very useful
in the first place:
- CHN_BROADCAST() checks cv_waiters already, so there is no need to
  check CHN_F_SLEEPING as well.
- We can check whether cv_waiters is 0 in pcm_killchans(), instead of
  whether CHN_F_SLEEPING is not set.

Reported by:	dougm, pho (stress2)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 days
Reviewed by:	dev_submerge.ch, markj
Differential Revision:	https://reviews.freebsd.org/D47559
No functional change intended.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 days
Reviewed by:	dev_submerge.ch, markj, emaste
Differential Revision:	https://reviews.freebsd.org/D47664
In numeric mode, the default route is printed as "default" rather
than 0.0.0.0/0 or ::/0

From the man page:
"-n: Show network addresses and ports as numbers.
Normally netstat attempts to resolve addresses and ports, and display
them symbolically.  If the -n option is specified, the address is
printed numerically, according to the address family.
For more information regarding the Internet IPv4 ``dot format'', refer
to inet(3).  Unspecified, or ``wildcard'', addresses and ports appear
as ``*''."

Reported By:	rgrimes
Reviewed by:	emaste, ngie, eadler, seanc
Relnotes:	yes
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D10320
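
A rough sketch of the behavior change for the IPv4 case, under the assumption that the route is available as a host-order address plus prefix length. `ex_fmt_ipv4_route()` is a hypothetical helper and does not mirror netstat's internal code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an IPv4 destination.  The symbolic name "default" is used
 * only when numeric output was not requested.
 */
static void
ex_fmt_ipv4_route(char *buf, size_t len, uint32_t addr, unsigned int plen,
    bool numeric)
{
	assert(buf != NULL && len > 0 && plen <= 32);

	if (!numeric && addr == 0 && plen == 0) {
		(void)snprintf(buf, len, "default");
		return;
	}
	(void)snprintf(buf, len, "%u.%u.%u.%u/%u",
	    (unsigned int)((addr >> 24) & 0xff),
	    (unsigned int)((addr >> 16) & 0xff),
	    (unsigned int)((addr >> 8) & 0xff),
	    (unsigned int)(addr & 0xff), plen);
}
```

Scripts that match on the literal string "default" in `netstat -rn` output would need to match 0.0.0.0/0 or ::/0 instead.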
sysctl(8) prints a newline after the description, no need for this extra
newline.

MFC after:	1 week
Correct swap_pager_seek_data so that, when the first lookup finds no
valid pages, second and subsequent lookups are attempted anyway.

This was broken by db08b0b.

Reported by:	[email protected]
Reviewed by:	kib
Tested by:	[email protected]
Fixes:	db08b0b tmpfs_vnops: move swap work to swap_pager
Differential Revision:	https://reviews.freebsd.org/D47767
Rewrite the hardware support list to a column list for inclusion in the
Hardware Release Notes. This makes clean subsections.  Tested in
MANWIDTH 59 and 80. While here, align sysctl list and tag spdx.

Reviewed by:	mhorne
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D47707
Following arm64 and risc-v, move definitions that describe
hardware-enforced layout of PTEs and #PF error bits, into a dedicated
header.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D47749
This is a somewhat cleaner and more future-proof way to get the correct
device table offsets.

Reviewed by:		kib
Reported by:		crest_freebsd_rlwinm.de
Fixes:			5035db2 "amdiommu: Fix device table segment
			base register offsets"
Differential Revision:	https://reviews.freebsd.org/D47769
These tests demonstrate the bug that was fixed in ccb973d.

Sponsored by:	Klara, Inc.
Sponsored by:	NetApp, Inc.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D47738
This makes it a bit easier to see where operations on a particular
filter are defined.  No functional change intended.

MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
gmshake and others added 28 commits October 30, 2025 10:23
Fixes:		375d797 Enable pvscsi and vmx in arm64 GENERIC
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47711
This was lost during the initial introduction of the pvscsi driver [1].
Later the driver was enabled on arm64 [2], so also install the man page
on arm64.

1. 052e12a Add the pvscsi driver to the tree
2. 375d797 Enable pvscsi and vmx in arm64 GENERIC

Reviewed by:	emaste, Alexander Ziaee <concussious.bugzilla_runbox.com> (manpages)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47712
The variable b[] is on the stack, thus cannot overlap with ipov, which
points to the heap area, so prefer memcpy() over memmove(), aka bcopy().

No functional change intended.

Reviewed by:	cc, rrs, cy, #transport, #network
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47713
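
The reasoning above, in a small standalone form: when one side of the copy is a local array, the two regions cannot alias a caller-supplied buffer, so `memcpy()` is sufficient. The function name, the 20-byte length, and the checksum-style loop are illustrative assumptions and are not taken from the changed code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_HDR_LEN	20	/* illustrative header length */

/*
 * Sum all bytes of a buffer with its header temporarily zeroed,
 * then restore the header.
 */
static unsigned int
ex_sum_with_zeroed_header(unsigned char *buf, size_t len)
{
	unsigned char saved[EX_HDR_LEN];	/* stack storage, cannot overlap buf */
	unsigned int sum = 0;
	size_t i;

	assert(buf != NULL && len >= EX_HDR_LEN);

	memcpy(saved, buf, sizeof(saved));	/* no overlap: memcpy is fine */
	memset(buf, 0, sizeof(saved));
	for (i = 0; i < len; i++)
		sum += buf[i];
	memcpy(buf, saved, sizeof(saved));	/* restore the header */
	return (sum);
}
```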
Just return from jkfprintf if either (a) user lookup fails (that is,
getpwnam fails) or (b) setuid() to the user's uid fails.  If comsat is
invoked from inetd using the default of tty:tty we will now return due
to setuid() failing rather than fopen() failing.

PR:		270404
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47823
Source nodes are created quite early in pf_create_state(), even before
the state is allocated, locked and inserted into its hash row. They are
prone to being freed by source node killing or clearing ioctl while
pf_create_state() is still running.

The function pf_map_addr_sn() can be called in two very different paths.

One is for filter rules where it is called from
pf_create_state() after pf_insert_src_node(). In this case it is called
with a given source node and does not perform its own search and must
return the source node.

The other one is for NAT rules where it is called from
pf_get_translation() or its descendants. In this case it is called with
no known source node and performs its own search for source nodes. This
source node is then passed back to pf_create_state() without locking.

The states property of source node is increased in pf_find_src_node()
which allows for the counter to increase when a packet matches the NAT
rule but not a pass keep state rule.

The function pf_map_addr() operates on an unlocked source node.

Modify pf_find_src_node() to return with the source node locked when one
is found, so that any subsequent operations can access the source node
safely.

Move sn->states++ counter increase to pf_insert_src_node() to ensure
that it's called only from pf_create_state() and not from NAT ruleset
path, and have it increased only if the source node has really been
inserted or found, simplifying the cleanup.

Add locking in pf_src_connlimit() and pf_map_addr(). Sprinkle mutex
assertions in pf_map_addr().

Add a function pf_src_node_exists() to check a known source node is
still valid. Use it in pf_create_state() where it's impossible to hold
locks from pf_insert_src_node() because that would cause LoR (nodes
first, then state) against pf_src_connlimit() (state first, then node).

Don't propagate the source node found while parsing the NAT ruleset to
pf_create_state() because it must be found again and locked or created.

Reviewed by:		kp
Approved by:		kp (mentor)
Sponsored by:		InnoGames GmbH
Differential Revision:	https://reviews.freebsd.org/D47770
Reported by:	jlduran
Fixes:	1f78bbb newsyslog.conf(5): Accept human unit suffix in the size filed
Co-authored-by:	Daniel Schaefer <[email protected]>
Reviewed by:	imp, wulf
Differential Revision:	https://reviews.freebsd.org/D47830
This implementation had various bugs.  bde@ reported that the unit
conversion/scaling is wrong, and it also does not handle 82574L or
igb(4) devices correctly.

With the new AIM code, it is expected most users will not need to
manually tune this.

If you do need static control:
hw.em.enable_aim=0 for all interfaces at boot or dev.em.X.enable_aim=0
for individual interfaces at runtime and they will track the
hw.em.max_interrupt_rate tunable.  That codepath has been bugfixed for
all supported chipsets.

You may view the current rate with dev.em.X.queue_rx_0.interrupt_rate
which has been bugfixed for all supported chipsets.

If you need to set different rates per interface for some reason let me
know and I will rethink how to add this back.  Otherwise you can leave
AIM on for general purpose interfaces and disable it at runtime on
special purpose low or high latency interfaces that would track
hw.em.max_interrupt_rate if you have a mix of concerns.

PR:		235031
Reported by:	Bruce Evans <[email protected]>
MFC after:	3 days
Relnotes:	yes
Sponsored by:	BBOX.io
Formally, there are 12 bits for TCP header flags.
Use the accessor functions in more (kernel) places.

No functional change.

Reviewed By: cc, #transport, cy, glebius, #iflib, kbowling
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47063
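
To show why accessors help, here is a sketch of how 12 flag bits span two header bytes: the low four bits of the data-offset byte plus the classic flags byte. The struct and the `ex_*` names are hypothetical and operate on raw bytes; they are not the kernel's `struct tcphdr` or its accessor functions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_TH_AE	0x100u	/* ninth flag bit, lives in the upper byte */

/* Bytes 12 and 13 of a TCP header, as raw octets. */
struct ex_tcp_hdr_bytes {
	uint8_t	off_x2;	/* data offset (high 4 bits), flag bits 11..8 (low 4) */
	uint8_t	flags;	/* flag bits 7..0 */
};

/* Combine both bytes into one 12-bit flags value. */
static uint16_t
ex_tcp_get_flags(const struct ex_tcp_hdr_bytes *h)
{
	return ((uint16_t)(((h->off_x2 & 0x0f) << 8) | h->flags));
}

/* Store a 12-bit flags value without disturbing the data offset. */
static void
ex_tcp_set_flags(struct ex_tcp_hdr_bytes *h, uint16_t flags)
{
	assert((flags & ~0x0fffu) == 0);
	h->off_x2 = (uint8_t)((h->off_x2 & 0xf0) | ((flags >> 8) & 0x0f));
	h->flags = (uint8_t)(flags & 0xff);
}
```

Code that reads only the single flags byte silently misses any bit above the low eight, which is the case the accessors guard against.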
Add support for the AE Flag in the TCP header to pf and ppp.
Standardize on "E" (ECE), "W" (CWR) and "e" (AE)
for the TCP header flags, in line with tcpdump.

Reviewers: kp, cc, tuexen, cy, #transport!
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47106
This is similar to chroot(2), but takes a file descriptor instead
of path.  Same syscall exists in NetBSD and Solaris.  It is part of a larger
patch to make absolute pathnames usable in Capsicum mode, but should
be useful in other contexts too.

Reviewed By:	brooks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D41564
Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D47834
as required by IEEE Std 1003.1™-2024.

PR:	283014
Reported by:	Graham Percival <[email protected]>
Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D47834
"netstat -rn" no longer prints the default route using symbol names, but
the test relied on it.  Update it to look for ::/0 instead.

MFC after:	1 week
Fixes:	9206c79 ("usr.bin/netstat: -n should not print symbolic names")
Reported by:	Sony Arpita Das @ Chelsio
Fixes:	5c15094 cxgbe(4): Update the board names of the T6 OCP cards.
MFC after:	1 week
Sponsored by:	Chelsio Communications
Correct `xatrr_to_extattr` to `xattr_to_extattr`.

Signed-off-by: Minseo Kim <[email protected]>
Reviewed by: imp,emaste,markj
Pull Request: freebsd/freebsd-src#1533
MFC after:	3 days

Reviewed by: imp
Pull Request: freebsd/freebsd-src#1521
The bcnt_fwd and bcnt_rev fields are the byte counters,
while the pcnt_fwd and pcnt_rev fields are the packet counters.
Fix the comments that were swapped around.

Signed-off-by: Damjan Jovanovic <[email protected]>
Reviewed by: jlduran,imp
Pull Request: freebsd/freebsd-src#1517
Previously, VMXON would be executed on a resume, contrary to proper
initialization. The contents of MSR_IA32_FEATURE_CONTROL may be lost on
suspension, therefore must be restored. Likewise, the VMX Enable bit may be
cleared upon suspend, requiring it to be re-set.

Concretely disable VMX on suspend, and re-enable it on resume.

Note: any IOMMU context will remain lost for any enabled vmm devices.

Signed-off-by: Joshua Rogers <[email protected]>
Reviewed by: jhb,imp
Pull Request: freebsd/freebsd-src#1419
+ increase history consistency by adding "first appeared"
+ remove a skipped new paragraph macro to quiet linter
+ tag spdx

MFC after:	3 days
History source:	www.in-ulm.de/~mascheck/various/ash/#bsd

Reviewed by: mhorne,imp
Pull Request: freebsd/freebsd-src#1440
Preserve over 40 years of "call UNIX" BSD heritage
while answering "one line about what it does" e.g.

"how do I get a serial console?"
"% apropos serial"

MFC after:	3 days
Reported by:	imp

Reviewed by: imp
Pull Request: freebsd/freebsd-src#1423
While trying to resolve some custom installer issues, we found that
using conscontrol(8) or setting kern.always_console_output=0 in
sysctl.conf(5) did not always prevent console output. This is in part
because some areas of the kernel were outputting to the console device
without checking the status of the setting. These changes enforce
checking of the value in both locations where console output occurs from
kernel and init(8) based callouts.

Details on changes:

 - Moves check for mute to earlier in sequence to silence kernel output
   even if EARLY_PRINTF is defined.
 - Modifies prf_putbuf() and prf_putchar() in subr_prf.c to strip the
   TOCONS flag if muting is enabled, to honor the setting at print
   level.

This is a rather simple change, which increases areas where flags to
silence console output are honored.  We have been running this change
since 10/23 in-house without issue.  (Patching prior to 14.0 also
required making cn_mute non-static.)

Signed-off-by: [email protected]
Reviewed by: imp
Pull Request: freebsd/freebsd-src#1407
With the %b format specifier we need enough space to write a uintmax_t
in binary.

Reviewed by: imp
Pull Request: freebsd/freebsd-src#1400
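
The sizing argument can be made concrete with a generic binary formatter. This is a standalone sketch, not the code touched by the commit; `EX_BINBUF_LEN` and `ex_fmt_binary()` are names invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* One character per bit of uintmax_t, plus the terminating NUL. */
#define EX_BINBUF_LEN	(sizeof(uintmax_t) * CHAR_BIT + 1)

/*
 * Write 'v' in binary into the tail of 'buf' and return a pointer to
 * the first digit.  The buffer is filled from the end backwards.
 */
static char *
ex_fmt_binary(uintmax_t v, char buf[EX_BINBUF_LEN])
{
	char *p = buf + EX_BINBUF_LEN - 1;

	assert(buf != NULL);
	*p = '\0';
	do {
		*--p = (char)('0' + (v & 1));
		v >>= 1;
	} while (v != 0);
	return (p);
}
```

A buffer sized for octal or decimal output is too small here, since binary needs a full digit for every bit.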
Printing the file name doesn't make sense since mkstemp failing means
that the file wasn't created.

Also add a test case for this.

Co-authored-by: Jose Luis Duran <[email protected]>
Reviewed by: imp,jhb
Pull Request: freebsd/freebsd-src#1383
Jul 28, 2024
	Fixed readcsvrec resize segfault when reading csv records longer
	than 8k. Thanks to Ozan Yigit.
	mktime() added to bsd-features branch. Thanks to Todd Miller.
@bsdjhb bsdjhb merged commit f421b33 into CTSRD-CHERI:dev Oct 30, 2025
29 checks passed
@bsdjhb bsdjhb deleted the merge-freebsd-20241129 branch October 30, 2025 19:47