
Commit ec5922e

authored
[doc] 3.2 release news. (#11996)
1 parent fbba065 commit ec5922e

2 files changed: +169 −1 lines


doc/changes/index.rst

Lines changed: 2 additions & 1 deletion
@@ -8,6 +8,7 @@ For release notes prior to the 2.1 release, please see `news <https://github.com
    :maxdepth: 1
    :caption: Contents:

+   v3.2.0
    v3.1.0
    v3.0.0
-   v2.1.0
+   v2.1.0

doc/changes/v3.2.0.rst

Lines changed: 167 additions & 0 deletions
###################
3.2.0 (2026 Feb 09)
###################

We are excited to announce the XGBoost 3.2 release. This release features significant
progress on multi-target tree support with vector leaf, enhanced GPU external memory
training, various optimizations, and the removal of the deprecated CLI.
***************
External Memory
***************

The latest XGBoost release features enhanced support for external memory training with
GPUs. XGBoost has experimental support for the CUDA async memory pool, which users can
opt into to enable asynchronous memory management for efficient external memory
training. Prior to 3.2, the RMM plugin was required for this. The feature is Linux-only
at the moment. (:pr:`11706`, :pr:`11715`, :pr:`11718`, :pr:`11931`, :pr:`11865`,
:pr:`11959`, :pr:`11962`)

The adaptive cache is now used for all device types, including devices with full C2C
bandwidth such as the GH200 and the DGX Station. Users can continue to specify the
``cache_host_ratio`` parameter in case of memory fragmentation. XGBoost now supports
devices with mixed GPU models when configuring the host cache (:pr:`11998`). As part of
the work on improved NUMA system support, we co-developed the ``pyhwloc`` project
(:pr:`11992`).

Lastly, the old page-concat option for GPU external memory has been removed; XGBoost now
uses the full dataset for training. (:pr:`11882`, :pr:`11897`)
******************
Multi-Target/Class
******************

This release brings substantial progress on the vector-leaf-based multi-target tree model,
building on the multi-target intercept work from 3.1. A vector-leaf tree stores a vector
of weights in each leaf node, enabling the model to capture correlations across targets
during tree construction. In 3.2, we expanded the feature set to cover most of the
commonly used training configurations.

.. warning::

   The vector leaf is still a work in progress. Feedback is welcome.

New features for the multi-target tree include:

- Reduced gradient (sketch boost) for the ``hist`` tree method, which avoids using the
  full gradient matrix to find tree structures, improving scalability with the number of
  targets. Users can use a custom objective to define the tree-split gradient in addition
  to the full leaf gradient. Built-in objectives are not yet supported.
- Support for all regression objectives, including MAE and the quantile loss.
- The GPU ``hist`` tree method implementation now has feature parity with the CPU one.
- Regularization parameters, including L1/L2, ``min_split_loss``, and ``max_delta_step``.
- Row subsampling with both uniform sampling and gradient-based sampling.
- Column sampling (feature selection), including feature weights.
- Feature importance variants (gain and coverage).
- Model dump support for all formats (JSON, text, Graphviz).
- External memory.

In addition, intercept initialization for the multinomial logistic objective now adheres
to GLM semantics.

Related PRs: :pr:`11950`, :pr:`11914`, :pr:`11913`, :pr:`11965`, :pr:`11941`, :pr:`11967`,
:pr:`11940`, :pr:`11896`, :pr:`11894`, :pr:`11889`, :pr:`11917`, :pr:`11883`, :pr:`11786`,
:pr:`11881`, :pr:`11862`, :pr:`11855`, :pr:`11829`, :pr:`11825`, :pr:`11820`, :pr:`11814`,
:pr:`11729`, :pr:`11724`, :pr:`11747`, :pr:`11798`, :pr:`11791`, :pr:`11789`, :pr:`11781`,
:pr:`11778`, :pr:`11777`, :pr:`11744`, :pr:`11922`, :pr:`11920`

Features currently missing for the ``hist`` tree method with vector leaf:

- Distributed training
- Categorical features
- Feature interaction constraints
- Monotone constraints, which are not defined when the output is a vector
- Shapley values
********
Features
********

- As part of the vector leaf work, CPU ``hist`` now supports gradient-based sampling.
- The deprecated CLI (command line interface) has been removed. It was deprecated in
  2.1. (:pr:`11720`)
- Expose the categories container in the C API, allowing C users to access category
  information from the trained model. (:pr:`11794`)
- Upgrade to CUDA 12.9. (:pr:`11972`, :pr:`11968`)
- Support the oneAPI 2026 release. (:pr:`11994`)
- Compatibility fixes for the latest versions of nvcomp, RMM, and CCCL. (:pr:`11930`,
  :pr:`11834`, :pr:`11871`, :pr:`11995`, :pr:`11861`, :pr:`11785`, :pr:`11997`). A nightly
  CI pipeline was added to test XGBoost with the latest versions of CCCL and
  RMM. (:pr:`11863`)
*************
Optimizations
*************

- Various optimizations for the GPU ``hist`` tree method, some of which were done as part
  of the vector leaf work. (:pr:`11895`)
- Enable multi-threaded data initialization on CPU. (:pr:`11974`)
- Make the ``block_size`` of the CPU histogram-building kernel adaptive based on model
  parameters and CPU cache size, demonstrating up to a 2x speedup for certain
  workloads. (:pr:`11808`)
- Small optimizations for some GPU kernels to use TMA. (:pr:`11841`, :pr:`11802`)
- We now use device memory for storing the tree model, which eliminates data copies
  between host and device during training and inference. (:pr:`11759`, :pr:`11735`,
  :pr:`11750`, :pr:`11741`, :pr:`11752`)
*****
Fixes
*****

- Fix logistic regression with constant labels. (:pr:`11973`)
- Fix OpenMP configuration for macOS. (:pr:`11976`)
- Fix the SYCL build. (:pr:`11844`)
**************
Python Package
**************

- Fix a memory leak with Python DataFrame inputs, where temporary buffers were stored as
  class variables instead of instance variables. (:pr:`11961`)
- Pandas 3.0 support. (:pr:`11975`)
- Add Python type hints for tests and demos, along with various type-hint
  fixes. (:pr:`11795`, :pr:`11797`)
- Add the Python 3.14 classifier. (:pr:`11793`)
- Maintenance. (:pr:`11717`, :pr:`11783`)
*********
R Package
*********

- Fix RCHK warnings and memory-safety issues. (:pr:`11938`, :pr:`11935`, :pr:`11847`)
- Error out on factors passed to ``DMatrix`` with an informative message. (:pr:`11810`)
- Remove calls to R's global RNG that are no longer needed. (:pr:`11848`, :pr:`11887`)
- Various documentation fixes and updates. (:pr:`11773`, :pr:`11890`, :pr:`11732`,
  :pr:`11846`, :pr:`11981`, :pr:`11842`)
************
JVM Packages
************

- Remove ``synchronized`` from predict, as internal prediction is already thread-safe; a
  concurrency test was added to verify this. (:pr:`11746`)
- Set the GPU device ID explicitly at the beginning of training and avoid the CUDA API
  guard for the tracker process, allowing Spark executors to run in exclusive
  mode. (:pr:`11939`, :pr:`11929`)
- Use ``inferBatchSizeParameter`` instead of a hardcoded value. (:pr:`11745`)
- Documentation updates and maintenance. (:pr:`11691`, :pr:`11915`, :pr:`11743`)
*********
Documents
*********

- Update references from the XGBoost Operator to Kubeflow Trainer. (:pr:`11710`)
- Document the categories container and add notes on handling unseen
  categories. (:pr:`11788`, :pr:`11868`, :pr:`11774`)
- Add Intel as a sponsor. (:pr:`11850`)
******************
CI and Maintenance
******************

- Support ``pre-commit`` for various linting and formatting tasks; ``clang-format`` is
  now required by the CI. (:pr:`11984`, :pr:`11978`, :pr:`11980`, :pr:`11958`,
  :pr:`11953`, :pr:`11946`, :pr:`11993`)
- We added sccache integration to XGBoost's CI workflows, which brings a significant
  speedup since a majority of CI time is spent compiling variants of XGBoost. In
  addition, most of the workflows now use GHA container support. (:pr:`11956`,
  :pr:`11952`, :pr:`11949`, :pr:`11937`, :pr:`11934`, :pr:`11927`, :pr:`11932`,
  :pr:`11924`, :pr:`11979`)
- Numerous test optimizations. (:pr:`11990`, :pr:`11975`, :pr:`11964`)
- Various dependency updates, fixes, test refactoring, and cleanups. (:pr:`11955`,
  :pr:`11957`, :pr:`11963`, :pr:`11945`, :pr:`11912`, :pr:`11909`, :pr:`11888`,
  :pr:`11898`, :pr:`11925`, :pr:`11877`, :pr:`11824`, :pr:`11748`, :pr:`11721`,
  :pr:`11705`, :pr:`11699`, :pr:`11832`, :pr:`11796`, :pr:`11828`, :pr:`11852`,
  :pr:`11800`, :pr:`11999`, :pr:`11991`)
