###################
3.2.0 (2026 Feb 09)
###################

We are excited to announce the XGBoost 3.2 release. This release features significant
progress on vector-leaf multi-target tree support, enhanced GPU external memory
training, various optimizations, and the removal of the deprecated CLI.

***************
External Memory
***************

The latest XGBoost release features enhanced support for external memory training with
GPUs. XGBoost has experimental support for using the CUDA async memory pool, which users
can opt in to for asynchronous memory management during external memory training. Prior
to 3.2, the RMM plugin was required for this. The feature is Linux-only at the
moment. (:pr:`11706`, :pr:`11715`, :pr:`11718`, :pr:`11931`, :pr:`11865`, :pr:`11959`,
:pr:`11962`)

The adaptive cache is now used for all device types, including devices with full C2C
bandwidth, such as the GH200 and the DGX Station. Users can continue to specify the
``cache_host_ratio`` parameter in case of memory fragmentation. XGBoost now supports
devices with mixed GPU models when configuring the host cache (:pr:`11998`). As part of
the work on improved NUMA system support, we co-developed the ``pyhwloc`` project
(:pr:`11992`).

Lastly, the old page-concat option for GPU external memory has been removed. XGBoost will
use the full dataset for training. (:pr:`11882`, :pr:`11897`)

******************
Multi-Target/Class
******************

This release brings substantial progress on the vector-leaf-based multi-target tree model,
building on the multi-target intercept work from 3.1. A vector-leaf tree stores a vector
of weights in each leaf node, enabling the model to capture correlations across targets
during tree construction. In 3.2, we expanded the feature set to cover most of the
commonly used training configurations.

.. warning::

  The vector leaf is still a work in progress. Feedback is welcome.

New features for the multi-target tree include:

- Reduced gradient (sketch boost) for the ``hist`` tree method, which avoids using the
  full gradient matrix when finding tree structures, improving scalability with the
  number of targets. Users can use a custom objective to define the tree-split gradient
  in addition to the full leaf gradient. Built-in objectives are not yet supported.
- Support for all regression objectives, including MAE and the quantile loss.
- A GPU ``hist`` tree method implementation with features on par with the CPU one.
- Regularization parameters, including L1/L2, ``min_split_loss``, and ``max_delta_step``.
- Row subsampling with both uniform and gradient-based sampling.
- Column sampling (feature selection), including feature weights.
- Feature importance variants (gain and coverage).
- Model dump support for all formats (JSON, text, Graphviz).
- External memory.

In addition, intercept initialization for the multinomial logistic objective now adheres
to GLM semantics.

Related PRs: :pr:`11950`, :pr:`11914`, :pr:`11913`, :pr:`11965`, :pr:`11941`, :pr:`11967`,
:pr:`11940`, :pr:`11896`, :pr:`11894`, :pr:`11889`, :pr:`11917`, :pr:`11883`, :pr:`11786`,
:pr:`11881`, :pr:`11862`, :pr:`11855`, :pr:`11829`, :pr:`11825`, :pr:`11820`, :pr:`11814`,
:pr:`11729`, :pr:`11724`, :pr:`11747`, :pr:`11798`, :pr:`11791`, :pr:`11789`, :pr:`11781`,
:pr:`11778`, :pr:`11777`, :pr:`11744`, :pr:`11922`, :pr:`11920`

Currently missing features for the ``hist`` tree method with vector leaf:

- Distributed training
- Categorical features
- Feature interaction constraints
- Monotone constraints, which are not defined when the output is a vector
- Shapley values

********
Features
********

- As part of the vector leaf work, CPU ``hist`` now supports gradient-based sampling.
- The deprecated CLI (command line interface) has been removed. It was deprecated in
  2.1. (:pr:`11720`)
- Expose the categories container through the C API, allowing C users to access category
  information from the trained model. (:pr:`11794`)
- Upgrade to CUDA 12.9. (:pr:`11972`, :pr:`11968`)
- Support the oneAPI 2026 release. (:pr:`11994`)
- Compatibility fixes for the latest versions of nvcomp, RMM, and CCCL. (:pr:`11930`,
  :pr:`11834`, :pr:`11871`, :pr:`11995`, :pr:`11861`, :pr:`11785`, :pr:`11997`) A nightly
  CI pipeline was added to test XGBoost with the latest versions of CCCL and
  RMM. (:pr:`11863`)

*************
Optimizations
*************

- Various optimizations for the GPU ``hist`` tree method, some of which were done as part
  of the vector leaf work. (:pr:`11895`)
- Enable multi-threaded data initialization on CPU. (:pr:`11974`)
- Make the ``block_size`` of the CPU histogram-building kernel adaptive to model
  parameters and CPU cache size, yielding up to a 2x speedup for certain
  workloads. (:pr:`11808`)
- Small optimizations for some GPU kernels to use TMA. (:pr:`11841`, :pr:`11802`)
- Device memory is now used to store the tree model, which eliminates data copies between
  host and device during training and inference. (:pr:`11759`, :pr:`11735`, :pr:`11750`,
  :pr:`11741`, :pr:`11752`)

*****
Fixes
*****

- Fix logistic regression with constant labels. (:pr:`11973`)
- Fix the OpenMP configuration for macOS. (:pr:`11976`)
- Fix the SYCL build. (:pr:`11844`)

**************
Python Package
**************

- Fix a memory leak with Python DataFrame inputs, where temporary buffers were stored as
  class variables instead of instance variables. (:pr:`11961`)
- Pandas 3.0 support. (:pr:`11975`)
- Add Python type hints to tests and demos, along with various type-hint
  fixes. (:pr:`11795`, :pr:`11797`)
- Add the Python 3.14 classifier. (:pr:`11793`)
- Maintenance. (:pr:`11717`, :pr:`11783`)

*********
R Package
*********

- Fix RCHK warnings and memory safety issues. (:pr:`11938`, :pr:`11935`, :pr:`11847`)
- Error out with an informative message when factors are passed to
  ``DMatrix``. (:pr:`11810`)
- Remove calls to R's global RNG that are no longer needed. (:pr:`11848`, :pr:`11887`)
- Various documentation fixes and updates. (:pr:`11773`, :pr:`11890`, :pr:`11732`,
  :pr:`11846`, :pr:`11981`, :pr:`11842`)

************
JVM Packages
************

- Remove ``synchronized`` from predict, as internal prediction is already thread-safe;
  a concurrency test was added to verify this. (:pr:`11746`)
- Set the GPU device ID explicitly at the beginning of training and avoid the CUDA API
  guard for the tracker process, allowing Spark executors to run in exclusive
  mode. (:pr:`11939`, :pr:`11929`)
- Use ``inferBatchSizeParameter`` instead of a hardcoded value. (:pr:`11745`)
- Documentation updates and maintenance. (:pr:`11691`, :pr:`11915`, :pr:`11743`)

*********
Documents
*********

- Update references from the XGBoost Operator to Kubeflow Trainer. (:pr:`11710`)
- Document the categories container and add notes on handling unseen
  categories. (:pr:`11788`, :pr:`11868`, :pr:`11774`)
- Add Intel as a sponsor. (:pr:`11850`)

******************
CI and Maintenance
******************

- Support ``pre-commit`` for various linting and formatting tasks. ``clang-format`` is now
  required by the CI. (:pr:`11984`, :pr:`11978`, :pr:`11980`, :pr:`11958`, :pr:`11953`,
  :pr:`11946`, :pr:`11993`)
- Add sccache integration to XGBoost's CI workflows, which brings a significant speedup
  since the majority of CI time is spent compiling variants of XGBoost. In addition, most
  of the workflows now use GHA container support. (:pr:`11956`, :pr:`11952`, :pr:`11949`,
  :pr:`11937`, :pr:`11934`, :pr:`11927`, :pr:`11932`, :pr:`11924`, :pr:`11979`)
- Many optimizations to speed up tests. (:pr:`11990`, :pr:`11975`, :pr:`11964`)
- Various dependency updates, fixes, test refactoring, and cleanups. (:pr:`11955`,
  :pr:`11957`, :pr:`11963`, :pr:`11945`, :pr:`11912`, :pr:`11909`, :pr:`11888`,
  :pr:`11898`, :pr:`11925`, :pr:`11877`, :pr:`11824`, :pr:`11748`, :pr:`11721`,
  :pr:`11705`, :pr:`11699`, :pr:`11832`, :pr:`11796`, :pr:`11828`, :pr:`11852`,
  :pr:`11800`, :pr:`11999`, :pr:`11991`)