Releases: rapidsai/raft
v25.06.00
🚨 Breaking Changes
🐛 Bug Fixes
- NCCL comm resource fix (#2692) @viclafargue
- Fix the launch bounds for nn-descent kernel for 1210 and remove nn-descent tests (#2691) @viclafargue
- Prefer host gather when dataset is available both on host and device (#2671) @tfeher
- Fix warnings treated as errors downstream in cuVS (#2644) @achirkin
- Fix nccl_comm.hpp warning: #83-D: type qualifier specified more than once (#2643) @achirkin
- NVTX: null destination pointer warning-treated-as-error (#2639) @achirkin
- Add UCXX and NCCL to `libraft` conda recipe (#2636) @divyegala
- Fix building cutlass (#2619) @miscco
- Fix COO symmetrization (#2582) @viclafargue
🚀 New Features
- [Feat] add `cudaMemcpy2DAsync` wrapper (#2674) @rhdong
- Python wrapper for `device_resources_snmg` (#2666) @jinsolp
- Laplacian normalization primitives (#2648) @aamijar
- [FEA] Matrix shift rows and columns (#2634) @jinsolp
- Use NCCL wheels from PyPI for CUDA 12 builds (#2629) @divyegala
- Support strided matrix view as an input to matrix::samples_rows (#2626) @enp1s0
- [Feat] add support for bm25 and tfidf (#2567) @jperez999
🛠️ Improvements
- use 'rapids-init-pip' in wheel CI, other CI changes (#2677) @jameslamb
- Dask 2025.4.1 compatibility (#2673) @TomAugspurger
- Finish CUDA 12.9 migration and use branch-25.06 workflows (#2669) @bdice
- Update to clang 20 (#2665) @bdice
- Quote head_rev in conda recipes (#2660) @bdice
- CUDA 12.9 use updated compression flags (#2657) @robertmaynard
- Build and test with CUDA 12.9.0 (#2655) @bdice
- Exclude librmm.so from auditwheel (#2654) @bdice
- Fix cub include in normalize.cuh (#2652) @lowener
- Add support for Python 3.13 (#2649) @gforsyth
- Decoupling multi gpu resources from nccl usage (#2647) @jinsolp
- [BUGFIX] Fixed quoting in wheel paths in pylibraft and raft_dask wheel tests (#2645) @VenkateshJaya
- Download build artifacts from Github for CI (#2640) @VenkateshJaya
- Limit allowed wheel sizes (#2638) @divyegala
- Remove CUDA whole compilation ODR violations (#2633) @divyegala
- refactor(rattler): enable strict channel priority for builds (#2632) @gforsyth
- Vendor RAPIDS.cmake (#2631) @bdice
- Replace `Thrust` iterator facilities and replace them with `libcu++` ones (#2627) @miscco
- Port all conda recipes to `rattler-build` (#2623) @gforsyth
- Add missing thrust include (#2618) @miscco
- Moving wheel builds to specified location and uploading build artifacts to Github (#2617) @VenkateshJaya
- Fixed pytest marker warnings by removing unused pytest.ini (#2591) @TomAugspurger
- Introduction of the `raft::device_resources_snmg` type (#2549) @viclafargue
- Create a NCCL sub-communicator using ncclCommSplit (#2495) @seunghwak
[NIGHTLY] v25.08.00
🔗 Links
🚨 Breaking Changes
- Remove CUDA 11 from dependencies.yaml (#2695) @KyleFromNVIDIA
- stop uploading packages to downloads.rapids.ai (#2688) @jameslamb
- Reduce instantiations of `Reduction` kernels (#2679) @divyegala
🐛 Bug Fixes
- [REVIEW] Fix a few memory leaks. (#2710) @legrosbuffle
- Missed update accounting for reduction related APIs (#2704) @divyegala
- Work around Cython ctypedef bug (#2686) @vyasr
🛠️ Improvements
- refactor(shellcheck): fix all remaining warnings/errors (#2703) @gforsyth
- Remove pytest pin (#2699) @vyasr
- Fix several issues that break LLVM (#2698) @vitor1001
- Remove CUDA 11 from dependencies.yaml (#2695) @KyleFromNVIDIA
- Remove CUDA 11 devcontainers and update CI scripts (#2690) @bdice
- refactor(rattler): remove cuda11 options and general cleanup (#2689) @gforsyth
- stop uploading packages to downloads.rapids.ai (#2688) @jameslamb
- fix(devcontainers): typo in container name (#2687) @gforsyth
- Reduce instantiations of `Reduction` kernels (#2679) @divyegala
- Forward-merge branch-25.06 into branch-25.08 (#2675) @divyegala
- Forward-merge branch-25.06 into branch-25.08 (#2664) @gforsyth
- Lanczos Solver `which=SA,SM,LA,LM` argument (#2628) @aamijar
v25.04.00
🚨 Breaking Changes
- Account for cugraph API breakage (#2581) @divyegala
- Use new rapids-logger library (#2566) @vyasr
🐛 Bug Fixes
- Backport build patch fix (#2620) @KyleFromNVIDIA
- Revert "Temporarily increase `max_days_without_success` (#2602)" (#2613) @divyegala
- Relax max duplicates in batched NN Descent (#2610) @jinsolp
- [Fix] Lanczos solver gemv fix (#2607) @aamijar
- [Fix] `select-k-csr` failure on CUDA11.x + H100 (#2604) @rhdong
- Temporarily increase `max_days_without_success` (#2602) @divyegala
- Swap `blocks` and `threads_per_block` in `compute_graph_laplacian` (#2597) @jcrist
- [BUG] Fix illegal memory access in linalg::reduction (#2592) @enp1s0
- Require sphinx<8.2.0 (#2590) @KyleFromNVIDIA
- Account for cugraph API breakage (#2581) @divyegala
- `#include <numeric>` for `std::iota` (#2578) @benfred
- Fix Laplacian calculation in spectral partitioning (#2568) @wphicks
- Take argument by `const&` as the input range is const (#2558) @miscco
- Allow some of the sparse utility functions to handle larger matrices (#2541) @viclafargue
🛠️ Improvements
- ci: pre-filter 11.4 jobs before they are enabled in shared workflows (#2608) @gforsyth
- Use conda-build instead of conda-mambabuild (#2595) @bdice
- Replace `cub::Sum` and `cub::Max` with `cuda::std::plus` and `cuda::maximum` (#2594) @miscco
- Update all `conda_build_config.yaml`s RAPIDS UCX version (#2589) @jakirkham
- Drop `cub::TransformInputIterator` in favor of `thrust::transform_iterator` (#2588) @miscco
- Consolidate more Conda solves in CI (#2587) @KyleFromNVIDIA
- Fix duplicate indices in batch NN Descent (#2586) @jinsolp
- Require CMake 3.30.4 (#2584) @robertmaynard
- Create Conda CI test env in one step (#2580) @KyleFromNVIDIA
- Use shared-workflows branch-25.04 (#2576) @bdice
- Add `shellcheck` to pre-commit and fix warnings (#2575) @gforsyth
- Add build_type input field for `test.yaml` (#2573) @gforsyth
- Use `rapids-pip-retry` in CI jobs that might need retries (#2571) @gforsyth
- Avoid limited memory adaptor issue in balanced KMeans (#2570) @csadorf
- update telemetry and retarget 25.04 (#2569) @msarahan
- Use new rapids-logger library (#2566) @vyasr
- disallow fallback to Make in Python builds (#2563) @jameslamb
- Forward-merge branch-25.02 into branch-25.04 (#2561) @bdice
- Migrate to NVKS for amd64 CI runners (#2559) @bdice
- Add `verify-codeowners` hook (#2557) @KyleFromNVIDIA
v25.02.00
🚨 Breaking Changes
- Update pip devcontainers to UCX 1.18 (#2550) @jameslamb
- Switch over to rapids-logger (#2530) @vyasr
- Adapt to rmm logger changes (#2513) @vyasr
🐛 Bug Fixes
- Rename test to tests. (#2546) @bdice
- Fix bit order of RMAT Rectangular Generator to match expectation (#2542) @mfoerste4
- Fix broken link to python doc (#2537) @lowener
- Fix lanczos solver integer overflow (#2536) @viclafargue
- Fix rnd bit generation in rmat_rectangular_kernel (#2524) @tfeher
📖 Documentation
🚀 New Features
- Add cuda 12.8 support (#2551) @robertmaynard
- Add support for different data type of bitset (#2535) @lowener
- [Feat] Support `bitset_to_csr` (#2523) @rhdong
- Remove upper bounds on cuda-python to allow 12.6.2 and 11.8.5 (#2517) @bdice
🛠️ Improvements
- Revert CUDA 12.8 shared workflow branch changes (#2560) @vyasr
- Build and test with CUDA 12.8.0 (#2555) @bdice
- Update pip devcontainers to UCX 1.18 (#2550) @jameslamb
- use dynamic CUDA wheels on CUDA 11 (#2548) @jameslamb
- Normalize whitespace (#2547) @bdice
- Use cuda.bindings layout. (#2545) @bdice
- Revert "Introduction of the `raft::device_resources_snmg` type (#2487)" (#2543) @cjnolet
- Add missing `#include <cstdint>` (#2540) @jakirkham
- Use GCC 13 in CUDA 12 conda builds. (#2539) @bdice
- Use rapids-cmake for the logger (#2534) @vyasr
- Check if nightlies have succeeded recently enough (#2533) @vyasr
- remove unused 'joblib' and 'numba' dependencies, other packaging cleanup (#2532) @jameslamb
- introduce libraft wheels (#2531) @jameslamb
- Switch over to rapids-logger (#2530) @vyasr
- reduce duplication, removed unused things in dependencies.yaml (#2529) @jameslamb
- Update cuda-python lower bounds to 12.6.2 / 11.8.5 (#2522) @bdice
- [Opt] Optimizing the performance of `bitmap_to_csr` (#2516) @rhdong
- prefer system install of UCX in devcontainers, update outdated RAPIDS references (#2514) @jameslamb
- Adapt to rmm logger changes (#2513) @vyasr
- Require approval to run CI on draft PRs (#2512) @bdice
- Shrink wheel size limit following removal of vector search APIs. (#2509) @bdice
- Forward-merge branch-24.12 to branch-25.02 (#2508) @bdice
- Introduction of the `raft::device_resources_snmg` type (#2487) @viclafargue
- Add breaking change workflow trigger (#2482) @AyodeAwe
- Remove 'sample' parameter from stats::mean API (#2389) @mfoerste4
v24.12.00
🚨 Breaking Changes
🐛 Bug Fixes
- Skip gtests for new lanczos solver when CUDA version is 11.4 or below. (#2520) @cjnolet
- Switch `assert` to `static_assert` (#2510) @divyegala
- Revert use of new Lanczos solver in spectral clustering (#2507) @lowener
- Put a ceiling on cuda-python (#2486) @bdice
- Don't presume pointers location infers usability. (#2480) @robertmaynard
- Use Python for sccache hit rate computation. (#2474) @bdice
- Allow compilation with CUDA 12.6.1 (#2469) @robertmaynard
🚀 New Features
🛠️ Improvements
- Skip gtests for Rmat Lanczos tests with cuda <= 11.4 (#2525) @benfred
- Upgrade to latest cutlass version (#2503) @vyasr
- Removing some left over places where implicit instantiations were being ignored in headers (#2501) @cjnolet
- Remove leftover template project code. (#2500) @bdice
- 2412 remove libraft vss instantiations (#2498) @cjnolet
- Remove raft-ann-bench (#2497) @cjnolet
- Pin FAISS Version for raft-ann-bench (#2496) @tarang-jain
- enforce wheel size limits and README formatting in CI, put a ceiling on Cython dependency (#2490) @jameslamb
- Do not initialize the pinned mdarray at construction time (#2478) @achirkin
- Use environment variables in cache hit rate computation. (#2475) @bdice
- devcontainer: replace `VAULT_HOST` with `AWS_ROLE_ARN` (#2472) @jjacobelli
- print sccache stats in builds (#2470) @jameslamb
- make package installations in CI stricter (#2467) @jameslamb
- Prune workflows based on changed files (#2466) @KyleFromNVIDIA
- Merge branch-24.10 into branch-24.12 (#2461) @jameslamb
- Update all rmm imports to use pylibrmm/librmm (#2451) @Matt711
v24.10.00
🚨 Breaking Changes
🐛 Bug Fixes
- Disable NN Descent Batch tests temporarily (#2453) @divyegala
- Fix sed syntax in `update-version.sh` (#2441) @raydouglass
- Use runtime check of cudart version for eig (#2430) @lowener
- [BUG] Fix bitset function visibility (#2429) @lowener
- Exclude any kernel symbol that uses cutlass (#2425) @robertmaynard
🚀 New Features
- [Feat] add `repeat`, `sparsity`, `eval_n_elements` APIs to `bitset` (#2439) @rhdong
- [Opt] Enforce the UT Coverity and add benchmark for `transpose` (#2438) @rhdong
- [FEA] Support for half-float mixed precise in brute-force (#2382) @rhdong
🛠️ Improvements
- bump NCCL floor to 2.19 (#2458) @jameslamb
- Deprecating vector search APIs and updating README accordingly (#2448) @cjnolet
- Update update-version.sh to use packaging lib (#2447) @AyodeAwe
- Switch traceback to `native` (#2446) @galipremsagar
- bump NCCL floor to 2.18.1.1 (#2443) @jameslamb
- Add missing `cuda_suffixed: true` (#2440) @trxcllnt
- Use CI workflow branch 'branch-24.10' again (#2437) @jameslamb
- Update to flake8 7.1.1. (#2435) @bdice
- Update fmt (to 11.0.2) and spdlog (to 1.14.1). (#2433) @jameslamb
- Allow coo_sort to work on int64_t indices (#2432) @benfred
- Adding NCCL clique to the RAFT handle (#2431) @viclafargue
- Add support for Python 3.12 (#2428) @jameslamb
- Update rapidsai/pre-commit-hooks (#2420) @KyleFromNVIDIA
- Drop Python 3.9 support (#2417) @jameslamb
- Use CUDA math wheels (#2415) @KyleFromNVIDIA
- Remove NumPy <2 pin (#2414) @seberg
- Update pre-commit hooks (#2409) @KyleFromNVIDIA
- Improve update-version.sh (#2408) @bdice
- Use tool.scikit-build.cmake.version, set scikit-build-core minimum-version (#2406) @jameslamb
- [FEA] Batching NN Descent (#2403) @jinsolp
- Update pip devcontainers to UCX v1.17.0 (#2401) @jameslamb
- Merge branch-24.08 into branch-24.10 (#2397) @jameslamb
v24.08.00
🚨 Breaking Changes
- [Refactor] move `popc` to under util (#2394) @rhdong
- [Opt] Expose the `detail::popc` as public API (#2346) @rhdong
🐛 Bug Fixes
- Add timeout to UCXX generic operations (#2398) @pentschev
- [Fix] bitmap set/test issue (#2371) @rhdong
- Fix 0 recall issue in `raft_cagra_hnswlib` ANN benchmark (#2369) @divyegala
- Fix `ef` setting in HNSW wrapper (#2367) @divyegala
- Fix cagra graph opt bug (#2365) @enp1s0
- Fix a bug where the wrong API is used to free the memory (#2361) @PointKernel
- Allow anonymous user in devcontainer name (#2355) @bdice
- Fix compilation error when _CLK_BREAKDOWN is defined in cagra. (#2350) @jiangyinzuo
- ensure raft-dask wheel tests install pylibraft wheel from the same CI run, fix wheel dependencies (#2349) @jameslamb
- Change --config-setting to --config-settings (#2342) @KyleFromNVIDIA
- Add workaround for syevd in CUDA 12.0 (#2332) @lowener
🚀 New Features
- [FEA] add the support of `masked_matmul` (#2362) @rhdong
- [FEA] Dice Distance for Dense Inputs (#2359) @aamijar
- [Opt] Expose the `detail::popc` as public API (#2346) @rhdong
- Enable distance return for NN Descent (#2345) @jinsolp
🛠️ Improvements
- [Refactor] move `popc` to under util (#2394) @rhdong
- split up CUDA-suffixed dependencies in dependencies.yaml (#2388) @jameslamb
- Use workflow branch 24.08 again (#2385) @KyleFromNVIDIA
- Add cusparseSpMV_preprocess to cusparse wrapper (#2384) @Kh4ster
- Consolidate SUM reductions (#2381) @mfoerste4
- Use slicing kernel to copy distances inside NN Descent (#2380) @jinsolp
- Build and test with CUDA 12.5.1 (#2378) @KyleFromNVIDIA
- Add CUDA_STATIC_MATH_LIBRARIES (#2376) @KyleFromNVIDIA
- skip CMake 3.30.0 (#2375) @jameslamb
- Use verify-alpha-spec hook (#2373) @KyleFromNVIDIA
- Binarize Dice Distance for Dense Inputs (#2370) @aamijar
- [FEA] Add distance epilogue for NN Descent (#2364) @jinsolp
- resolve dependency-file-generator warning, other rapids-build-backend followup (#2360) @jameslamb
- Remove text builds of documentation (#2354) @vyasr
- Use default init in reduction (#2351) @akifcorduk
- ensure update-version.sh preserves alpha spec, add tests on version constants (#2344) @jameslamb
- remove unnecessary 'setuptools' dependencies (#2343) @jameslamb
- Use rapids-build-backend (#2331) @KyleFromNVIDIA
- Add FAISS with RAFT enabled Benchmarking to raft-ann-bench (#2026) @tarang-jain
v24.06.00
🚨 Breaking Changes
- Rename raft-ann-bench module to raft_ann_bench (#2333) @KyleFromNVIDIA
- Scaling workspace resources (#2322) @achirkin
- [REVIEW] Adjust UCX dependencies (#2304) @pentschev
- Convert device_memory_resource* to device_async_resource_ref (#2269) @harrism
🐛 Bug Fixes
- Fix import of VERSION file in raft-ann-bench (#2338) @KyleFromNVIDIA
- Rename raft-ann-bench module to raft_ann_bench (#2333) @KyleFromNVIDIA
- Support building faiss main statically (#2323) @robertmaynard
- Refactor spectral scale_obs to use existing normalization function (#2319) @ChuckHastings
- Correct initializer list order found by cuvs (#2317) @robertmaynard
- ANN_BENCH: enable move semantics for configured_raft_resources (#2311) @achirkin
- Revert "Build C++ wheel (#2264)" (#2305) @vyasr
- Revert "Add `compile-library` by default on pylibraft build" (#2300) @vyasr
- Add VERSION to raft-ann-bench package (#2299) @KyleFromNVIDIA
- Remove nonexistent job from workflow (#2298) @vyasr
- `libucx` should be run dependency of `raft-dask` (#2296) @divyegala
- Fix clang intrinsic warning (#2292) @aaronmondal
- Replace too long index file name with hash in ANN bench (#2280) @tfeher
- Fix build command for C++ compilation (#2270) @lowener
- Fix a compilation error in CAGRA when enabling log output (#2262) @enp1s0
- Correct member initialization order (#2254) @robertmaynard
- Fix time computation in CAGRA notebook (#2231) @lowener
📖 Documentation
🚀 New Features
- Scaling workspace resources (#2322) @achirkin
- ANN_BENCH: AnnGPU::uses_stream() for optional algo GPU sync (#2314) @achirkin
- [FEA] Split Bitset code (#2295) @lowener
- [FEA] support of prefiltered brute force (#2294) @rhdong
- Always use a static gtest and gbench (#2265) @robertmaynard
- Build C++ wheel (#2264) @vyasr
- InnerProduct Distance Metric for CAGRA search (#2260) @tarang-jain
- [FEA] Add support for `select_k` on CSR matrix (#2140) @rhdong
🛠️ Improvements
- ANN_BENCH: common AnnBase::index_type (#2315) @achirkin
- ANN_BENCH: split instances of RaftCagra into multiple files (#2313) @achirkin
- ANN_BENCH: a global pool of result buffers across benchmark cases (#2312) @achirkin
- Remove the shared state and the mutex from NVTX internals (#2310) @achirkin
- docs: update README.md (#2308) @eltociear
- [REVIEW] Reenable raft-dask wheel tests requiring UCX-Py (#2307) @pentschev
- [REVIEW] Adjust UCX dependencies (#2304) @pentschev
- Overhaul ops-codeowners (#2303) @raydouglass
- Make thrust nosync execution policy the default thrust policy (#2302) @abc99lr
- InnerProduct testing for CAGRA+HNSW (#2297) @divyegala
- Enable warnings as errors for Python tests (#2288) @mroeschke
- Normalize dataset vectors in the CAGRA InnerProduct tests (#2287) @enp1s0
- Use dynamic version for raft-ann-bench (#2285) @KyleFromNVIDIA
- Make 'librmm' a 'host' dependency for conda packages (#2284) @jameslamb
- Fix comments in cpp/include/raft/neighbors/cagra_serialize.cuh (#2283) @jiangyinzuo
- Only use functions in the limited API (#2282) @vyasr
- define 'ucx' pytest marker (#2281) @jameslamb
- Migrate to `{{ stdlib("c") }}` (#2278) @hcho3
- add --rm and --name to devcontainer run args (#2275) @trxcllnt
- Update pip devcontainers to UCX v1.15.0 (#2274) @trxcllnt
- `#ifdef` out pragma deprecation warning messages (#2271) @trxcllnt
- Convert device_memory_resource* to device_async_resource_ref (#2269) @harrism
- Update the developer's guide with new copyright hook (#2266) @KyleFromNVIDIA
- Improve coalesced reduction performance for tall and thin matrices (up to 2.6x faster) (#2259) @Nyrio
- Adds missing files to `update-version.sh` (#2255) @AyodeAwe
- Enable all tests for `arm64` jobs (#2248) @galipremsagar
- Update nvtx3 link in cmake (#2246) @lowener
- Add CAGRA-Q subspace dim = 4 support (#2244) @enp1s0
- Get rid of `cuco::sentinel` namespace (#2243) @PointKernel
- Replace usages of raw `get_upstream` with `get_upstream_resource()` (#2207) @miscco
- Set the import mode for dask tests (#2142) @vyasr
- Add UCXX support (#1983) @pentschev
v24.04.00
🐛 Bug Fixes
- Update pre-commit-hooks to v0.0.3 (#2239) @KyleFromNVIDIA
- MAINT: Simplify NCCL worker rank identification (#2228) @VibhuJawa
- Fix bug in blockRankedReduce (#2226) @akifcorduk
- Fix illegal access mean/stdev, sum add Kahan Summation (#2223) @mfoerste4
- Batch cutlass distance kernels along N matrix dim (#2215) @mdoijade
- Fix out of bounds access in sum kernel (#2183) @tfeher
- Fix ANN bench ground truth generation for k>1024 (#2180) @tfeher
- Fixing cusparse aligned address issue and adding note (#2179) @cjnolet
- Launch `neighborhood_recall` kernel on CUDA stream (#2156) @divyegala
- Add `compile-library` by default on pylibraft build (#2090) @lowener
📖 Documentation
🚀 New Features
- Add CAGRA-Q to ANN benchmarks (#2233) @achirkin
- Add CAGRA-Q build (compression) (#2213) @achirkin
- CAGRA-Q search (#2206) @enp1s0
- Demangle backtrace symbols on raft error (#2188) @achirkin
- Reapply: Support for fp16 in CAGRA and IVF-PQ (#2172) @achirkin
- Remove supports_streams from custom RAFT memory resources (#2121) @harrism
- [FEA] Add support for bitmap_view & the API of `bitmap_to_csr` (#2109) @rhdong
🛠️ Improvements
- Use `conda env create --yes` instead of `--force` (#2247) @bdice
- Align ucx version pinning with ucx-py/ucxx. (#2227) @bdice
- Add upper bound to prevent usage of NumPy 2 (#2222) @bdice
- Performance optimization of IVF-flat / select_k (#2221) @mfoerste4
- Replace local copyright check with pre-commit-hooks verify-copyright (#2220) @KyleFromNVIDIA
- Remove hard-coding of RAPIDS version where possible (#2219) @KyleFromNVIDIA
- Fix style. (#2214) @bdice
- Add explicit instantiations for IVF-PQ search kernels used in tests (#2212) @tfeher
- Improve RBC eps-neighborhood query performance (#2211) @mfoerste4
- Add test for spmm (#2210) @mfoerste4
- Only install necessary components in conda packages. (#2209) @bdice
- Automate C++ include file grouping and ordering using clang-format (#2202) @harrism
- Add support for Python 3.11, require NumPy 1.23+ (#2200) @jameslamb
- Pass `std::optional` instead of `thrust::optional` to RMM (#2199) @trxcllnt
- Update devcontainers to CUDA Toolkit 12.2 (#2192) @trxcllnt
- target branch-24.04 for GitHub Actions workflows (#2189) @jameslamb
- Fixing workaround for cuSPARSE bug with correct copy dimensions (#2185) @mfoerste4
- Allow topk larger than 1024 in CAGRA (#2181) @benfred
- IVF-FLAT support k > 256 (#2169) @mfoerste4
- Add environment-agnostic scripts for running ctests and pytests (#2165) @trxcllnt
- Ensure that `ctest` is called with `--no-tests=error`. (#2163) @bdice
- Update ops-bot.yaml (#2158) @AyodeAwe
- random sampling of dataset rows with improved memory utilization (#2155) @tfeher
- [FIX] Ensure hnswlib can be found from RAFT's build dir (#2145) @trxcllnt
- Improve analysis experience for ANN benchmarks (#2139) @achirkin
- Enable CAGRA index building without adding dataset to the index (#2126) @tfeher
- Add fused cosine 1-NN cutlass based kernel (#2125) @mdoijade
- Update raft for compatibility with the latest cuco (#2118) @PointKernel
- Support CUDA 12.2 (#2092) @jameslamb
- Cache IVF-PQ and select-warpsort kernel launch parameters to reduce latency (#1786) @achirkin
v24.02.00
🚨 Breaking Changes
- Switch to scikit-build-core (#2051) @vyasr
- Update to CCCL 2.2.0. (#2049) @bdice
- Update `raft-ann-bench` output filenames and add features to plotting (#2043) @divyegala
- Remove selection_faiss (#2027) @benfred
🐛 Bug Fixes
- fix is_row/col_order for strided layouts (#2173) @mfoerste4
- Fix failing C++ tests and revert #2097, #2085. (#2168) @cjnolet
- Exclude tests from builds (#2162) @vyasr
- [HOTFIX] 24.02 Revert Random Sampling (#2144) @cjnolet
- Pin to pytest 7. (#2137) @bdice
- Conditionally include `hnsw` wrapper source in CMake (#2135) @divyegala
- [BUG] Fix `SPMM` strided view (#2124) @lowener
- Fixing small bug in CUSPARSE spmm w/ CUDA 12.2 (#2117) @cjnolet
- [BUG] Fix `num_cta_per_query` div (#2107) @lowener
- Remove extraneous host pinnings from libraft-headers-only. (#2102) @bdice
- Remove unneeded CI symbol excludes (#2098) @robertmaynard
- Properly taking ownership of nccl subcomm (and destroying it) (#2094) @cjnolet
- Fix `max_queries` for CAGRA (#2081) @lowener
- Fix compile failure on RTX 4090 (#2076) @JieFengWang
- Fix a crash in FAISS benchmark wrapper introduced in #2021 (#2062) @achirkin
- Correct function that wasn't returning a value (#2045) @robertmaynard
- Fixing small bug in raft-ann-bench (#2041) @cjnolet
- Make device_resources accessed from device_resources_manager thread-safe (#2030) @wphicks
- Fix ann-bench multithreading (#2021) @achirkin
- Fix `ci/checks/copyright.py` to mirror RAPIDS reference (#2008) @divyegala
- Fix pyproject versions (#2002) @vyasr
📖 Documentation
- Adding license info for wiki-all dataset (#2129) @cjnolet
- [DOC] Documentation updates for release 24.02 (#2093) @cjnolet
- Fix errors with ingroup exposed by doxygen 1.10 (#2079) @wphicks
- Fix a typo (#2070) @narangvivek10
- Add usage example for brute_force::build (#2029) @benfred
- Add filtering to vector search tutorial (#1996) @lowener
🚀 New Features
- Update to use rapids-cmake for all deps (#2096) @robertmaynard
- Add IVF-PQ example into the template project (#2091) @achirkin
- Support for fp16 in CAGRA and IVF-PQ (#2085) @achirkin
- Add random subsampling for IVF methods (#2077) @tfeher
- Update `raft-ann-bench` output filenames and add features to plotting (#2043) @divyegala
- Add brute_force index serialization (#2036) @wphicks
- Add eps-neighbor search via RBC (#2028) @mfoerste4
- `libraft` and `pylibraft` API for CAGRA build and HNSW search (#2022) @divyegala
- Export Pareto frontier in `raft-ann-bench.data_export` (#2009) @divyegala
- Implement maybe-owning multi-dimensional container (mdbuffer) (#1999) @wphicks
- Add support for 1024+ dim vectors in CAGRA search (#1994) @enp1s0
- Replace GEMM backend: cublas.gemm -> cublaslt.matmul (#1736) @achirkin
🛠️ Improvements
- Remove get_mem_info functions from RAFT custom memory resources (#2108) @harrism
- Replace call to mr::get_mem_info() (#2099) @harrism
- Allow topk larger than 1024 in CAGRA (#2097) @benfred
- Remove usages of rapids-env-update (#2095) @KyleFromNVIDIA
- Provide explicit pool size for pool_memory_resources and clean up includes (#2088) @harrism
- refactor CUDA versions in dependencies.yaml (#2086) @jameslamb
- ANN bench fix latency measurement overhead (#2084) @tfeher
- Remove hardcoded limit in `print_results` function (#2080) @narangvivek10
- [FEA] Add support for SDDMM by wrapping the cusparseSDDMM (#2067) (#2067) @rhdong
- Benchmark brute force knn (#2063) @benfred
- [BUG] fix empty initialization of device_ndarray in pylibraft (#2061) @mfoerste4
- Improve parallelism of refine host (#2059) @anaruse
- Subsampling for IVF-PQ codebook generation (#2052) @abc99lr
- Switch to scikit-build-core (#2051) @vyasr
- Update to CCCL 2.2.0. (#2049) @bdice
- Use cuda::proclaim_return_type on device lambda. (#2048) @bdice
- Removing code that explicitly compares equality of rmm memory resources (#2047) @cjnolet
- Add public enum for select-k algorithm selection (#2046) @benfred
- Update dependencies.yaml to new pip index (#2042) @vyasr
- Remove RAFT_BUILD_WHEELS and standardize Python builds (#2040) @vyasr
- Fix ucx-py version pinning in dependencies.yaml. (#2035) @bdice
- [REVIEW] Fix typos in parameter tuning guide (#2034) @abc99lr
- Add AIR-Top-k reference (#2031) @tfeher
- Remove selection_faiss (#2027) @benfred
- Fixing json parse error in `raft-ann-bench.data_export` (#2025) @cjnolet
- Updating cagra build constraint (#2016) @cjnolet
- Update to fmt 10.1.1 and spdlog 1.12.0. (#1957) @bdice
- Enable host dataset for IVF-Flat (#1635) @tfeher
- add half/bfloat support to myInf and abs (#1592) @Kh4ster