Ray-2.47.0
Release Highlights
- Initial support for prefill disaggregation is now available in Ray Serve LLM (#53092). This is critical for production LLM serving use cases.
- Ray Data features a variety of performance improvements (locality-based scheduling, non-blocking execution) as well as improvements to observability, preprocessors, and other stability fixes.
- Ray Serve now supports custom request routing algorithms, which are critical for high-throughput serving of large models.
Ray Libraries
Ray Data
🎉 New Features:
- Add save modes support to file data sinks (#52900)
- Added flattening capability to the Concatenator preprocessor to support output vectorization use cases (#53378)
💫 Enhancements:
- Re-enable Actor locality-based scheduling, and improve the algorithm for ranking candidate locations for a bundle. (#52861)
- Disable blocking pipeline by default until Actor Pool fully scales up to min actors (#52754)
- Progress bar and dashboard improvements to show names of partial functions properly (#52280)
🔨 Fixes:
- Make Ray Data `from_torch` respect `Dataset` len (#52804)
- Fix flaky aggregation test (#53383)
- Fix race condition bug in fault tolerance by disabling `on_exit` hook (#53249)
- Fix `move_tensors_to_device` utility for the `list`/`tuple[Tensor]` case (#53109)
- Fix `ActorPool` scaling to avoid scaling down when the input queue is empty (#53009)
- Fix internal queues accounting for all Operators with an internal queue (#52806)
- Fix backpressure for `FileBasedDatasource`. This fixes potential OOMs for workloads using `FileBasedDatasource`s (#52852)
📖 Documentation:
- Fix working code snippets (#52748)
- Improve AggregateFnV2 docstrings and examples (#52911)
- Improved documentation for vectorizers and API visibility in Data (#52456)
Ray Train
🎉 New Features:
- Added support for configuring Ray Train worker actor runtime environments. (#52421)
- Included Grafana panel data in Ray Train export for improved monitoring. (#53072)
- Introduced a structured logging environment variable to standardize log formats. (#52952)
- Added metrics for `TrainControllerState` to enhance observability. (#52805)
💫 Enhancements:
- Added logging of controller state transitions to aid debugging and analysis. (#53344)
- Improved handling of `Noop` scaling decisions for smoother scaling logic. (#53180)
🔨 Fixes:
- Improved `move_tensors_to_device` utility to correctly handle `list`/`tuple` of tensors. (#53109)
- Fixed GPU transfer support for non-contiguous tensors. (#52548)
- Increased timeout in `test_torch_device_manager` to reduce flakiness. (#52917)
📖 Documentation:
- Added a note about PyTorch DataLoader’s multiprocessing and forkserver usage. (#52924)
- Fixed various docstring format and indentation issues. (#52855, #52878)
- Removed unused "configuration-overview" documentation page. (#52912)
- General typo corrections. (#53048)
🏗 Architecture refactoring:
- Deduplicated ML doctest runners in CI for efficiency. (#53157)
- Converted isort configuration to Ruff for consistency. (#52869)
- Removed unused `PARALLEL_CI` blocks and combined imports. (#53087, #52742)
Ray Tune
💫 Enhancements:
- Updated `test_train_v2_integration` to use the correct `RunConfig`. (#52882)
📖 Documentation:
- Replaced `session.report` with `tune.report` and corrected import paths. (#52801)
- Removed outdated graphics cards reference in docs. (#52922)
- Fixed various docstring format issues. (#52879)
Ray Serve
🎉 New Features:
- Added support for implementing custom request routing algorithms. (#53251)
- Introduced an environment variable to prioritize custom resources during deployment scheduling. (#51978)
💫 Enhancements:
- The ingress API now accepts a builder function in addition to an ASGI app object. (#52892)
🔨 Fixes:
- Fixed `runtime_env` validation for `py_modules`. (#53186)
- Disallowed special characters in Serve deployment and application names. (#52702)
- Added a descriptive error message when a deployment name is not found. (#45181)
📖 Documentation:
- Updated the guide on serving models with Triton Server in Ray Serve.
- Added documentation for custom request routing algorithms.
Ray Serve/Data LLM
🎉 New Features:
- Added initial support for prefill/decode disaggregation (#53092)
- Expose vLLM metrics to the `serve.llm` API (#52719)
- Embedding API (#52229)
💫 Enhancements:
- Allow setting `name_prefix` in `build_llm_deployment` (#53316)
- Minor bug fix for #53144: stop tokens cannot be null (#53288)
- Add missing `repetition_penalty` vLLM sampling parameter (#53222)
- Mitigate the `serve.llm` streaming overhead by properly batching stream chunks (#52766)
- Fix `test_batch_vllm` leaking resources by using a larger `wait_for_min_actors_s`
🔨 Fixes:
- `LLMRouter.check_health()` should check `LLMServer.check_health()` (#53358)
- Fix runtime passthrough and auto-executor class selection (#53253)
- Update `check_health` return type (#53114)
- Bug fix for duplication of `<bos>` token (#52853)
- In stream batching, the first part of the stream was always consumed and not streamed back from the router (#52848)
RLlib
🎉 New Features:
- Add GPU inference to offline evaluation. (#52718)
💫 Enhancements:
- Do-over of examples for connector pipelines. (#52604)
- Cleanup of meta learning classes and examples. (#52680)
🔨 Fixes:
- Fixed weight synching in offline evaluation. (#52757)
- Fixed bug in `split_and_zero_pad` utility function (related to complex structures vs. simple values or `np.array`s). (#52818)
Ray Core
💫 Enhancements:
- `uv run` integration is now enabled by default, so you no longer need to set the `RAY_RUNTIME_ENV_HOOK` (#53060). If you rely on the previous behavior, where `uv run` only runs the Ray driver but not the workers in the uv environment, you can switch back by setting the `RAY_ENABLE_UV_RUN_RUNTIME_ENV=0` environment variable.
- Record GCS process metrics (#53171)
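As a sketch of the new default behavior (the script name `driver.py` is illustrative):

```shell
# Ray 2.47+: `uv run` applies the uv environment to both the driver and
# the Ray workers by default; no RAY_RUNTIME_ENV_HOOK is needed.
uv run python driver.py

# Opt back into the old behavior (uv environment on the driver only):
RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 uv run python driver.py
```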
🔨 Fixes:
- Improvements for using `RuntimeEnv` in the Job Submission API. (#52704)
- Close unused pipe file descriptors of child processes of the Raylet (#52700)
- Fix race condition when canceling task that hasn't started yet (#52703)
- Implement a thread pool and call the CPython API on all threads within the same concurrency group (#52575)
- cgraph: Fix execution schedules with collective operations (#53007)
- cgraph: Fix scalar tensor serialization edge case with `serialize_to_numpy_or_scalar` (#53160)
- Fix the issue where a valid `RestartActor` RPC is ignored (#53330)
- Fix reference counter crashes during worker graceful shutdown (#53002)
Dashboard
🎉 New Features:
- train: Add dynolog for on-demand GPU profiling for Torch training (#53191)
💫 Enhancements:
- Add configurability of 'orgId' param for requesting Grafana dashboards (#53236)
🔨 Fixes:
- Fix Grafana dashboards dropdowns for data and train dashboard (#52752)
- Fix dashboard for daylight savings (#52755)
Ray Container Images
💫 Enhancements:
- Upgrade `h11` (#53361), `requests`, `starlette`, `jinja2` (#52951), `pyopenssl` and `cryptography` (#52941)
- Generate multi-arch image indexes (#52816)
Docs
🎉 New Features:
- End-to-end example: Entity recognition with LLMs (#52342)
- End-to-end example: XGBoost tutorial (#52383)
- End-to-end tutorial for audio transcription and LLM as judge curation (#53189)
💫 Enhancements:
- Adds pydoclint to pre-commit (#52974)
Thanks!
Thank you to everyone who contributed to this release!
@NeilGirdhar, @ok-scale, @JiangJiaWei1103, @brandonscript, @eicherseiji, @ktyxx, @MichalPitr, @GeneDer, @rueian, @khluu, @bveeramani, @ArturNiederfahrenhorst, @c8ef, @lk-chen, @alanwguo, @simonsays1980, @codope, @ArthurBook, @kouroshHakha, @Yicheng-Lu-llll, @jujipotle, @aslonnie, @justinvyu, @machichima, @pcmoritz, @saihaj, @wingkitlee0, @omatthew98, @can-anyscale, @nadongjun, @chris-ray-zhang, @dizer-ti, @matthewdeng, @ryanaoleary, @janimo, @crypdick, @srinathk10, @cszhu, @TimothySeah, @iamjustinhsu, @mimiliaogo, @angelinalg, @gvspraveen, @kevin85421, @jjyao, @elliot-barn, @xingyu-long, @LeoLiao123, @thomasdesr, @ishaan-mehta, @noemotiovon, @hipudding, @davidxia, @omahs, @MengjinYan, @dengwxn, @MortalHappiness, @alhparsa, @emmanuel-ferdman, @alexeykudinkin, @KunWuLuan, @dev-goyal, @sven1977, @akyang-anyscale, @GokuMohandas, @raulchen, @abrarsheikh, @edoakes, @JoshKarpel, @bhmiller, @seanlaii, @ruisearch42, @dayshah, @Bye-legumes, @petern48, @richardliaw, @rclough, @israbbani, @jiwq