Ray-2.47.0

@aslonnie released this 12 Jun 22:28
· 2066 commits to master since this release
6f4c0c0

Release Highlights

  • Ray Serve LLM now includes initial support for prefill disaggregation (#53092). This is critical for production LLM serving use cases.
  • Ray Data features a variety of performance improvements (locality-based scheduling, non-blocking execution) as well as improvements to observability, preprocessors, and other stability fixes.
  • Ray Serve now supports custom request routing algorithms, which is critical for high-throughput serving of large models.

Ray Libraries

Ray Data

🎉 New Features:

  • Add save modes support to file data sinks (#52900)
  • Added flattening capability to the Concatenator preprocessor to support output vectorization use cases (#53378)
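Save modes typically control what happens when the output path already exists. As a rough illustration of the semantics (the actual Ray Data API and mode names in #52900 may differ; `prepare_sink` is a hypothetical helper, not part of Ray):

```python
import os
import shutil

# Illustrative only: typical "save mode" semantics for a file data sink.
# The exact Ray Data interface from #52900 is not reproduced here.
def prepare_sink(path: str, mode: str = "error") -> bool:
    """Return True if the write should proceed, False if it is skipped."""
    if mode not in ("error", "overwrite", "append", "ignore"):
        raise ValueError(f"unknown save mode: {mode}")
    if not os.path.exists(path):
        os.makedirs(path)
        return True
    if mode == "error":
        raise FileExistsError(f"{path} already exists")
    if mode == "overwrite":
        shutil.rmtree(path)  # discard existing output, start fresh
        os.makedirs(path)
        return True
    if mode == "append":
        return True   # keep existing files; new files land alongside them
    return False      # "ignore": silently skip the write
```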

💫 Enhancements:

  • Re-enabled actor locality-based scheduling and improved the algorithm for ranking candidate locations for a bundle (#52861)
  • By default, pipeline execution no longer blocks while the actor pool scales up to its minimum number of actors (#52754)
  • Progress bar and dashboard improvements to properly show the names of partial functions (#52280)

🔨 Fixes:

  • Make Ray Data from_torch respect Dataset len (#52804)
  • Fix flaky aggregation test (#53383)
  • Fix race condition bug in fault tolerance by disabling on_exit hook (#53249)
  • Fix move_tensors_to_device utility for the list/tuple[tensor] case (#53109)
  • Fix ActorPool scaling to avoid scaling down when the input queue is empty (#53009)
  • Fix internal queues accounting for all Operators w/ an internal queue (#52806)
  • Fix backpressure for FileBasedDatasource. This fixes potential OOMs for workloads using FileBasedDatasources (#52852)
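The `move_tensors_to_device` fix (#53109) concerns containers of tensors. A minimal sketch of the idea, recursively walking nested lists/tuples/dicts while preserving container types (`move_leaf` stands in for `tensor.to(device)`; the real Ray utility's signature differs):

```python
# Illustrative sketch of handling list/tuple[tensor] cases: recurse into
# nested containers, apply the device move at the leaves, and preserve
# the container type on the way back up.
def move_to_device(obj, move_leaf):
    if isinstance(obj, tuple):
        return tuple(move_to_device(v, move_leaf) for v in obj)
    if isinstance(obj, list):
        return [move_to_device(v, move_leaf) for v in obj]
    if isinstance(obj, dict):
        return {k: move_to_device(v, move_leaf) for k, v in obj.items()}
    return move_leaf(obj)  # a tensor (or any leaf value)
```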

📖 Documentation:

  • Fix working code snippets (#52748)
  • Improve AggregateFnV2 docstrings and examples (#52911)
  • Improved documentation for vectorizers and API visibility in Data (#52456)

Ray Train

🎉 New Features:

  • Added support for configuring Ray Train worker actor runtime environments. (#52421)
  • Included Grafana panel data in Ray Train export for improved monitoring. (#53072)
  • Introduced a structured logging environment variable to standardize log formats. (#52952)
  • Added metrics for TrainControllerState to enhance observability. (#52805)

💫 Enhancements:

  • Added logging of controller state transitions to aid debugging and analysis. (#53344)
  • Improved handling of Noop scaling decisions for smoother scaling logic. (#53180)

🔨 Fixes:

  • Improved move_tensors_to_device utility to correctly handle list / tuple of tensors. (#53109)
  • Fixed GPU transfer support for non-contiguous tensors. (#52548)
  • Increased timeout in test_torch_device_manager to reduce flakiness. (#52917)

📖 Documentation:

  • Added a note about PyTorch DataLoader’s multiprocessing and forkserver usage. (#52924)
  • Fixed various docstring format and indentation issues. (#52855, #52878)
  • Removed unused "configuration-overview" documentation page. (#52912)
  • General typo corrections. (#53048)

🏗 Architecture refactoring:

  • Deduplicated ML doctest runners in CI for efficiency. (#53157)
  • Converted isort configuration to Ruff for consistency. (#52869)
  • Removed unused PARALLEL_CI blocks and combined imports. (#53087, #52742)

Ray Tune

💫 Enhancements:

  • Updated test_train_v2_integration to use the correct RunConfig. (#52882)

📖 Documentation:

  • Replaced session.report with tune.report and corrected import paths. (#52801)
  • Removed outdated graphics cards reference in docs. (#52922)
  • Fixed various docstring format issues. (#52879)

Ray Serve

🎉 New Features:

  • Added support for implementing custom request routing algorithms. (#53251)
  • Introduced an environment variable to prioritize custom resources during deployment scheduling. (#51978)
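A common routing policy a custom router might implement is "power of two choices": sample two replicas and send the request to the one with the shorter queue. The sketch below only illustrates that policy in plain Python; the actual router interface added in #53251 defines its own API, and `route`/`queue_lengths` are hypothetical names:

```python
import random

# Conceptual sketch of a request-routing policy, not the Ray Serve API:
# pick two candidate replicas at random and route to the less-loaded one.
def route(queue_lengths: dict, rng=random) -> str:
    a, b = rng.sample(list(queue_lengths), 2)
    return a if queue_lengths[a] <= queue_lengths[b] else b
```

Sampling two candidates (instead of scanning all replicas) keeps routing O(1) per request while still strongly biasing traffic away from overloaded replicas.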

💫 Enhancements:

  • The ingress API now accepts a builder function in addition to an ASGI app object. (#52892)

🔨 Fixes:

  • Fixed runtime_env validation for py_modules. (#53186)
  • Disallowed special characters in Serve deployment and application names. (#52702)
  • Added a descriptive error message when a deployment name is not found. (#45181)

📖 Documentation:

  • Updated the guide on serving models with Triton Server in Ray Serve.
  • Added documentation for custom request routing algorithms.

Ray Serve/Data LLM

🎉 New Features:

  • Added initial support for prefill decode disaggregation (#53092)
  • Expose vLLM metrics through the serve.llm API (#52719)
  • Added an Embedding API (#52229)

💫 Enhancements:

  • Allow setting name_prefix in build_llm_deployment (#53316)
  • Minor bug fix for #53144: stop tokens cannot be null (#53288)
  • Add missing repetition_penalty vLLM sampling parameter (#53222)
  • Mitigate the serve.llm streaming overhead by properly batching stream chunks (#52766)
  • Fix test_batch_vllm leaking resources by using larger wait_for_min_actors_s
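The streaming-overhead mitigation (#52766) works by coalescing many small token chunks into fewer downstream messages. A minimal sketch of the idea (the real serve.llm implementation batches on its own internal size/time budget; `batch_chunks` is an illustrative name):

```python
# Conceptual sketch of stream-chunk batching: instead of forwarding every
# token chunk individually, coalesce up to `batch_size` chunks per send
# to cut per-message overhead on the streaming path.
def batch_chunks(chunks, batch_size=8):
    buf = []
    for chunk in chunks:
        buf.append(chunk)
        if len(buf) >= batch_size:
            yield "".join(buf)
            buf = []
    if buf:
        yield "".join(buf)  # flush the final partial batch
```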

🔨 Fixes:

  • LLMRouter.check_health() now calls LLMServer.check_health() (#53358)
  • Fix runtime passthrough and auto-executor class selection (#53253)
  • Update check_health return type (#53114)
  • Bug fix for duplication of <bos> token (#52853)
  • Fixed stream batching so the first part of the stream is no longer consumed by the router instead of being streamed back (#52848)

RLlib

🎉 New Features:

  • Add GPU inference to offline evaluation. (#52718)

💫 Enhancements:

  • Reworked the examples for connector pipelines. (#52604)
  • Cleanup of meta learning classes and examples. (#52680)

🔨 Fixes:

  • Fixed weight syncing in offline evaluation. (#52757)
  • Fixed bug in split_and_zero_pad utility function (related to complex structures vs simple values or np.arrays). (#52818)

Ray Core

💫 Enhancements:

  • uv run integration is now enabled by default, so you no longer need to set RAY_RUNTIME_ENV_HOOK (#53060). If you relied on the previous behavior, where uv run applied the uv environment only to the Ray driver and not to the workers, you can switch back by setting the RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 environment variable.
  • Record GCS process metrics (#53171)
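In practice the uv integration means both of the following work out of the box (`my_ray_script.py` is a placeholder for your own driver script):

```shell
# Ray >= 2.47: `uv run` propagates the uv environment to Ray workers by
# default -- no RAY_RUNTIME_ENV_HOOK needed.
uv run python my_ray_script.py

# Restore the old behavior (uv environment on the driver only, not the
# workers) by disabling the integration:
RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 uv run python my_ray_script.py
```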

🔨 Fixes:

  • Improvements for using RuntimeEnv in the Job Submission API. (#52704)
  • Close unused pipe file descriptor of child processes of Raylet (#52700)
  • Fix race condition when canceling task that hasn't started yet (#52703)
  • Implement a thread pool and call the CPython API on all threads within the same concurrency group (#52575)
  • cgraph: Fix execution schedules with collective operations (#53007)
  • cgraph: Fix scalar tensor serialization edge case with serialize_to_numpy_or_scalar (#53160)
  • Fix the issue where a valid RestartActor rpc is ignored (#53330)
  • Fix reference counter crashes during worker graceful shutdown (#53002)

Dashboard

🎉 New Features:

  • train: Add dynolog for on-demand GPU profiling for Torch training (#53191)

💫 Enhancements:

  • Add configurability of 'orgId' param for requesting Grafana dashboards (#53236)

🔨 Fixes:

  • Fix Grafana dashboards dropdowns for data and train dashboard (#52752)
  • Fix dashboard for daylight savings (#52755)

Ray Container Images

💫 Enhancements:

  • Upgrade h11 (#53361), requests, starlette, jinja2 (#52951), pyopenssl and cryptography (#52941)
  • Generate multi-arch image indexes (#52816)

Docs

🎉 New Features:

  • End-to-end example: Entity recognition with LLMs (#52342)
  • End-to-end example: XGBoost tutorial (#52383)
  • End-to-end tutorial for audio transcription and LLM as judge curation (#53189)

💫 Enhancements:

  • Adds pydoclint to pre-commit (#52974)

Thanks!

Thank you to everyone who contributed to this release!

@NeilGirdhar, @ok-scale, @JiangJiaWei1103, @brandonscript, @eicherseiji, @ktyxx, @MichalPitr, @GeneDer, @rueian, @khluu, @bveeramani, @ArturNiederfahrenhorst, @c8ef, @lk-chen, @alanwguo, @simonsays1980, @codope, @ArthurBook, @kouroshHakha, @Yicheng-Lu-llll, @jujipotle, @aslonnie, @justinvyu, @machichima, @pcmoritz, @saihaj, @wingkitlee0, @omatthew98, @can-anyscale, @nadongjun, @chris-ray-zhang, @dizer-ti, @matthewdeng, @ryanaoleary, @janimo, @crypdick, @srinathk10, @cszhu, @TimothySeah, @iamjustinhsu, @mimiliaogo, @angelinalg, @gvspraveen, @kevin85421, @jjyao, @elliot-barn, @xingyu-long, @LeoLiao123, @thomasdesr, @ishaan-mehta, @noemotiovon, @hipudding, @davidxia, @omahs, @MengjinYan, @dengwxn, @MortalHappiness, @alhparsa, @emmanuel-ferdman, @alexeykudinkin, @KunWuLuan, @dev-goyal, @sven1977, @akyang-anyscale, @GokuMohandas, @raulchen, @abrarsheikh, @edoakes, @JoshKarpel, @bhmiller, @seanlaii, @ruisearch42, @dayshah, @Bye-legumes, @petern48, @richardliaw, @rclough, @israbbani, @jiwq