Releases: StanfordLegion/legion

Version 25.09.0 (September 30, 2025) – Grand Refactor, Realm Split

01 Oct 15:50

  • Build
    • The Realm build system has been split off, see below for details
  • Legion
    • Large-scale refactoring of the Legion code base
    • All Legion error messages have been rewritten for clarity
    • Legion now checks for typos in command-line flags and issues errors if any flags are misspelled
    • Legion source code is now formatted via clang-format
  • Realm
    • Realm has been split into its own repository: https://github.com/StanfordLegion/realm
    • A read-only copy of Realm is still maintained under the Legion repository in the /realm directory
    • The Realm build system has been split and is now invoked via CPM in the Legion CMake build, but must be built separately in the Makefile build. The location of a pre-existing Realm installation can be specified in both builds via Realm_ROOT

Version 25.06.0 (July 2, 2025)

03 Jul 18:15

  • Build
    • Make UCX bootstrap module optional
  • Legion
    • Bug fixes
  • Tools
    • Minimum required Rust version is now 1.85
    • Support for generating a DuckDB database from a profile via legion_prof duckdb subcommand
    • Allow searching inside of merged boxes in profiles
    • Improved client-side error handling in profiler
    • Improved client-side caching in profiler
    • Support attaching to a local archive on disk
    • Upgrade to egui 0.29
  • Realm
    • Improvements to new Realm build system
    • Added automatic payload fragmentation inside the active message layer
    • Added a switch to disable barrier broadcast, currently off by default
    • Bug fixes

Version 25.03.0 (March 28, 2025) – One Pool, Automatic Traces

28 Mar 17:57

  • Build
    • Minimum required CMake version is now 3.22
    • Minimum required CUDA version is now 11.7
    • Experimental CMake build system under /realm for Realm standalone builds
  • Legion
    • Legion now uses a "one pool" memory allocation strategy that removes the need to split memory into two pools via the -lg:eager_alloc_percentage flag. Applications may take advantage of the new support to specify the additional memory required by each task
    • Legion now automatically discovers traces of repeated sequences of tasks by default. Existing traces specified with begin_trace and end_trace are still respected and take priority over automatic traces. If automatic traces are not desired, they can be disabled with -lg:no_auto_tracing
    • Added support for user-specified profiling ranges
    • Reference count sparsity maps correctly so the memory associated with them is now reclaimed
  • Tools
    • Minimum required Rust version is now 1.84
    • Fix for correctly processing negative points and rects
    • Support for profiling external handshake objects
  • Realm
    • The hwloc support for topology discovery in Realm, which had bitrotted, has been revived and substantially improved
    • Support for NVIDIA Blackwell GPU architecture
    • Expanded unit test coverage

Version 24.12.0 (December 20, 2024)

19 Dec 18:53

  • Legion
    • Numerous bug fixes
  • Regent
    • Support for running without the CUDA hijack
    • Support for NVIDIA Hopper GPU architecture
  • Tools
    • Support for exporting profiles to NVTXW format
    • Simplifications that may improve performance by removing obsolete features
  • Realm
    • Remove the need for dynamic_cast in ExternalResource
    • Support for CUPTI profiling
    • Support for registering per-GPU reduction operations via CUfunction
    • Add a flag controlling ATS/HMM support and shared CPU memories, which are now disabled by default
    • Support for scalable barrier via radix tree
    • Support for querying NUMA resources
    • Support for backtrace via cpptrace
    • More unit tests
    • CI coverage for HIP with NVIDIA backend

Version 24.09.0 (September 27, 2024)

26 Sep 16:57

  • Legion
    • Bug fixes for control replication and multi-node configurations
  • Regent
    • Fixes for ROCm 6.0 code generation
  • Tools
    • Legion Prof now uses subcommands (e.g., legion_prof view) to clarify which options apply to which actions
    • Legion Prof now tracks backtraces at the points where blocking wait calls are performed by the application
    • Legion Prof reports more detailed timing information for tasks
    • Legion Prof calculates clock skew between nodes and reports it when relevant
    • Commonly used features of Legion Prof are now enabled by default
    • The old Python Legion Prof implementation is no longer supported
  • Realm
    • Point fields x, y, z and w have been replaced by methods
    • Support for launching CUDA tasks onto a CUDA stream asynchronously via cuCtxRecordEvent without needing the CUDA hijack
    • Support for CUDA fabric sharing
    • Support for host-to-host copies via CUDA DMA
    • Support for querying the number of NUMA nodes from the NumaModuleConfig
    • Added reference counting for preimage operations
    • Make std::atomic the default atomic implementation
    • Remove REALM_CXX_STANDARD, and bump the minimum requirement to C++17
    • Implemented an ABI stable wrapper for GASNetEX
    • Additional unit tests including CircularQueue, ReplicatedHeap, find_fastest_path, DynamicTableAllocator, generate_gather_paths, TransferIteratorIndexSpace
    • Dead code cleanups and bug fixes

Version 24.06.0 (June 28, 2024) – Nonidempotent Traces

28 Jun 16:22

  • Build
    • Minimum required C++ standard is now 17
    • Embedded GASNet build in CMake now automatically enables GPU memory kinds
  • Legion
    • Support for nonidempotent traces (where the postconditions do not imply the preconditions of the trace)
    • Deletions are now committed in program order, making it easier for users to reason about when their effects take place
    • All tasks (and other operations) are now committed in order (a prerequisite for anticipated, but not yet implemented, precise exception support)
    • Improvements to Legion's internal algorithm for virtual instances, fixing various correctness bugs in the implementation
    • Improvements to the DefaultMapper handling of task layout constraints
  • Regent
    • Improvements to make compiler more deterministic
    • Improvements to auto-detect CUDA
    • Support for complex numbers in std/format
    • Static control replication (SCR) and RDIR have been completely removed. All SCR and RDIR related flags (-fflow-*) have been removed, except for -fflow 0 which is permitted (but no longer does anything, and now issues a warning)
  • Tools
    • Restore profiler's ability to render dependent partitioning channels
    • Render mapper information on mapper calls in the profiler
    • Render user-provided profiling information in the profiler
  • Realm
    • UVM support for the HIP module
    • Error code support for command line parser
    • Support for querying MIG devices from NVML
    • Add indirection channel query
    • Additional unit tests and bug fixes

Version 24.03.0 (March 27, 2024) – Control Replication

27 Mar 16:14

Legion is an implicitly parallel, distributed runtime system for heterogeneous supercomputers.

The most notable feature in this release is control replication, a feature that we have been working on for many years that makes Legion dramatically more scalable in typical usage scenarios. In fact, the vast majority of users have already been using control replication, meaning that this is the first stable release of Legion which is usable (in a practical manner) for the vast majority of our users.

If you are not familiar with control replication, there is a wiki page that describes it, and of course the original paper.

As of this release, that means that the old control_replication branch is no longer being updated, and will be deleted at some point in the future. All updates from now on will go into the master branch, and it is our intention to avoid any long-standing feature branches in the future.

This release also finally removes some old Legion features that have been deprecated for nearly 10 years at this point. If you were somehow using those features, you will need to update to their replacements.

In addition, with this release, we are now packaging Legion Prof via crates.io. That means you can now install Legion Prof with:

cargo install --all-features --locked legion_prof@0.2403.0

(Note the version format is 0.YYMM.0. This is required because Rust uses semver while Legion uses calver.)
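The version mapping described above can be sketched as a small helper. This is only an illustration, not part of Legion or Legion Prof: the function name legion_prof_crate_version is hypothetical, and carrying the last component over unchanged is an assumption, since the note only documents the 0.YYMM.0 case. Whether a crate was actually published for any given release is not stated here.

```python
import re


def legion_prof_crate_version(legion_version: str) -> str:
    """Map a Legion calver string (YY.MM.P) to the 0.YYMM.P crate form.

    Hypothetical helper for illustration. The release note only documents
    the 0.YYMM.0 case; passing the last component through is an assumption.
    """
    match = re.fullmatch(r"(\d{2})\.(\d{2})\.(\d+)", legion_version)
    if match is None:
        raise ValueError(f"expected YY.MM.P, got {legion_version!r}")
    year, month, patch = match.groups()
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month out of range in {legion_version!r}")
    # Semver needs three numeric components, so the two-digit year and
    # month are fused into the minor component behind a leading 0 major.
    return f"0.{year}{month}.{patch}"


if __name__ == "__main__":
    print(legion_prof_crate_version("24.03.0"))  # 0.2403.0
```

The result is the string that follows the @ in the cargo install command above.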

Full release notes:

  • Build
    • ROCm 6.0 is now supported, and support for ROCm 4.x has been removed
  • Legion
    • Support for control replication has been merged
    • Support for discarding region contents on task completion
    • Long-deprecated APIs, such as the old HighLevel namespace, have been removed
  • Mappers
    • Default mapper support for control replication
    • Default and null mapper now use C++ override keyword
  • Regent
    • Support for pure projection functors that capture arguments
    • Static control replication (SCR) has been deprecated and will be removed in a future release
  • Tools
    • The profiler now correctly recognizes the logger format version and throws an error if it does not match
    • The profiler now reports when a profile was generated with debug mode (or another expensive setting) enabled
    • Many profiler fixes for correctly rendering runtime and mapper calls
    • Profiler now renders GPU device and host execution separately
    • Optimizations to improve profiler memory usage and running time
    • Rust profiler now requires at least Rust 1.74
  • Realm
    • Support for registration of dynamically allocated buffers
    • Support for handling poisoned events for reservation
    • Refactor CUDA allocation and IPC paths
    • Support for querying CUDA device information (GPU UUID and ID), process information (process ID, hostname, host ID) and timer calibration error from the profiler
    • Remove address alignment from serializer and deserializer
    • Support for creating network shared peers using IPC mailbox
    • Support OMP thread binding and allow for multiple OMP parallel sections when enabling system OMP runtime
    • Add Realm unit tests
    • Fixes for Realm tests, sparsity map, MemoryQuery, dynamic framebuffer memory and memcpy channel

Version 23.12.0 (December 14, 2023)

14 Dec 17:41

  • Regent
    • Support for HIP multi-GPU per runtime
  • Realm
    • Improve scalability of startup by replacing point-to-point communication with allgatherv for machine model announcements
    • Support shared memory communication for system memory
    • Provide sanity check for GPU tasks to detect any leak of CUDA streams
    • Support for GPU transposes in CUDA-DMA
    • Bug fixes for CUDA-DMA

Version 23.09.0 (September 28, 2023)

28 Sep 23:38

  • Regent
    • Elide future maps in index launches
    • Improvements to Pygion interop
  • Realm
    • Add a machine configuration API that allows applications to configure the machine model without using the command line
    • Expose the Realm-managed CUDA/HIP stream to applications so they can launch GPU tasks without device-wide synchronization when the hijack is disabled
    • Change timers to use rdtsc
    • Improve performance for getting highest priority task available in any task queue
    • Implement framebuffer memory with cuMemMap
    • Initial work for moving STL dependencies to header only

Version 23.06.0 (June 28, 2023)

27 Jun 17:56

  • Build
    • Fixes for CMake build on macOS
    • Fixes for HIP build when arch is specified
  • Realm
    • Support for better backtraces via libdw and libunwind
    • Improve scalability and performance in task spawning by caching the triggering operation of an event if one is provided
    • Fix a minor issue with affinity queries to properly clear the user-provided vector before populating it
    • Add more accurate GPU memory bandwidth affinity calculations if NVML is available
    • Refactor CPU core topology enumeration to serve systems without NUMA capabilities (like Jetson ARM systems)
    • Improve scalability and performance of task spawning by moving event reuse freelists to be per-processor, reducing lock contention
    • Add a microbenchmark for measuring task throughput more accurately
    • Add a series of Realm API tutorials
    • Replace CU_EVENT_DEFAULT with CU_EVENT_DISABLE_TIMING for better performance of CUDA events
    • Support Kokkos interop for the HIP module
    • Fixes for Realm tests on macOS
  • Tools
    • Legion Prof now supports search in the new profiler UI
    • Legion Prof now supports an HTTP client/server interface. Launch the server with --serve (on port 8080 by default) and attach a client to it with --attach http://127.0.0.1:8080
    • Legion Prof now supports a new archival mode via the --archive flag. Generate an offline profile and view it either via --attach or by uploading it to a server and navigating to https://legion.stanford.edu/prof-viewer/?url=...
    • Legion Prof modes (client/server/viewer) are now parallel by default, and perform heavy computations off the UI thread for better responsiveness
    • Add support for rendering indirect copies (i.e., gather/scatter)
    • Fix rendering of profiles over HTTP with old profiler UI
    • Fix profiling of copies with different numbers of hops between instances