Pulse · openxla/xla · GitHub

July 12, 2025 – July 19, 2025

Overview

253 Active pull requests

10 Active issues

175 Pull requests merged by 1 person

Add sdy shardings in frontend_attributes alongside hlo shardings for extra wrapper main added in tf2xla bridge.
#28879 merged Jul 19, 2025
[XLA:GPU] Support control flow thunks in command buffer conversion pass. We only convert kWhile and kConditional thunks if all thunks in all branches are convertible.
#29059 merged Jul 19, 2025
#sdy Fix forward of making XLA C++ changes so we can fall back to GSPMD in JAX export if the loaded module was lowered for GSPMD.
#28527 merged Jul 19, 2025
[XLA] Add helper function GetIndicesSpecForDynamicSlice to get indices spec for dynamic slice fed by all-gather, the spec includes the mapping from slice offsets to corresponding partition IDs(flattened-id).
#28816 merged Jul 19, 2025
Internal, visibility only changes to public code.
#29075 merged Jul 19, 2025
Add visibility to hlo_input_output_format
#29076 merged Jul 19, 2025
Reduce redundancy between StringTo* enum functions.
#29070 merged Jul 19, 2025
[XLA:CPU] Refactor Intrinsic and use it in all math intrinsics.
#28940 merged Jul 19, 2025
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#29043 merged Jul 19, 2025
Update nccl_archive BUILD file to fix TF GPU wheel build.
#29074 merged Jul 19, 2025
[XLA:GPU] Add a verifier to the GPU compiler before post-scheduling pipeline.
#29040 merged Jul 19, 2025
Use host callback in the CopyToHostFuture method in Async PjRt.
#29072 merged Jul 18, 2025
Add function ExtractDynamicSliceFromCollectiveUser to extract a dynamic slice user from a collective.
#28813 merged Jul 18, 2025
Reverts 6319f0d3bdfd3078e04bb984a759c890b7116484
#29064 merged Jul 18, 2025
Typo fix "perferred" -> "preferred".
#29068 merged Jul 18, 2025
PR #28257: [XLA:GPU] Update ONEAPI crosstool compiler wrapper
#29039 merged Jul 18, 2025
Use ASSERT_THAT to check pass.Run() result
#29048 merged Jul 18, 2025
Annotate some XLA:GPU flags as stable, i.e. they should provide a 6-month deprecation notice.
#29028 merged Jul 18, 2025
[XLA:GPU] Add a test for DotForInt4vsIdentityBF16ReturnsCorrectResult.
#28986 merged Jul 18, 2025
PR #28985: [XLA:GPU] Add shared_memory_per_block_optin device info member
#29033 merged Jul 18, 2025
[XLA:GPU] Move Dot strength reduction out of algebraic simplifier
#29049 merged Jul 18, 2025
[XLA:GPU] Remove CHECK-CSE since it is not used.
#29025 merged Jul 18, 2025
#sdy improve the error messaging when importing and exporting sharding custom calls.
#28967 merged Jul 18, 2025
Introduce new helper function that produces device lists for iota tile assignment. Apply it in xla_sharding_util.cc.
#29056 merged Jul 18, 2025
Introduce stable flags and associated deprecation policy for XLA debug options.
#28974 merged Jul 18, 2025
Use GetInPlaceInputOutputPairs from AliasInfo instead of HloDataflowAnalysis.
#29051 merged Jul 18, 2025
Remove ifdef from ir_emitter_unnested and fix various clang-tidy warnings
#29023 merged Jul 18, 2025
Add TmaMetadata serialization support
#29011 merged Jul 18, 2025
Automated Code Change
#29015 merged Jul 18, 2025
Move GetInPlaceInputOutputPairs and related code to AliasInfo class (NFC).
#29019 merged Jul 18, 2025
Automated Code Change
#29021 merged Jul 18, 2025
Remove leftover logging
#29036 merged Jul 18, 2025
Propagate context to the waiter destruction sequence, so that all contained operations execute with the correct context.
#29035 merged Jul 18, 2025
Update PjRtCpuExecutable to not rely on any internals of PjRtCpuBuffer.
#29037 merged Jul 18, 2025
[XLA:TPU] In MSA, when removing instructions, we need to remove their scoped allocations from PresetAssignments.
#28898 merged Jul 17, 2025
Modified Python bindings to enable passing a probe_instrumentation_dir to support interpreter ops in eval_module, consistent with StableHLO interpreter usage from the command line.
#29006 merged Jul 17, 2025
[XLA][host offloading] Return AsyncValue from HostOffloadingExecutable.
#28874 merged Jul 17, 2025
#sdy update dump names and add index as prefix so they would be clearer for users
#29018 merged Jul 17, 2025
[Autotuner] Add block level emitter backend for Triton fusion (3).
#28810 merged Jul 17, 2025
[IFRT] Add UserContextScope
#28949 merged Jul 17, 2025
Add ReleaseDeviceMemoryOwnership implementation based on
#29034 merged Jul 17, 2025
Migrate uses of XLA_TEST_BACKEND macros to use utilities in xla_test_backend_predicates.h
#29029 merged Jul 17, 2025
Correctly identify async start and done ops in latency hiding scheduler.
#29005 merged Jul 17, 2025
Close output shardings to respect allow_spmd_sharding_propagation_to_output flag set to default {false} value. Added multiple test variants to test shardy, use_compile_options_from_model.
#29022 merged Jul 17, 2025
[NCCL] Upgrade TF NCCL version to 2.26.5
#26949 merged Jul 17, 2025
[xla:cpu] Make DotLibraryRewriter support greedy fusion mode.
#28496 merged Jul 17, 2025
Optimize BM_GlobalDecreasingSizeBestFitHeap benchmark by up to 3%.
#28996 merged Jul 17, 2025
Update CommonPjRtBufferImpl to have specialized versions for both cpu->device
#29002 merged Jul 17, 2025
#sdy define the utils that JAX jaxlib will use to allow for falling back to GSPMD when loading an old checkpoint.
#29026 merged Jul 17, 2025
[Autotuner] Add block level emitter backend for Triton fusion (2).
#28808 merged Jul 17, 2025
Use ASSERT_THAT(..., IsOkAndHolds(true)) for consistency and correctness
#28944 merged Jul 17, 2025
Reverts e3c8dc729f1ac49d6a5a4e09f973ba40c185f6d9
#29008 merged Jul 17, 2025
Simplify ShouldSkipForSideEffect function in zero_sized_hlo_elimination.
#29010 merged Jul 17, 2025
[XLA:GPU] Remove unused DotSparsityRewriter.
#29024 merged Jul 17, 2025
Automated Code Change
#29020 merged Jul 17, 2025
[XLA:GPU] additional logging in triton fusion numeric verifier
#28981 merged Jul 17, 2025
[xla:gpu][triton] triton-xla-squeeze-dims pass improvements.
#29009 merged Jul 17, 2025
Automated Code Change
#28908 merged Jul 17, 2025
PR #28073: [XLA:GPU][oneAPI] Enable Level_zero support
#28953 merged Jul 17, 2025
Remove deprecated HloAliasAnalysis::Run method
#28968 merged Jul 17, 2025
Add serialization and deserialization for the cuDNN thunk
#28872 merged Jul 17, 2025
[xla] Optimize ShapeUtil::ForEach traversals
#28987 merged Jul 17, 2025
[xla:tf] Check if device shape is already a host shape
#28951 merged Jul 17, 2025
Rollback https://github.com/openxla/xla/commit/cf3dfa9723c4cd4e2b25a606207a201a95fe71db
#28990 merged Jul 17, 2025
Support for nested while loops in while_loop_unroller.
#27791 merged Jul 16, 2025
Move op name longest prefix logic from annotation.cc to somewhere upper level
#26865 merged Jul 16, 2025
Migrate uses of XLA_TEST_BACKEND macros to use utilities in xla_test_backend_predicates.h
#28945 merged Jul 16, 2025
[XLA] Refactoring Reduce Window Rewriter to reduce complexity
#28890 merged Jul 16, 2025
[JAX]: rollforward. Add ability to add a transfer server factory to override
#28993 merged Jul 16, 2025
[xla] Move xla::Shape functions that are used on a hot path to header file
#28982 merged Jul 16, 2025
Reverts 198c17b8bfb03c893a19dc973d634b509aa69ede
#28988 merged Jul 16, 2025
Complete the CommonPjRtBufferImpl implementation.
#28941 merged Jul 16, 2025
#sdy Mark xla.sdy.LocalToGlobalShape custom call as side effecting so it isn't removed if unused.
#28963 merged Jul 16, 2025
Added PjrtClient::UpdateGlobalProcessInfo method.
#28011 merged Jul 16, 2025
PR #28877: [XLA]Clamp num_workers to avoid partition overflow
#28971 merged Jul 16, 2025
[tf] Use non-owning ShapeTree to pass execution inputs to XLA
#28979 merged Jul 16, 2025
[XLA] Be less aggressive about recursively updating metadata when inlining.
#28969 merged Jul 16, 2025
[XLA:GPU] Move IsIntermediate & FindHero to shared ir_emission_utils.
#28976 merged Jul 16, 2025
[XLA:ALGEBRAIC_SIMPLIFIER] If an optimization barrier has an unused side-effecting instruction, do not remove the optimization barrier
#28973 merged Jul 16, 2025
Move HloAliasAnalysis out of HloModuleGroupMetadata (NFC).
#28961 merged Jul 16, 2025
Pass proper AliasInfo to HloAliasAnalysis::Run in tests (NFC).
#28965 merged Jul 16, 2025
[XLA:GPU] Update documentation for triton_xla.extract/insert.
#28964 merged Jul 16, 2025
[xla][gpu][triton] Temporarily disable triton squeeze dims pass, due to internal benchmark regression.
#28957 merged Jul 16, 2025
Remove unused HloAliasAnalysis instance (NFC).
#28954 merged Jul 16, 2025
Skip TreeReductionRewriter for Slinky.
#28914 merged Jul 16, 2025
[XLA:GPU] update triton test for generic emitter
#28934 merged Jul 16, 2025
[xla] Add benchmark for ShapeUtil::SubshapeCount
#28952 merged Jul 16, 2025
Automated Code Change
#28913 merged Jul 16, 2025
[xla] Change the order of std::variant types in MaybeOwningDeviceMemory
#28947 merged Jul 16, 2025
The raw buffer CopyToMemorySpace doesn't seem to quite work yet cross client, so avoid
#28948 merged Jul 16, 2025
[xla] Optimize constructing ShapeTree
#28946 merged Jul 16, 2025
[JAX] Cache transfer server connections for cross-host device_put.
#28942 merged Jul 15, 2025
Update target define states before we update ready list.
#28943 merged Jul 15, 2025
Reverts 41367aa00d6e2843301b1bc793ac5090564a3ef1
#28937 merged Jul 15, 2025
Optimize xla::GlobalDecreasingSizeBestFitHeap::MakeFreeChunks when using power-of-2 memory alignments, and add 1024B alignment test to benchmark.
#28885 merged Jul 15, 2025
Create xla::test::Empty for instantiating empty test suites.
#28938 merged Jul 15, 2025
Add ::GetReadyFuturePromise to be used in implementing
#28904 merged Jul 15, 2025
Add an option to do multiple executions of the same module to HloRunners.
#28776 merged Jul 15, 2025
Pass proper AliasInfo to HloAliasAnalysis::Run (NFC).
#28926 merged Jul 15, 2025
[XLA][Numerics][HLO Value Tracking] Add recovery modules when removing nested reshapes on TPU
#28611 merged Jul 15, 2025
Add CopyToMemorySpace which calls DirectCopyToMemorySpace or
#28900 merged Jul 15, 2025
#HLODiff Remove text diff summary
#28894 merged Jul 15, 2025
#HLODiff Update print progress at the end of matcher to show 100%.
#28892 merged Jul 15, 2025
[XLA:CPU] Don't expand tanh at the fusion level.
#28930 merged Jul 15, 2025
[IFRT] Do not set MHLO shardings if sdy partitioned
#28865 merged Jul 15, 2025
Adds a new rematerialization method that focuses on rematerializing only the highest memory usage peak in the module at any given remat pass (instead of rematerializing the first point at which the memory limit is reached). Should result in more monotonic rematerialization and avoid rematerializing unnecessary instructions. Usually not as efficient as regular rematerialization but can help in specific cases. The new mode is not enabled yet. Reworks Instruction List to use unique ptrs.
#28935 merged Jul 15, 2025
Handle GetDonatableInputIndices() errors
#28907 merged Jul 15, 2025
[XLA:CPU] Disable fusion level vectorization.
#28927 merged Jul 15, 2025
Add missing header.
#28848 merged Jul 15, 2025
[XLA:CPU][XLA:GPU] Set default alignment of vector load/store as that of the vector element type.
#28925 merged Jul 15, 2025
#sdy Clean up AddAxisOrMergeInserter in dedup_meshes
#28447 merged Jul 15, 2025
Update the link for hermetic CUDA documentation.
#28932 merged Jul 15, 2025
[ifrt] Fix spelling in CopyArraysOp description.
#28933 merged Jul 15, 2025
PR #28716: [GPU] Make fabric info test compatible with lower CUDA driver versions
#28801 merged Jul 15, 2025
Remove MeshAttr builder that takes a single int
#28931 merged Jul 15, 2025
#sdy Mark xla.sdy.LocalToGlobalShape custom call as side effecting so it isn't removed if unused.
#28869 merged Jul 15, 2025
[XLA:GPU] Implement tiling for dot.
#28725 merged Jul 15, 2025
PR #28728: Add Nvidia benchmarks
#28882 merged Jul 15, 2025
Make Thunk keep an instance of ThunkInfo directly (NFC)
#28871 merged Jul 15, 2025
Remove workarounds for missing ABSL_DEPRECATE_AND_INLINE
#28924 merged Jul 15, 2025
[XLA:CPU][XLA:GPU] Increase limit in number of iterations of UnswitchLoopsPass.
#28873 merged Jul 15, 2025
[xla:cpu] Add DotLibraryRewriter rewrite options for oneDNN and XNNPACK.
#28923 merged Jul 15, 2025
Automated Code Change
#28921 merged Jul 15, 2025
[xla:cpu] Tiny improvements for documentation and function names
#28920 merged Jul 15, 2025
Fix shardy_xla_pass_test that is failing
#28899 merged Jul 15, 2025
[XLA:CPU][XLA:GPU] Fix missing layout on emitted constants.
#28804 merged Jul 15, 2025
Automated Code Change
#28918 merged Jul 15, 2025
Remove dependency on KernelArguments from CudnnThunk
#28870 merged Jul 15, 2025
[XLA:GPU] Do not multi-output fuse sibling transposes with reductions.
#28786 merged Jul 15, 2025
Migrate away from ArrayRef(std::nullopt_t)
#28897 merged Jul 15, 2025
PR #28401: [ROCm] Fix PackedTranspose for adapting to warp size 64
#28916 merged Jul 15, 2025
PR #25914: [NVIDIA GPU] Add nvshmem communicator and runtime thunks
#28863 merged Jul 15, 2025
[XLA] Propagate op_names recursively in the CallInliner.
#28887 merged Jul 15, 2025
Fix test-case when NVML library is not available.
#28876 merged Jul 15, 2025
[xla:cpu] Mark cpu_function_runtime alignment as deprecated
#28759 merged Jul 15, 2025
initial implementation of send/recv static verification
#28620 merged Jul 15, 2025
Remove unused ExecutionProfile option.
#28730 merged Jul 15, 2025
[JAX] Use experimental DCN transfer library as a fallback for PjRt-IFRT cross-host device transfers when the PjRt plugin doesn't implement the cross-host transfer APIs.
#28824 merged Jul 15, 2025
Add HloAsyncStartInstruction::AddCallOperand to mirror HloCallInstruction::AddCallOperand.
#28768 merged Jul 15, 2025
[xla:codegen] Migrate Fptrunc to GetOrInsertDeclaration API
#28902 merged Jul 15, 2025
[XLA] Refactoring Reduce Window Rewriter to reduce complexity
#28822 merged Jul 15, 2025
Migrate away from ArrayRef(std::nullopt_t)
#28895 merged Jul 15, 2025
[xla:codegen] Use Intrinsic::Type in Fptrunc::CreateDefinition
#28881 merged Jul 14, 2025
Add 'mode' attribute to AllReduce and ReduceScatter.
#28429 merged Jul 14, 2025
[IFRT IR] Add IFRT IR program interpreter
#28891 merged Jul 14, 2025
Extract CheckUniformReplicaGroups to verify that all replica groups in a collective instruction are of the same size, which is a precondition for many collective optimizations.
#28812 merged Jul 14, 2025
[Efficiency]Cleanup unused metrics which track the pjrt compilation status.
#28761 merged Jul 14, 2025
[IFRT IR] Add pipeline for compiling IFRT IR programs
#28857 merged Jul 14, 2025
[XLA:GPU] update determinism test to use generic triton emitter
#28880 merged Jul 14, 2025
set layout assignment for the result correctly
#28559 merged Jul 14, 2025
Add CloneWithControlDependency which is used to implement
#28815 merged Jul 14, 2025
[XLA:GPU]: Enable two-shot all reduce implementation for usage.
#28594 merged Jul 14, 2025
set default layout when exporting dense constants from HLO to MLIR
#28763 merged Jul 14, 2025
Re-enable precompilation for some tests.
#28772 merged Jul 14, 2025
[XLA:GPU] enable nested fusion for autotuner test
#28875 merged Jul 14, 2025
Align AtLocation signature with Abseil LogMessage::AtLocation.
#28867 merged Jul 14, 2025
[XLA] Use "edge time indices" to skip some redundant calls to FindChunkCandidate.
#28769 merged Jul 14, 2025
[XLA:CPU] Move erf32 approximation to mathlib.
#28796 merged Jul 14, 2025
Removing stale function signature references from tensorflow that rely on old options of type variant<int, string>
#27858 merged Jul 14, 2025
[XLA:CPU] Add expm1 expansion.
#28795 merged Jul 14, 2025
[XLA:GPU]: Calculate rank_offset and rotated_ranks outside the kernel.
#28232 merged Jul 14, 2025
[XLA:CPU] Move passes from expand_float_ops that lower to math lib.
#28794 merged Jul 14, 2025
[XLA:GPU]: Calculate launch dimensions based on input size.
#28186 merged Jul 14, 2025
Pass proper AliasInfo to HloAliasAnalysis::Run in HostOffloader (NFC).
#28866 merged Jul 14, 2025
[XLA:GPU] Print fusion string when selecting the best result, instead of root string.
#28800 merged Jul 14, 2025
[xla][gpu][triton] Do not duplicate code in squeeze dims pass, re-enable the pass.
#28861 merged Jul 14, 2025
Disable NVSHMEM send-recv test-case due to flakiness.
#28858 merged Jul 14, 2025
PR #28295: [NVIDIA GPU] Do out of place allreduce for nvshmem
#28860 merged Jul 14, 2025
[XLA:GPU] Remove code for horizontal_input_fusion.
#28562 merged Jul 14, 2025
Update StreamExecutorGpuClientTest.PropagateError test to expect unpacked tuples
#28864 merged Jul 14, 2025
XLA:GPU: Fix method ambiguity on CUDA 12.4
#28847 merged Jul 14, 2025
Avoid using PointsToAnalysis in DFSMemoryScheduler (NFC).
#28747 merged Jul 14, 2025
Always stage transfers when doing d2h copy to avoid memory corruption issue.
#28828 merged Jul 14, 2025
[xla:codegen] Use Intrinsic::Type in Fptrunc::GetOrInsertDeclaration
#28841 merged Jul 13, 2025
Reverts ff9ecfc192000b5a62c0adabfd968e5703b0229a
#28832 merged Jul 13, 2025

78 Pull requests opened by 9 people

Migrate ListScheduler from TuplePointsToAnalysis to HloAliasAnalysis (NFC).
#28868 opened Jul 14, 2025
#sdy Fix forward of making JAX changes so we can fall back to GSPMD in JAX export if the loaded module was lowered for GSPMD.
#28878 opened Jul 14, 2025
[XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN
#28883 opened Jul 14, 2025
Automated Code Change
#28884 opened Jul 14, 2025
Fix cost analysis for output bytes accessed when the result is a tuple
#28886 opened Jul 14, 2025
Avoid crashing when LRU cache keys change.
#28888 opened Jul 14, 2025
Automated Code Change
#28889 opened Jul 14, 2025
test PR #28728: Add Nvidia benchmarks
#28896 opened Jul 14, 2025
There is nothing in this change going to 3rd party.
#28903 opened Jul 15, 2025
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#28906 opened Jul 15, 2025
[NVIDIA GPU] [XLA_GPU_MS_COLLECTIVE] Round-robin stream assignment for async communications
#28919 opened Jul 15, 2025
[xla:gpu][triton] Add squeeze_dims of tt.descriptor_load rewrite.
#28922 opened Jul 15, 2025
Dump optimized HLO when deserializing
#28928 opened Jul 15, 2025
Add subtraction pattern to reduce scatter creator
#28929 opened Jul 15, 2025
clean device description for rocm
#28936 opened Jul 15, 2025
Integrate LLVM at llvm/llvm-project@0d5325bb203f
#28939 opened Jul 15, 2025
Update deps:
#28960 opened Jul 16, 2025
Set KV store to null with mocked GPU processes
#28962 opened Jul 16, 2025
PR #28735: [XLA:GPU] Enabling cuda graph concurrent mode by default
#28970 opened Jul 16, 2025
[XLA:GPU] Move the s4 unpacking sequence from llvm pass to int4->int8 pass
#28972 opened Jul 16, 2025
[XLA:CPU][XLA:GPU] Move concat fusion emitter to shared directory
#28975 opened Jul 16, 2025
[XLA:GPU][host offloading] Implement gpu host offloading allocator.
#28977 opened Jul 16, 2025
Allow the chaining of state across MetricHookInterface instantiations for multiple compilations.
#28978 opened Jul 16, 2025
[XLA:GPU][host offloading] Implement host offloading thunks.
#28983 opened Jul 16, 2025
[#HLODiff] Add support for manual node matching.
#28984 opened Jul 16, 2025
Add a new precompilation-only marker to `Literal`s.
#28991 opened Jul 16, 2025
Avoid recomputation of `pjrt_buffer->memory_space()` in `MakeMemoryKindFromPjRtBuffer`.
#28992 opened Jul 16, 2025
No changes to 3rd party.
#28994 opened Jul 16, 2025
[Perf] Add expensive AllGather cost adjustment to default GPU Scheduler
#28997 opened Jul 16, 2025
Added WatchJobState RPC to coordination service.
#28998 opened Jul 16, 2025
Cache device on `PJRT_Buffer`.
#29000 opened Jul 17, 2025
[XLA:MSA] Add block allocations for program weights that are not aliased and single use.
#29001 opened Jul 17, 2025
Remove `local_config_nvshmem` repository and corresponding macros.
#29003 opened Jul 17, 2025
Integrate LLVM at llvm/llvm-project@2910c24638fc
#29004 opened Jul 17, 2025
[XLA:CPU] Run CSE after inlining in fusion compiler.
#29016 opened Jul 17, 2025
PR #28883: [XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN
#29017 opened Jul 17, 2025
Add 10 Maxtext-derived HLO-based benchmarks
#29027 opened Jul 17, 2025
[XLA:GPU] Refactor tests of IndexingMap
#29031 opened Jul 17, 2025
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#29032 opened Jul 17, 2025
[XLA][Numerics][HLO Value Tracking] Track original values through propagation of shardy annotation
#29038 opened Jul 17, 2025
Remove redundant string conversion.
#29041 opened Jul 17, 2025
[IFRT] Define `user_context()` in `Value` and `LoadedExecutable`
#29042 opened Jul 17, 2025
Optimize `HasCombinableReplicaGroup` and `xla::CheckReplicaGroups`.
#29044 opened Jul 18, 2025
Change `PjRtClient::LazyToLiteral` to take a generator that returns a future of the literal
#29046 opened Jul 18, 2025
[XLA:MSA] Reduce available memory bandwidth for instructions that are overlapped with bandwidth-limiting asynchronous instructions.
#29047 opened Jul 18, 2025
Remove LLVM dependency from KernelThunk
#29057 opened Jul 18, 2025
Remove HLO and Autotuner dependency from CublasLtMatmulThunk
#29058 opened Jul 18, 2025
[XLA] Use sort instead of btree in MakeFreeChunks.
#29060 opened Jul 18, 2025
pass in scheduling group id when adding some new ops from ops which have id.
#29061 opened Jul 18, 2025
Automated Code Change
#29062 opened Jul 18, 2025
Introduces a new utility function, `MatchPermutedSliceAndPartitionOffset`, to detect a pattern where a `DynamicSlice` consumes the output of an `AllGather` with a permuted set of offsets. This pattern is equivalent to a `CollectivePermute` and can be optimized accordingly.
#29063 opened Jul 18, 2025
LatencyHidingScheduler: Only recalculate when we've touched an already-scheduled computation, and use computation-specific peak rather than module peak in statistics.
#29065 opened Jul 18, 2025
Remove AbstractCpuBuffer. All subclasses can be replaced with CommonPjRtBufferImpl and removed.
#29066 opened Jul 18, 2025
IFRT proxy logging fix: Do not log error when Executable is destroyed before its metadata is queried by the server (and sent over to the client).
#29067 opened Jul 18, 2025
Add SparseCore documentation
#29069 opened Jul 18, 2025
SPMD Partitioning for MX Block Scaled Dots
#29073 opened Jul 18, 2025
Give better error in run_hlo_module if HLO has collectives.
#29077 opened Jul 19, 2025
Determine collective support based on #partitions
#29078 opened Jul 19, 2025
Automated Code Change
#29080 opened Jul 19, 2025
[xla:gpu][triton] In squeeze-dims pass, keep at least two dimensions.
#29083 opened Jul 19, 2025
Automated Code Change
#29084 opened Jul 19, 2025
[XLA:GPU][Tiling] Use SmallVector<OneDimTile> to store tiling info.
#29085 opened Jul 19, 2025
Avoid heap allocation for the sub buffer address
#29086 opened Jul 19, 2025
Utils to add sdy shardings in frontend_attributes alongside hlo shardings for extra wrapper main added in tf2xla bridge.
#29087 opened Jul 19, 2025
Use stablehlo precision config conversion for stablehlo ops
#29088 opened Jul 19, 2025
PR #28735: [XLA:GPU] Enabling cuda graph concurrent mode by default
#29089 opened Jul 19, 2025
Adding extractor for metadata embedded in hlo expression for debugging.
#29090 opened Jul 19, 2025
Automated Code Change
#29091 opened Jul 20, 2025
Automated Code Change
#29092 opened Jul 20, 2025
Automated Code Change
#29093 opened Jul 20, 2025
Automated Code Change
#29094 opened Jul 20, 2025
Automated Code Change
#29095 opened Jul 20, 2025
Automated Code Change
#29096 opened Jul 20, 2025
Automated Code Change
#29097 opened Jul 20, 2025
Automated Code Change
#29098 opened Jul 20, 2025
Automated Code Change
#29099 opened Jul 20, 2025
Automated Code Change
#29100 opened Jul 20, 2025
Automated Code Change
#29101 opened Jul 20, 2025

3 Issues closed by 3 people

PR process improvements
#2038 closed Jul 15, 2025
[TPU] Bug: Reverse is orders of magnitude slower on TPU
#23191 closed Jul 15, 2025
PJRT_Memory vs PJRT_Buffer
#28846 closed Jul 13, 2025

7 Issues opened by 7 people

dynamic_broadcast_in_dim MHLO -> XLA HLO conversion failed
#29030 opened Jul 17, 2025
[Proposal] Extract the Profiling Subsystem into a Dedicated OpenXLA Repository
#29007 opened Jul 17, 2025
Missing CUDNN 9.10.2 for json hermetic, will CUDA 12.9.1 also be missing?
#28989 opened Jul 16, 2025
GPU mocking hangs the compiler (in autotuning)
#28959 opened Jul 16, 2025
Build error while trying to build algorithm.cc
#28905 opened Jul 15, 2025
Passing Kernels to FFI
#28893 opened Jul 14, 2025
Why does tfrt use only one stream for gpu client?
#28859 opened Jul 14, 2025

44 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[NVIDIA GPU] Skip user buffer reg when the size is 1
#28396 commented on Jul 15, 2025 • 2 new comments
[XLA:CPU][oneDNN] Implement oneDNN primitives for custom calls in Thunk runtime
#28615 commented on Jul 16, 2025 • 1 new comment
[ROCm] fixed broadcast_constant_block_dim_limit.hlo.test on rocm
#27816 commented on Jul 15, 2025 • 0 new comments
[XLA:GPU][oneAPI] Enable Clang compiler as the host compiler
#27904 commented on Jul 17, 2025 • 0 new comments
[XLA:GPU] Add support for intel gpu backend for tests
#27943 commented on Jul 18, 2025 • 0 new comments
Update Protobuf to 6.31.1
#28164 commented on Jul 14, 2025 • 0 new comments
[ROCm] Enable mx data type for ROCm
#28173 commented on Jul 18, 2025 • 0 new comments
Add metadata for CUDA and libtpu versions
#28195 commented on Jul 15, 2025 • 0 new comments
[XLA:SCHEDULING] Defer scheduling of allocate buffer custom calls
#28355 commented on Jul 15, 2025 • 0 new comments
[XLA] Remove dead argument in ProtoToHumanReadableJson
#28659 commented on Jul 14, 2025 • 0 new comments
PR #19067: [XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#28680 commented on Jul 19, 2025 • 0 new comments
#sdy Remove MHLO shardings from round-trip export
#28692 commented on Jul 17, 2025 • 0 new comments
Use RPG's solution as a hint to CP-SAT
#28723 commented on Jul 17, 2025 • 0 new comments
[XLA:benchmarks] Test Nvidia benchmarks from https://github.com/openxla/xla/pull/28728
#28727 commented on Jul 15, 2025 • 0 new comments
[XLA:GPU] Enabling cuda graph concurrent mode by default
#28735 commented on Jul 15, 2025 • 0 new comments
[XLA:GPU] Lowering dynamic update slice thunk into command buffer if it depends on loop iteration.
#28740 commented on Jul 18, 2025 • 0 new comments
[XLA:GPU] Add sycl_kernel component and test
#28762 commented on Jul 17, 2025 • 0 new comments
[XLA] Add stack trace breakdown to `HloLiveRange::ToString` for peak memory usage
#28777 commented on Jul 13, 2025 • 0 new comments
Add Hermetic C++ Toolchains for Linux x86_64 builds.
#28827 commented on Jul 17, 2025 • 0 new comments
Automated Code Change
#28836 commented on Jul 14, 2025 • 0 new comments
Automated Code Change
#28839 commented on Jul 14, 2025 • 0 new comments
Automated Code Change
#28843 commented on Jul 14, 2025 • 0 new comments
CMake build support
#1 commented on Jul 14, 2025 • 0 new comments
crosstool_wrapper_driver_is_not_gcc failed: error executing command
#3552 commented on Jul 14, 2025 • 0 new comments
Hermetic CUDA no longer respects TF_DOWNLOAD_CLANG
#16866 commented on Jul 17, 2025 • 0 new comments
Fix issues with ARM build after latest hermetic changes
#28256 commented on Jul 15, 2025 • 0 new comments
How can cross-architecture operator libraries be applied in JAX, such as cuBLAS?
#28265 commented on Jul 15, 2025 • 0 new comments
"Can't find libdevice directory" due to unset TF_CUDA_TOOLKIT_PATH
#28590 commented on Jul 15, 2025 • 0 new comments
Poor cuDNN kernel selection
#28665 commented on Jul 14, 2025 • 0 new comments
xla/service/gpu/kernels/cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc trouble compiling cutlass
#28669 commented on Jul 16, 2025 • 0 new comments
Cross compile to ARM with custom gcc
#28807 commented on Jul 15, 2025 • 0 new comments
Possibility to specify strides when sending the data from buffer to host
#28833 commented on Jul 15, 2025 • 0 new comments
[WIP] [ROCm] Moving blas::CallContext into NumericOptions
#9593 commented on Jul 15, 2025 • 0 new comments
[ROCm] Add CudnnPadForConvolutions and CudnnVectorizeConvolutionsHLO pass to AMDGPU compiler
#10337 commented on Jul 15, 2025 • 0 new comments
[ROCm] Add CudnnNormRewriter pass to AMDGPU compiler
#10421 commented on Jul 15, 2025 • 0 new comments
Add missing unit test tags in `xla/service/gpu/BUILD`
#10620 commented on Jul 15, 2025 • 0 new comments
[ROCm] Disable fusion MLIR for ROCM
#13663 commented on Jul 15, 2025 • 0 new comments
[ROCm] Fix FP32 atomic_rmw
#14117 commented on Jul 15, 2025 • 0 new comments
[XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#19067 commented on Jul 14, 2025 • 0 new comments
Support the Execute Device Kernel feature in XLA
#26236 commented on Jul 17, 2025 • 0 new comments
[NVIDIA GPU] Dynamic SPMD iteration limit for larger fast-interconnect domain
#26391 commented on Jul 15, 2025 • 0 new comments
[NVIDIA GPU] Add support for nccl symmetric kernel
#26443 commented on Jul 14, 2025 • 0 new comments
[ROCm] Embedded device lib
#26704 commented on Jul 15, 2025 • 0 new comments
[XLA] Add stack trace breakdown to `HloLiveRange::ToString` for peak memory usage
#27542 commented on Jul 18, 2025 • 0 new comments