Insights: openxla/xla
Overview
175 Pull requests merged by 1 person
Internal, visibility only changes to public code.
#29075 merged
Jul 19, 2025 -
Add visibility to hlo_input_output_format
#29076 merged
Jul 19, 2025 -
Reduce redundancy between StringTo* enum functions.
#29070 merged
Jul 19, 2025 -
[XLA:CPU] Refactor Intrinsic and use it in all math intrinsics.
#28940 merged
Jul 19, 2025 -
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#29043 merged
Jul 19, 2025 -
Update `nccl_archive` BUILD file to fix TF GPU wheel build.
#29074 merged
Jul 19, 2025 -
[XLA:GPU] Add a verifier to the GPU compiler before post-scheduling pipeline.
#29040 merged
Jul 19, 2025 -
Use host callback in the CopyToHostFuture method in Async PjRt.
#29072 merged
Jul 18, 2025 -
Add function `ExtractDynamicSliceFromCollectiveUser` to extract a dynamic slice user from a collective.
#28813 merged
Jul 18, 2025 -
Reverts 6319f0d3bdfd3078e04bb984a759c890b7116484
#29064 merged
Jul 18, 2025 -
Typo fix "perferred" -> "preferred".
#29068 merged
Jul 18, 2025 -
PR #28257: [XLA:GPU] Update ONEAPI crosstool compiler wrapper
#29039 merged
Jul 18, 2025 -
Use ASSERT_THAT to check pass.Run() result
#29048 merged
Jul 18, 2025 -
Annotate some XLA:GPU flags as stable i.e. they should provide 6 month deprecation notice.
#29028 merged
Jul 18, 2025 -
[XLA:GPU] Add a test for DotForInt4vsIdentityBF16ReturnsCorrectResult.
#28986 merged
Jul 18, 2025 -
PR #28985: [XLA:GPU] Add shared_memory_per_block_optin device info member
#29033 merged
Jul 18, 2025 -
[XLA:GPU] Move Dot strength reduction out of algebraic simplifier
#29049 merged
Jul 18, 2025 -
[XLA:GPU] Remove CHECK-CSE since it is not used.
#29025 merged
Jul 18, 2025 -
#sdy improve the error messaging when importing and exporting sharding custom calls.
#28967 merged
Jul 18, 2025 -
Introduce stable flags and associated deprecation policy for XLA debug options.
#28974 merged
Jul 18, 2025 -
Use GetInPlaceInputOutputPairs from AliasInfo instead of HloDataflowAnalysis.
#29051 merged
Jul 18, 2025 -
Remove ifdef from ir_emitter_unnested and fix various clang-tidy warnings
#29023 merged
Jul 18, 2025 -
Add TmaMetadata serialization support
#29011 merged
Jul 18, 2025 -
Automated Code Change
#29015 merged
Jul 18, 2025 -
Move GetInPlaceInputOutputPairs and related code to AliasInfo class (NFC).
#29019 merged
Jul 18, 2025 -
Automated Code Change
#29021 merged
Jul 18, 2025 -
Remove leftover logging
#29036 merged
Jul 18, 2025 -
Update PjRtCpuExecutable to not rely on any internals of PjRtCpuBuffer.
#29037 merged
Jul 18, 2025 -
[XLA][host offloading] Return AsyncValue from HostOffloadingExecutable.
#28874 merged
Jul 17, 2025 -
#sdy update dump names and add index as prefix so they would be clearer for users
#29018 merged
Jul 17, 2025 -
[Autotuner] Add block level emitter backend for Triton fusion (3).
#28810 merged
Jul 17, 2025 -
[IFRT] Add `UserContextScope`
#28949 merged
Jul 17, 2025 -
Add ReleaseDeviceMemoryOwnership implementation based on
#29034 merged
Jul 17, 2025 -
Migrate uses of `XLA_TEST_BACKEND` macros to use utilities in `xla_test_backend_predicates.h`
#29029 merged
Jul 17, 2025 -
Correctly identify async start and done ops in latency hiding scheduler.
#29005 merged
Jul 17, 2025 -
[NCCL] Upgrade TF NCCL version to 2.26.5
#26949 merged
Jul 17, 2025 -
[xla:cpu] Make DotLibraryRewriter support greedy fusion mode.
#28496 merged
Jul 17, 2025 -
Optimize `BM_GlobalDecreasingSizeBestFitHeap` benchmark by up to 3%.
#28996 merged
Jul 17, 2025 -
Update CommonPjRtBufferImpl to have specialized versions for both cpu->device
#29002 merged
Jul 17, 2025 -
[Autotuner] Add block level emitter backend for Triton fusion (2).
#28808 merged
Jul 17, 2025 -
Use ASSERT_THAT(..., IsOkAndHolds(true)) for consistency and correctness
#28944 merged
Jul 17, 2025 -
Reverts e3c8dc729f1ac49d6a5a4e09f973ba40c185f6d9
#29008 merged
Jul 17, 2025 -
Simplify ShouldSkipForSideEffect function in zero_sized_hlo_elimination.
#29010 merged
Jul 17, 2025 -
[XLA:GPU] Remove unused `DotSparsityRewriter`.
#29024 merged
Jul 17, 2025 -
Automated Code Change
#29020 merged
Jul 17, 2025 -
[XLA:GPU] additional logging in triton fusion numeric verifier
#28981 merged
Jul 17, 2025 -
[xla:gpu][triton] `triton-xla-squeeze-dims` pass improvements.
#29009 merged
Jul 17, 2025 -
Automated Code Change
#28908 merged
Jul 17, 2025 -
PR #28073: [XLA:GPU][oneAPI] Enable Level_zero support
#28953 merged
Jul 17, 2025 -
Remove deprecated HloAliasAnalysis::Run method
#28968 merged
Jul 17, 2025 -
Add serialization and deserialization for the cuDNN thunk
#28872 merged
Jul 17, 2025 -
[xla] Optimize ShapeUtil::ForEach traverals
#28987 merged
Jul 17, 2025 -
[xla:tf] Check if device shape is already a host shape
#28951 merged
Jul 17, 2025 -
Rollback https://github.com/openxla/xla/commit/cf3dfa9723c4cd4e2b25a606207a201a95fe71db
#28990 merged
Jul 17, 2025 -
Support for nested while loops in while_loop_unroller.
#27791 merged
Jul 16, 2025 -
Move op name longest prefix logic from annotation.cc to somewhere upper level
#26865 merged
Jul 16, 2025 -
Migrate uses of `XLA_TEST_BACKEND` macros to use utilities in `xla_test_backend_predicates.h`
#28945 merged
Jul 16, 2025 -
[XLA] Refactoring Reduce Window Rewriter to reduce complexity
#28890 merged
Jul 16, 2025 -
[JAX]: rollforward. Add ability to add a transfer server factory to override
#28993 merged
Jul 16, 2025 -
[xla] Move xla::Shape functions that are used on a hot path to header file
#28982 merged
Jul 16, 2025 -
Reverts 198c17b8bfb03c893a19dc973d634b509aa69ede
#28988 merged
Jul 16, 2025 -
Complete the CommonPjRtBufferImpl implementation.
#28941 merged
Jul 16, 2025 -
#sdy Mark `xla.sdy.LocalToGlobalShape` custom call as side effecting so it isn't removed if unused.
#28963 merged
Jul 16, 2025 -
Added `PjrtClient::UpdateGlobalProcessInfo` method.
#28011 merged
Jul 16, 2025 -
PR #28877: [XLA]Clamp num_workers to avoid partition overflow
#28971 merged
Jul 16, 2025 -
[tf] Use non-owning ShapeTree to pass execution inputs to XLA
#28979 merged
Jul 16, 2025 -
[XLA] Be less aggressive about recursively updating metadata when inlining.
#28969 merged
Jul 16, 2025 -
[XLA:GPU] Move IsIntermediate & FindHero to shared ir_emission_utils.
#28976 merged
Jul 16, 2025 -
Move HloAliasAnalysis out of HloModuleGroupMetadata (NFC).
#28961 merged
Jul 16, 2025 -
Pass proper AliasInfo to HloAliasAnalysis::Run in tests (NFC).
#28965 merged
Jul 16, 2025 -
[XLA:GPU] Update documentation for triton_xla.extract/insert.
#28964 merged
Jul 16, 2025 -
[xla][gpu][triton] Temporarily disable triton squeeze dims pass, due to internal benchmark regression.
#28957 merged
Jul 16, 2025 -
Remove unused HloAliasAnalysis instance (NFC).
#28954 merged
Jul 16, 2025 -
Skip TreeReductionRewriter for Slinky.
#28914 merged
Jul 16, 2025 -
[XLA:GPU] update triton test for generic emitter
#28934 merged
Jul 16, 2025 -
[xla] Add benchmark for ShapeUtil::SubshapeCount
#28952 merged
Jul 16, 2025 -
Automated Code Change
#28913 merged
Jul 16, 2025 -
[xla] Change the order of std::variant types in MaybeOwningDeviceMemory
#28947 merged
Jul 16, 2025 -
The raw buffer CopyToMemorySpace don't seem to quite work yet cross client, so avoid
#28948 merged
Jul 16, 2025 -
[xla] Optimize constructing ShapeTree
#28946 merged
Jul 16, 2025 -
[JAX] Cache transfer server connections for cross-host device_put.
#28942 merged
Jul 15, 2025 -
Update target define states before we update ready list.
#28943 merged
Jul 15, 2025 -
Reverts 41367aa00d6e2843301b1bc793ac5090564a3ef1
#28937 merged
Jul 15, 2025 -
Create `xla::test::Empty` for instantiating empty test suites.
#28938 merged
Jul 15, 2025 -
Add ::GetReadyFuturePromise to be used in implementing
#28904 merged
Jul 15, 2025 -
Add an option to do multiple executions of the same module to HloRunners.
#28776 merged
Jul 15, 2025 -
Pass proper AliasInfo to HloAliasAnalysis::Run (NFC).
#28926 merged
Jul 15, 2025 -
[XLA][Numerics][HLO Value Tracking] Add recovery modules when removing nested reshapes on TPU
#28611 merged
Jul 15, 2025 -
Add CopyToMemorySpace which calls DirectCopyToMemorySpace or
#28900 merged
Jul 15, 2025 -
#HLODiff Remove text diff summary
#28894 merged
Jul 15, 2025 -
#HLODiff Update print progress at the end of matcher to show 100%.
#28892 merged
Jul 15, 2025 -
[XLA:CPU] Don't expand tanh at the fusion level.
#28930 merged
Jul 15, 2025 -
[IFRT] Do not set MHLO shardings if sdy partitioned
#28865 merged
Jul 15, 2025 -
Handle GetDonatableInputIndices() errors
#28907 merged
Jul 15, 2025 -
[XLA:CPU] Disable fusion level vectorization.
#28927 merged
Jul 15, 2025 -
Add missing header.
#28848 merged
Jul 15, 2025 -
[XLA:CPU][XLA:GPU] Set default alignment of vector load/store as that of the vector element type.
#28925 merged
Jul 15, 2025 -
#sdy Clean up `AddAxisOrMergeInserter` in dedup_meshes
#28447 merged
Jul 15, 2025 -
Update the link for hermetic CUDA documentation.
#28932 merged
Jul 15, 2025 -
[ifrt] Fix spelling in CopyArraysOp description.
#28933 merged
Jul 15, 2025 -
PR #28716: [GPU] Make fabric info test compatible with lower CUDA driver versions
#28801 merged
Jul 15, 2025 -
Remove MeshAttr builder that takes a single int
#28931 merged
Jul 15, 2025 -
#sdy Mark `xla.sdy.LocalToGlobalShape` custom call as side effecting so it isn't removed if unused.
#28869 merged
Jul 15, 2025 -
[XLA:GPU] Implement tiling for dot.
#28725 merged
Jul 15, 2025 -
PR #28728: Add Nvidia benchmarks
#28882 merged
Jul 15, 2025 -
Make Thunk keep an instance of ThunkInfo directly (NFC)
#28871 merged
Jul 15, 2025 -
Remove workarounds for missing ABSL_DEPRECATE_AND_INLINE
#28924 merged
Jul 15, 2025 -
[XLA:CPU][XLA:GPU] Increase limit in number of iterations of UnswitchLoopsPass.
#28873 merged
Jul 15, 2025 -
[xla:cpu] Add DotLibraryRewriter rewrite options for oneDNN and XNNPACK.
#28923 merged
Jul 15, 2025 -
Automated Code Change
#28921 merged
Jul 15, 2025 -
[xla:cpu] Tiny improvements for documentation and function names
#28920 merged
Jul 15, 2025 -
Fix shardy_xla_pass_test that is failing
#28899 merged
Jul 15, 2025 -
[XLA:CPU][XLA:GPU] Fix missing layout on emitted constants.
#28804 merged
Jul 15, 2025 -
Automated Code Change
#28918 merged
Jul 15, 2025 -
Remove dependency on KernelArguments from CudnnThunk
#28870 merged
Jul 15, 2025 -
[XLA:GPU] Do not multi-output fuse sibling transposes with reductions.
#28786 merged
Jul 15, 2025 -
Migrate away from ArrayRef(std::nullopt_t)
#28897 merged
Jul 15, 2025 -
PR #28401: [ROCm] Fix PackedTranspose for adapting to warp size 64
#28916 merged
Jul 15, 2025 -
PR #25914: [NVIDIA GPU] Add nvshmem communicator and runtime thunks
#28863 merged
Jul 15, 2025 -
[XLA] Propagate `op_name`s recursively in the `CallInliner`.
#28887 merged
Jul 15, 2025 -
Fix test-case when NVML library is not available.
#28876 merged
Jul 15, 2025 -
[xla:cpu] Mark cpu_function_runtime alignment as deprecated
#28759 merged
Jul 15, 2025 -
initial implementation of send/recv static verification
#28620 merged
Jul 15, 2025 -
Remove unused ExecutionProfile option.
#28730 merged
Jul 15, 2025 -
Add `HloAsyncStartInstruction::AddCallOperand` to mirror `HloCallInstruction::AddCallOperand`.
#28768 merged
Jul 15, 2025 -
[xla:codegen] Migrate Fptrunc to GetOrInsertDeclaration API
#28902 merged
Jul 15, 2025 -
[XLA] Refactoring Reduce Window Rewriter to reduce complexity
#28822 merged
Jul 15, 2025 -
Migrate away from ArrayRef(std::nullopt_t)
#28895 merged
Jul 15, 2025 -
[xla:codegen] Use Intrinsic::Type in Fptruc::CreateDefinition
#28881 merged
Jul 14, 2025 -
Add 'mode' attribute to AllReduce and ReduceScatter.
#28429 merged
Jul 14, 2025 -
[IFRT IR] Add IFRT IR program interpreter
#28891 merged
Jul 14, 2025 -
[Efficiency]Cleanup unused metrics which track the pjrt compilation status.
#28761 merged
Jul 14, 2025 -
[IFRT IR] Add pipeline for compiling IFRT IR programs
#28857 merged
Jul 14, 2025 -
[XLA:GPU] update determenism test to use generic triton emitter
#28880 merged
Jul 14, 2025 -
set layout assignment for the result correctly
#28559 merged
Jul 14, 2025 -
Add CloneWithControlDependency which is used to implement
#28815 merged
Jul 14, 2025 -
[XLA:GPU]: Enable two-shot all reduce implementation for usage.
#28594 merged
Jul 14, 2025 -
set default layout when exporting dense constants from HLO to MLIR
#28763 merged
Jul 14, 2025 -
Re-enable precompilation for some tests.
#28772 merged
Jul 14, 2025 -
[XLA:GPU] enable nested fusion for autotuner test
#28875 merged
Jul 14, 2025 -
Align `AtLocation` signature with Abseil `LogMessage::AtLocation`.
#28867 merged
Jul 14, 2025 -
[XLA] Use "edge time indices" to skip some redundant calls to FindChunkCandidate.
#28769 merged
Jul 14, 2025 -
[XLA:CPU] Move erf32 approximation to mathlib.
#28796 merged
Jul 14, 2025 -
[XLA:CPU] Add expm1 expansion.
#28795 merged
Jul 14, 2025 -
[XLA:GPU]: Calculate rank_offset and rotated_ranks outside the kernel.
#28232 merged
Jul 14, 2025 -
[XLA:CPU] Move passes from expand_float_ops that lower to math lib.
#28794 merged
Jul 14, 2025 -
[XLA:GPU]: Calculate launch dimensions based on input size.
#28186 merged
Jul 14, 2025 -
Pass proper AliasInfo to HloAliasAnalysis::Run in HostOffloader (NFC).
#28866 merged
Jul 14, 2025 -
[XLA:GPU] Print fusion string when selecting the best result, instead of root string.
#28800 merged
Jul 14, 2025 -
[xla][gpu][triton] Do not duplicate code in squeeze dims pass, re-enable the pass.
#28861 merged
Jul 14, 2025 -
Disable NVSHMEM send-recv test-case due to flakiness.
#28858 merged
Jul 14, 2025 -
PR #28295: [NVIDIA GPU] Do out of place allreduce for nvshmem
#28860 merged
Jul 14, 2025 -
[XLA:GPU] Remove code for horizontal_input_fusion.
#28562 merged
Jul 14, 2025 -
Update `StreamExecutorGpuClientTest.PropagateError` test to expect unpacked tuples
#28864 merged
Jul 14, 2025 -
XLA:GPU: Fix method ambiguity on CUDA 12.4
#28847 merged
Jul 14, 2025 -
Avoid using PointsToAnalysis in DFSMemoryScheduler (NFC).
#28747 merged
Jul 14, 2025 -
Always stage transfers when doing d2h copy to avoid memory corruption issue.
#28828 merged
Jul 14, 2025 -
[xla:codegen] Use Intrinsic::Type in Fptruc::GetOrInsertDeclaration
#28841 merged
Jul 13, 2025 -
Reverts ff9ecfc192000b5a62c0adabfd968e5703b0229a
#28832 merged
Jul 13, 2025
78 Pull requests opened by 9 people
Migrate ListScheduler from TuplePointsToAnalysis to HloAliasAnalysis (NFC).
#28868 opened
Jul 14, 2025 -
[XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN
#28883 opened
Jul 14, 2025 -
Automated Code Change
#28884 opened
Jul 14, 2025 -
Fix cost analysis on for output byte accessed when result is tuple
#28886 opened
Jul 14, 2025 -
Avoid crashing when LRU cache keys change.
#28888 opened
Jul 14, 2025 -
Automated Code Change
#28889 opened
Jul 14, 2025 -
test PR #28728: Add Nvidia benchmarks
#28896 opened
Jul 14, 2025 -
There is nothing in this change going to 3rd party.
#28903 opened
Jul 15, 2025 -
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#28906 opened
Jul 15, 2025 -
[NVIDIA GPU] [XLA_GPU_MS_COLLECTIVE] Round-robin stream assignment for async communications
#28919 opened
Jul 15, 2025 -
[xla:gpu][triton] Add squeeze_dims of tt.descriptor_load rewrite.
#28922 opened
Jul 15, 2025 -
Dump optimized HLO when deserializing
#28928 opened
Jul 15, 2025 -
Add subtraction pattern to reduce scatter creator
#28929 opened
Jul 15, 2025 -
clean device description for rocm
#28936 opened
Jul 15, 2025 -
Integrate LLVM at llvm/llvm-project@0d5325bb203f
#28939 opened
Jul 15, 2025 -
Update deps:
#28960 opened
Jul 16, 2025 -
Set KV store to null with mocked GPU processes
#28962 opened
Jul 16, 2025 -
PR #28735: [XLA:GPU] Enabling cuda graph concurrent mode by default
#28970 opened
Jul 16, 2025 -
[XLA:GPU] Move the s4 unpacking sequence from llvm pass to int4->int8 pass
#28972 opened
Jul 16, 2025 -
[XLA:CPU][XLA:GPU] Move concat fusion emitter to shared directory
#28975 opened
Jul 16, 2025 -
[XLA:GPU][host offloading] Implement gpu host offloading allocator.
#28977 opened
Jul 16, 2025 -
Allow the chaining of state across MetricHookInterface instantiations for multiple compilations.
#28978 opened
Jul 16, 2025 -
[XLA:GPU][host offloading] Implement host offloading thunks.
#28983 opened
Jul 16, 2025 -
[#HLODiff] Add support for manual node matching.
#28984 opened
Jul 16, 2025 -
Add a new precompilation-only marker to `Literal`s.
#28991 opened
Jul 16, 2025 -
Avoid recomputation of `pjrt_buffer->memory_space()` in `MakeMemoryKindFromPjRtBuffer`.
#28992 opened
Jul 16, 2025 -
No changes to 3rd party.
#28994 opened
Jul 16, 2025 -
[Perf] Add expensive AllGather cost adjustment to default GPU Scheduler
#28997 opened
Jul 16, 2025 -
Added WatchJobState RPC to coordination service.
#28998 opened
Jul 16, 2025 -
Cache device on `PJRT_Buffer`.
#29000 opened
Jul 17, 2025 -
[XLA:MSA] Add block allocations for program weights that are not aliased and single use.
#29001 opened
Jul 17, 2025 -
Remove `local_config_nvshmem` repository and corresponding macros.
#29003 opened
Jul 17, 2025 -
Integrate LLVM at llvm/llvm-project@2910c24638fc
#29004 opened
Jul 17, 2025 -
[XLA:CPU] Run CSE after inlining in fusion compiler.
#29016 opened
Jul 17, 2025 -
PR #28883: [XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN
#29017 opened
Jul 17, 2025 -
Add 10 Maxtext-derived HLO-based benchmarks
#29027 opened
Jul 17, 2025 -
[XLA:GPU] Refactor tests of IndexingMap
#29031 opened
Jul 17, 2025 -
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#29032 opened
Jul 17, 2025 -
[XLA][Numerics][HLO Value Tracking] Track original values through propagation of shardy annotation
#29038 opened
Jul 17, 2025 -
Remove redundant string conversion.
#29041 opened
Jul 17, 2025 -
[IFRT] Define `user_context()` in `Value` and `LoadedExecutable`
#29042 opened
Jul 17, 2025 -
Optimize `HasCombinableReplicaGroup` and `xla::CheckReplicaGroups`.
#29044 opened
Jul 18, 2025 -
Change `PjRtClient::LazyToLiteral` to take a generator that returns a future of the literal
#29046 opened
Jul 18, 2025 -
Remove LLVM dependency from KernelThunk
#29057 opened
Jul 18, 2025 -
Remove HLO and Autotuner dependency from CublasLtMatmulThunk
#29058 opened
Jul 18, 2025 -
[XLA] Use sort instead of btree in MakeFreeChunks.
#29060 opened
Jul 18, 2025 -
pass in scheduling group id when adding some new ops from ops which have id.
#29061 opened
Jul 18, 2025 -
Automated Code Change
#29062 opened
Jul 18, 2025 -
Remove AbstractCpuBuffer. All subclasses can be replaced with CommonPjRtBufferImpl and removed.
#29066 opened
Jul 18, 2025 -
Add SparseCore documentation
#29069 opened
Jul 18, 2025 -
SPMD Partitioning for MX Block Scaled Dots
#29073 opened
Jul 18, 2025 -
Give better error in run_hlo_module if HLO has collectives.
#29077 opened
Jul 19, 2025 -
Determine collective support based on #partitions
#29078 opened
Jul 19, 2025 -
Automated Code Change
#29080 opened
Jul 19, 2025 -
[xla:gpu][triton] In squeeze-dims pass, keep at least two dimensions.
#29083 opened
Jul 19, 2025 -
Automated Code Change
#29084 opened
Jul 19, 2025 -
[XLA:GPU][Tiling] Use SmallVector<OneDimTile> to store tiling info.
#29085 opened
Jul 19, 2025 -
Avoid heap allocation for the sub buffer address
#29086 opened
Jul 19, 2025 -
Use stablehlo precision config conversion for stablehlo ops
#29088 opened
Jul 19, 2025 -
PR #28735: [XLA:GPU] Enabling cuda graph concurrent mode by default
#29089 opened
Jul 19, 2025 -
Adding extractor for metadata embedded in hlo expression for debugging.
#29090 opened
Jul 19, 2025 -
Automated Code Change
#29091 opened
Jul 20, 2025 -
Automated Code Change
#29092 opened
Jul 20, 2025 -
Automated Code Change
#29093 opened
Jul 20, 2025 -
Automated Code Change
#29094 opened
Jul 20, 2025 -
Automated Code Change
#29095 opened
Jul 20, 2025 -
Automated Code Change
#29096 opened
Jul 20, 2025 -
Automated Code Change
#29097 opened
Jul 20, 2025 -
Automated Code Change
#29098 opened
Jul 20, 2025 -
Automated Code Change
#29099 opened
Jul 20, 2025 -
Automated Code Change
#29100 opened
Jul 20, 2025 -
Automated Code Change
#29101 opened
Jul 20, 2025
3 Issues closed by 3 people
PR process improvements
#2038 closed
Jul 15, 2025 -
[TPU] Bug: Reverse is orders of magnitude slower on TPU
#23191 closed
Jul 15, 2025 -
PJRT_Memory vs PJRT_Buffer
#28846 closed
Jul 13, 2025
7 Issues opened by 7 people
dynamic_broadcast_in_dim MHLO -> XLA HLO conversion failed
#29030 opened
Jul 17, 2025 -
[Proposal] Extract the Profiling Subsystem into a Dedicated OpenXLA Repository
#29007 opened
Jul 17, 2025 -
Missing CUDNN 9.10.2 for json hermetic, will CUDA 12.9.1 also be missing?
#28989 opened
Jul 16, 2025 -
GPU mocking hangs the compiler (in autotuning)
#28959 opened
Jul 16, 2025 -
Build error while trying to build algorithm.cc
#28905 opened
Jul 15, 2025 -
Passing Kernels to FFI
#28893 opened
Jul 14, 2025 -
Why does tfrt use only one stream for gpu client?
#28859 opened
Jul 14, 2025
44 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
[NVIDIA GPU] Skip user buffer reg when the size is 1
#28396 commented on
Jul 15, 2025 • 2 new comments -
[XLA:CPU][oneDNN] Implement oneDNN primitives for custom calls in Thunk runtime
#28615 commented on
Jul 16, 2025 • 1 new comment -
[ROCm] fixed broadcast_constant_block_dim_limit.hlo.test on rocm
#27816 commented on
Jul 15, 2025 • 0 new comments -
[XLA:GPU][oneAPI] Enable Clang compiler as the host compiler
#27904 commented on
Jul 17, 2025 • 0 new comments -
[XLA:GPU] Add support for intel gpu backend for tests
#27943 commented on
Jul 18, 2025 • 0 new comments -
Update Protobuf to 6.31.1
#28164 commented on
Jul 14, 2025 • 0 new comments -
[ROCm] Enable mx data type for ROCm
#28173 commented on
Jul 18, 2025 • 0 new comments -
Add metadata for CUDA and libtpu versions
#28195 commented on
Jul 15, 2025 • 0 new comments -
[XLA:SCHEDULING] Defer scheduling of allocate buffer custom calls
#28355 commented on
Jul 15, 2025 • 0 new comments -
[XLA] Remove dead argument in ProtoToHumanReadableJson
#28659 commented on
Jul 14, 2025 • 0 new comments -
PR #19067: [XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#28680 commented on
Jul 19, 2025 • 0 new comments -
#sdy Remove MHLO shardings from round-trip export
#28692 commented on
Jul 17, 2025 • 0 new comments -
Use RPG's solution as a hint to CP-SAT
#28723 commented on
Jul 17, 2025 • 0 new comments -
[XLA:benchmarks] Test Nvidia benchmarks from https://github.com/openxla/xla/pull/28728
#28727 commented on
Jul 15, 2025 • 0 new comments -
[XLA:GPU] Enabling cuda graph concurrent mode by default
#28735 commented on
Jul 15, 2025 • 0 new comments -
[XLA:GPU] Lowering dynamic update slice thunk into command buffer if it depends on loop iteration.
#28740 commented on
Jul 18, 2025 • 0 new comments -
[XLA:GPU] Add sycl_kernel component and test
#28762 commented on
Jul 17, 2025 • 0 new comments -
[XLA] Add stack trace breakdown to `HloLiveRange::ToString` for peak memory usage
#28777 commented on
Jul 13, 2025 • 0 new comments -
Add Hermetic C++ Toolchains for Linux x86_64 builds.
#28827 commented on
Jul 17, 2025 • 0 new comments -
Automated Code Change
#28836 commented on
Jul 14, 2025 • 0 new comments -
Automated Code Change
#28839 commented on
Jul 14, 2025 • 0 new comments -
Automated Code Change
#28843 commented on
Jul 14, 2025 • 0 new comments -
CMake build support
#1 commented on
Jul 14, 2025 • 0 new comments -
crosstool_wrapper_driver_is_not_gcc failed: error executing command
#3552 commented on
Jul 14, 2025 • 0 new comments -
Hermetic CUDA no longer respects TF_DOWNLOAD_CLANG
#16866 commented on
Jul 17, 2025 • 0 new comments -
Fix issues with ARM build after latest hermetic changes
#28256 commented on
Jul 15, 2025 • 0 new comments -
How can cross-architecture operator libraries be applied in JAX, such as cuBLAS?
#28265 commented on
Jul 15, 2025 • 0 new comments -
"Can't find libdevice directory" due to unset TF_CUDA_TOOLKIT_PATH
#28590 commented on
Jul 15, 2025 • 0 new comments -
Poor cuDNN kernel selection
#28665 commented on
Jul 14, 2025 • 0 new comments -
xla/service/gpu/kernels/cutlass_gemm_kernel_bf16xbf16_to_bf16.cu.cc trouble compiling cutlass
#28669 commented on
Jul 16, 2025 • 0 new comments -
Cross compile to ARM with custom gcc
#28807 commented on
Jul 15, 2025 • 0 new comments -
Possibility to specify strides when sending the data from buffer to host
#28833 commented on
Jul 15, 2025 • 0 new comments -
[WIP] [ROCm] Moving blas::CallContext into NumericOptions
#9593 commented on
Jul 15, 2025 • 0 new comments -
[ROCm] Add CudnnPadForConvolutions and CudnnVectorizeConvolutionsHLO pass to AMDGPU compiler
#10337 commented on
Jul 15, 2025 • 0 new comments -
[ROCm] Add CudnnNormRewriter pass to AMDGPU compiler
#10421 commented on
Jul 15, 2025 • 0 new comments -
Add missing unit test tags in `xla/service/gpu/BUILD`
#10620 commented on
Jul 15, 2025 • 0 new comments -
[ROCm] Disable fusion MLIR for ROCM
#13663 commented on
Jul 15, 2025 • 0 new comments -
[ROCm] Fix FP32 atomic_rmw
#14117 commented on
Jul 15, 2025 • 0 new comments -
[XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#19067 commented on
Jul 14, 2025 • 0 new comments -
Support the Execute Device Kernel feature in XLA
#26236 commented on
Jul 17, 2025 • 0 new comments -
[NVIDIA GPU] Dynamic SPMD iteration limit for larger fast-interconnect domain
#26391 commented on
Jul 15, 2025 • 0 new comments -
[NVIDIA GPU] Add support for nccl symmetric kernel
#26443 commented on
Jul 14, 2025 • 0 new comments -
[ROCm] Embeded device lib
#26704 commented on
Jul 15, 2025 • 0 new comments -
[XLA] Add stack trace breakdown to `HloLiveRange::ToString` for peak memory usage
#27542 commented on
Jul 18, 2025 • 0 new comments