Please ignore if something like this already exists, but as someone who doesn't know XLA well, this helped me a lot when debugging #24186
When invalid HLO causes an `INVALID_ARGUMENT: during context [hlo verifier]` error, it is hard to work out the context of the error message, because compilation aborts before any output is dumped.
$ hlo-opt --platform=cpu --stage=hlo-backend --xla_dump_to=xla_dump demo.txt
INVALID_ARGUMENT: during context [hlo verifier]: Binary op subtract with different element types: s64[] and s32[].
, for instruction %output = s32[] subtract(%Arg, %constant)
Failed after pipeline-start
$ ls xla_dump
/bin/ls: cannot access 'xla_dump': No such file or directory
In this case it is obvious where %Arg and %constant come from, but if these variables are generated during other HLO passes then it becomes impossible to see their definition (or at least I didn't figure out a way).
Because of that, I quickly commented out the HLO shape verifier call (and the LLVM module verification calls in the CPU compiler):
Diff
diff --git a/xla/service/cpu/cpu_compiler.cc b/xla/service/cpu/cpu_compiler.cc
index d5e1f6c1b5..56c1b2f716 100644
--- a/xla/service/cpu/cpu_compiler.cc
+++ b/xla/service/cpu/cpu_compiler.cc
@@ -1485,10 +1485,10 @@ CpuCompiler::CompileCpuExecutable(std::unique_ptr<HloModule> module) {
ir_module_string = absl::StrCat(emitter2_ir, "\n", thunks_ir);
}
- TF_RETURN_IF_ERROR(VerifyLlvmModule(*llvm_module));
- for (const auto& [name, module] : thunk_emitter.kernels()) {
- TF_RETURN_IF_ERROR(VerifyLlvmModule(*module.getModuleUnlocked()));
- }
+ // TF_RETURN_IF_ERROR(VerifyLlvmModule(*llvm_module));
+ // for (const auto& [name, module] : thunk_emitter.kernels()) {
+ // TF_RETURN_IF_ERROR(VerifyLlvmModule(*module.getModuleUnlocked()));
+ // }
// Some kernels have to be compiled separately because they have
// extra backend options.
@@ -1724,7 +1724,7 @@ CpuCompiler::CompileCpuExecutable(std::unique_ptr<HloModule> module) {
ir_module_string = llvm_ir::DumpToString(llvm_module.get());
}
- TF_RETURN_IF_ERROR(VerifyLlvmModule(*llvm_module));
+ // TF_RETURN_IF_ERROR(VerifyLlvmModule(*llvm_module));
// Save entry function name before destroying LLVM module.
std::string entry_function_name = entry_function->getName().str();
@@ -2018,11 +2018,11 @@ CpuCompiler::CompileAheadOfTimeLegacy(
// Run the LLVM verifier over the unoptimized LLVM IR. If it fails, run
// the pre-optimization IR dump hook before returning.
{
- absl::Status verify_status = VerifyLlvmModule(*llvm_module);
- if (!verify_status.ok() && pre_optimization_ir_hook) {
- pre_optimization_ir_hook(*llvm_module);
- }
- TF_RETURN_IF_ERROR(verify_status);
+ // absl::Status verify_status = VerifyLlvmModule(*llvm_module);
+ // if (!verify_status.ok() && pre_optimization_ir_hook) {
+ // pre_optimization_ir_hook(*llvm_module);
+ // }
+ // TF_RETURN_IF_ERROR(verify_status);
}
auto post_codegen_hook = [&](const llvm::Module& llvm_module,
@@ -2183,10 +2183,10 @@ CpuCompiler::CompileAheadOfTimeThunks(
ir_module_string = absl::StrCat(emitter2_ir, "\n", thunks_ir);
}
- TF_RETURN_IF_ERROR(VerifyLlvmModule(*llvm_module));
- for (const auto& [name, module] : thunk_emitter.kernels()) {
- TF_RETURN_IF_ERROR(VerifyLlvmModule(*module.getModuleUnlocked()));
- }
+ // TF_RETURN_IF_ERROR(VerifyLlvmModule(*llvm_module));
+ // for (const auto& [name, module] : thunk_emitter.kernels()) {
+ // TF_RETURN_IF_ERROR(VerifyLlvmModule(*module.getModuleUnlocked()));
+ // }
// Compilation part
ModuleHook pre_optimization_ir_hook;
diff --git a/xla/service/hlo_verifier.cc b/xla/service/hlo_verifier.cc
index ed78f602bb..6cfac265cf 100644
--- a/xla/service/hlo_verifier.cc
+++ b/xla/service/hlo_verifier.cc
@@ -3111,7 +3111,7 @@ absl::StatusOr<bool> HloVerifier::Run(
InstructionVerifier instruction_verifier(
module, target_metadata_->GetVerifierOpts());
for (auto* computation : module->computations(execution_threads)) {
- TF_RETURN_IF_ERROR(computation->Accept(shape_verifier.get()));
+ // TF_RETURN_IF_ERROR(computation->Accept(shape_verifier.get()));
TF_RETURN_IF_ERROR(computation->Accept(&instruction_verifier));
// Verify that async computations contain a single instruction or a
// collection of send/recv instructions. This is needed to represent NCCL
$ hlo-opt --platform=cpu --stage=hlo-backend --xla_dump_to=xla_dump demo.txt
2025-04-02 12:00:06.554594: I xla/service/dump.cc:569] HloModule dump enabled with path prefix: , suffix: cpu_after_optimizations
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743588006.556637 29590 execution_engine.cc:53] LLVM compilation error: Invalid argument Invalid LLVM IR before optimizations:
Both operands to a binary operator are not of the same type!
%4 = add i64 %2, i32 %3
INTERNAL: Failed to materialize symbols: { (<xla_jit_dylib_0>, { add_kernel }) }
$ ls xla_dump
module_0000.jit_f.add_elemental_kernel_module.ir-no-opt.ll
module_0000.jit_f.cpu_after_optimizations-buffer-assignment.txt
module_0000.jit_f.cpu_after_optimizations-memory-usage-report.txt
module_0000.jit_f.cpu_after_optimizations.config.pbtxt
module_0000.jit_f.cpu_after_optimizations.txt
This way I can see both the HLO after optimizations and the ir-no-opt.ll.
Would it be useful to add a command line argument that disables the HLO verifier?
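To make the suggestion concrete, here is a toy sketch of the behaviour I have in mind. Nothing in it is XLA code: `ToyStatus`, `ToyPipelineOptions`, `disable_verifier`, `VerifyBinaryOp`, `RunToyPipeline` and the artifact names are all hypothetical stand-ins. It only illustrates the control flow: with the verifier on, the pipeline returns before anything is dumped; with the hypothetical flag set, the dumps are written and a later stage still reports the type mismatch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are XLA APIs.
struct ToyStatus {
  bool ok;
  std::string message;
};

struct ToyPipelineOptions {
  // Hypothetical switch, imagined as being set from a command line argument.
  bool disable_verifier = false;
};

struct ToyResult {
  ToyStatus status;
  std::vector<std::string> dumped;  // names of the artifacts that got written
};

// Toy verifier: rejects a binary op whose operand element types differ.
ToyStatus VerifyBinaryOp(const std::string& lhs_type,
                         const std::string& rhs_type) {
  if (lhs_type != rhs_type) {
    return {false, "Binary op with different element types: " + lhs_type +
                       " and " + rhs_type};
  }
  return {true, ""};
}

ToyResult RunToyPipeline(const std::string& lhs_type,
                         const std::string& rhs_type,
                         const ToyPipelineOptions& options) {
  ToyResult result{{true, ""}, {}};

  if (!options.disable_verifier) {
    ToyStatus verified = VerifyBinaryOp(lhs_type, rhs_type);
    if (!verified.ok) {
      // Early return: nothing has been dumped yet, as in the first run above.
      result.status = verified;
      return result;
    }
  }

  // With the verifier skipped (or passing), the dumps are written.
  result.dumped.push_back("after_optimizations.txt");
  result.dumped.push_back("ir-no-opt.ll");

  // A later stage still catches the mismatch, but now the dumps exist.
  if (lhs_type != rhs_type) {
    result.status = {false, "Invalid IR: operands are not of the same type"};
  }
  return result;
}
```

Calling `RunToyPipeline("s64", "s32", options)` with the default options fails with an empty `dumped` list, while setting `options.disable_verifier = true` still fails but leaves both artifacts listed, which mirrors the two runs shown above.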