Skip to content

[FEAT] Add if/else branch condition tracking to sanitizer#336

Open
mark14wu wants to merge 1 commit intomainfrom
feat/if-else-branch-tracking
Open

[FEAT] Add if/else branch condition tracking to sanitizer#336
mark14wu wants to merge 1 commit intomainfrom
feat/if-else-branch-tracking

Conversation

@mark14wu
Copy link
Collaborator

Summary

  • Eliminate false OOB alarms for tl.load/tl.store inside guarded if branches within loops (e.g. if t > 0: tl.load(ptr + t - 1) no longer reports OOB when t=0)
  • Track branch conditions symbolically via AST rewriting and merge path constraints into Z3 checks, mirroring the existing for-loop patching architecture
  • Support both deferred checks (inside loops) and immediate checks (outside loops) with branch constraints, with correct signature deduplication across different branches

Test plan

  • Existing tests pass: uv run pytest tests/ -x -q --ignore=tests/nki (223 passed, 2 skipped)
  • New unit test: AST transformer produces correct try/finally structure
  • New e2e test: false positive elimination — if t > 0: tl.load(ptr + t - 1) with scalar tensor reports 0 OOB
  • New e2e test: distinct signatures — same addr under different branches are not deduped, real OOB still detected
  • New e2e test: PID-guarded immediate check — if pid == 0: tl.load(ptr + pid) with grid=2 reports 0 OOB

Known limitations

  1. Compound boolean expressions (and/or/not/chained comparisons) lose symbolic info because Python evaluates __bool__ before pre_if receives the value
  2. Data-dependent conditions (e.g. if tl.load(x_ptr) > 0:) fall back to plain Python truthiness (current behavior)

Eliminate false OOB alarms for memory accesses inside guarded branches
within loops. The sanitizer now tracks branch conditions symbolically
and merges them into Z3 constraints for deferred and immediate checks.

Architecture mirrors the existing for-loop patching pattern:
- AST transformer patches visit_If on the Triton interpreter
- IfFrame tracks predicate, support constraints, and concrete result
- Branch constraints are merged into PendingCheck.constraints
- Concrete materialization substitutes loop idx and PID vars

Known limitations:
- Compound boolean expressions (and/or/not) lose symbolic info
- Data-dependent conditions (tl.load results) fall back to defaults
@github-actions
Copy link

Sanitizer Performance Benchmark

Benchmark main (min) PR (min) Change
gemm 0.165s 0.165s -0.0%
gemm_oob 0.173s 0.172s -0.8%
indirect_load 0.254s 0.254s -0.2%
nested_loop 0.332s 0.332s -0.0%
block_pointer_loop_advance 0.160s 0.159s -0.5%
liger_jsd 0.138s 0.142s +2.5%
flaggems_layernorm 0.414s 0.417s +0.7%
swiglu 0.170s 0.170s +0.2%
cross_entropy 0.160s 0.162s +1.3%
fused_linear_jsd 0.208s 0.213s +2.6%
Total 2.174s 2.185s +0.5%

Iterations: 1 warmup + 20 measured

@Jokeren
Copy link
Member

Jokeren commented Mar 16, 2026

Eliminate false OOB alarms for tl.load/tl.store inside guarded if branches within loops (e.g. if t > 0: tl.load(ptr + t - 1) no longer reports OOB when t=0)

When do we report oob previously? I think there must be something wrong with the caching mechanism

@mark14wu
Copy link
Collaborator Author

mark14wu commented Mar 17, 2026

Eliminate false OOB alarms for tl.load/tl.store inside guarded if branches within loops (e.g. if t > 0: tl.load(ptr + t - 1) no longer reports OOB when t=0)

When do we report oob previously? I think there must be something wrong with the caching mechanism

I think there are actually two separate issues here.
For the false-positive example in the PR description, the previous OOB is caused primarily by missing branch/path constraints, not by caching alone. On main, deferred loop checks use the constraints from expr.eval() plus loop-iterator constraints, but they do not include the if predicate itself. So in a pattern like for t in ...: if t > 0: tl.load(ptr + t - 1), the solver can still pick t = 0 at check time and report an OOB that is infeasible on the executed path.

Your caching point is still relevant for a different case, though: before this PR, the loop signature is built from (addr_expr, constraints) without branch constraints, so the same address expression under different branches can collapse to one signature. This PR fixes that too by merging branch path constraints before computing the signature, and there is a regression test for “same addr expr under different branches should not dedup”. So I’d describe the root cause of the quoted false positive as path-insensitive branch handling, while cache/dedup is a secondary branch-related issue that this PR also addresses.

@Jokeren
Copy link
Member

Jokeren commented Mar 17, 2026

I think it's a more general problem even outside of loops

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants