
WIP - SWATCH-3545 and SWATCH-3809: Add intra-batch conflict bug reproducers and debug tools #4796

Closed
wants to merge 4 commits into from


lindseyburnett
Collaborator

  • test_conflict_resolution_reproducer.py: Python script to reproduce the stuck event deduction cascade
  • test_intra_batch_conflict.py: Simple reproducer that reliably triggers the bug
  • debug_intra_batch_conflict.py: Interactive debug script with multiple scenarios for manual debugging
  • debug_setup_guide.md: Complete guide for setting up multiple replicas and debugging

These tools help reproduce and debug the REQUIRES_NEW transaction isolation bug
that causes cascading deductions instead of proper conflict resolution.

Related to SWATCH-3545 and SWATCH-3809


…saction

- Remove transactionHandler.runInNewTransaction() from persistServiceInstances()
- Ensure conflict resolution happens in the same transaction as event saving
- Fix the bug where the conflict resolver couldn't see uncommitted events in the same batch
- This prevents the cascade of deductions: [1.0, -1.0, 2.0, -2.0, 3.0]
- With the fix, the batch produces the correct sequence: [1.0, -1.0, 3.0]
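The two sequences listed above can be compared with a quick check. This is a minimal sketch: `summarize` is a hypothetical helper (not one of the reproducer scripts in this PR) that returns each sequence's net total and its number of negative (deduction) entries.

```python
# Example sequences quoted in the commit message; summarize() is a
# hypothetical helper for illustration only.
CASCADE = [1.0, -1.0, 2.0, -2.0, 3.0]  # before the fix
FIXED = [1.0, -1.0, 3.0]               # after the fix


def summarize(entries: list[float]) -> tuple[float, int]:
    """Return (net total, number of negative/deduction entries)."""
    return sum(entries), sum(1 for e in entries if e < 0)


print(summarize(CASCADE))  # (3.0, 2)
print(summarize(FIXED))    # (3.0, 1)
```

In these two example sequences both entries net to 3.0; what the cascade adds is the intermediate 2.0 and -2.0 pair.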

The bug was caused by REQUIRES_NEW transaction isolation, which prevented the
conflict resolver from seeing events that were still being saved (not yet
committed) in the same batch. With the separate transaction removed, all events
in a batch are visible to the conflict resolver, so conflicts are resolved correctly.
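The visibility effect described above can be illustrated with a small analogy outside the service. This sketch uses Python's standard-library sqlite3 module, not the project's database, transaction manager, or transactionHandler: one connection stands in for the transaction saving the batch, a second connection stands in for a separate REQUIRES_NEW-style transaction, and the `events` table and `visible_counts` helper are hypothetical. A row inserted but not yet committed is visible to the connection that inserted it and invisible to the other one.

```python
import os
import sqlite3
import tempfile


def visible_counts(db_path: str) -> tuple[int, int]:
    """Insert one row without committing, then count rows as seen from the
    inserting connection and from a second, independent connection."""
    # Analogy only: `batch_tx` plays the transaction saving the batch,
    # `resolver_tx` plays a separate REQUIRES_NEW-style transaction.
    batch_tx = sqlite3.connect(db_path)
    resolver_tx = sqlite3.connect(db_path)
    try:
        # No transaction is open yet, so this DDL takes effect immediately.
        batch_tx.execute("CREATE TABLE IF NOT EXISTS events (instance_id TEXT, value REAL)")
        batch_tx.commit()
        # sqlite3 implicitly opens a transaction before INSERT; the row stays
        # uncommitted until batch_tx.commit() is called.
        batch_tx.execute("INSERT INTO events VALUES (?, ?)", ("instance-1", 2.0))
        same_tx = batch_tx.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        other_tx = resolver_tx.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        return same_tx, other_tx
    finally:
        batch_tx.rollback()  # discard the uncommitted row
        batch_tx.close()
        resolver_tx.close()


with tempfile.TemporaryDirectory() as tmp:
    print(visible_counts(os.path.join(tmp, "events.db")))  # (1, 0)
```

The reasoning matches the fix: work running in the transaction that saves the batch can see the batch's uncommitted rows, while a separate transaction cannot.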

Fixes SWATCH-3545 and SWATCH-3809

- test_kafka_intra_batch_conflict.py: Test that sends events through the Kafka consumer path
- This exercises the EventController.persistServiceInstances() method changed by the fix
- Helps verify that the fix works in the actual Kafka processing path

The test shows that the current endpoint goes through InternalTallyDataController
instead of the Kafka consumer path, which explains why our fix wasn't immediately
visible in testing.

- FIX_SUMMARY.md: Complete documentation of the bug, fix, and testing instructions
- Explains the root cause: REQUIRES_NEW transaction isolation in persistServiceInstances()
- Documents the fix: removing transactionHandler.runInNewTransaction()
- Provides testing instructions for both direct API and Kafka consumer paths
- Includes code path diagrams and expected results
- Explains why the current test may still show the bug (wrong code path)

This document serves as a complete guide for understanding, implementing, and
verifying the fix for the intra-batch conflict resolution bug.
lindseyburnett added the "work in progress" (WIP, don't review yet.) label on Jul 29, 2025