-
-
Notifications
You must be signed in to change notification settings - Fork 461
fix(replay): Prevent ANR by caching connection status instead of blocking calls #4891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(replay): Prevent ANR by caching connection status instead of blocking calls #4891
Conversation
…king calls Use a cached connection status value that is updated via the onConnectionStatusChanged() callback instead of making blocking getConnectionStatus() calls in time-sensitive code paths. This prevents ANR issues in AndroidContinuousProfiler and ReplayIntegration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
| private var lastKnownConnectionStatus: ConnectionStatus = ConnectionStatus.UNKNOWN | ||
| private var debugMaskingEnabled: Boolean = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: lastKnownConnectionStatus is a mutable field accessed by multiple threads without proper synchronization, leading to a data race and potential stale reads.
Severity: HIGH | Confidence: 1.00
🔍 Detailed Analysis
The lastKnownConnectionStatus field is subject to a data race. It is a mutable field not declared as volatile and is accessed by multiple threads without consistent synchronization. Writes in onConnectionStatusChanged() occur without a lock, while reads in resumeInternal() occur within lifecycleLock. This inconsistent synchronization means writes may not be visible to other threads, leading to stale reads and unreliable connection status, which can cause the replay integration to fail pausing or resuming correctly.
💡 Suggested Fix
Declare lastKnownConnectionStatus as volatile or ensure both read and write operations are protected by the same lock (e.g., lifecycleLock).
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location:
sentry-android-replay/src/main/java/io/sentry/android/replay/ReplayIntegration.kt#L98-L99
Potential issue: The `lastKnownConnectionStatus` field is subject to a data race. It is
a mutable field not declared as `volatile` and is accessed by multiple threads without
consistent synchronization. Writes in `onConnectionStatusChanged()` occur without a
lock, while reads in `resumeInternal()` occur within `lifecycleLock`. This inconsistent
synchronization means writes may not be visible to other threads, leading to stale reads
and unreliable connection status, which can cause the replay integration to fail pausing
or resuming correctly.
Did we get this right? 👍 / 👎 to inform future reviews.
| this.gestureRecorderProvider = gestureRecorderProvider | ||
| } | ||
|
|
||
| private var lastKnownConnectionStatus: ConnectionStatus = ConnectionStatus.UNKNOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Offline Replay: Connection Status Misinterpretation
lastKnownConnectionStatus initializes to UNKNOWN but resumeInternal() only blocks when the status equals DISCONNECTED. If resumeInternal() runs before the first onConnectionStatusChanged() callback fires, the replay will resume even when the device is actually offline, potentially flooding the cache with unsendable events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, in worst case the disconnected change will be fired via onConnectionStatusChanged() shortly afterwards anyway.
sentry-android-core/src/main/java/io/sentry/android/core/AndroidContinuousProfiler.java
Show resolved
Hide resolved
Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 806307f | 357.85 ms | 424.64 ms | 66.79 ms |
| ee747ae | 554.98 ms | 611.50 ms | 56.52 ms |
| 17a0955 | 372.53 ms | 446.70 ms | 74.17 ms |
| 2124a46 | 319.19 ms | 415.04 ms | 95.85 ms |
| 9fbb112 | 404.51 ms | 475.65 ms | 71.14 ms |
| b6702b0 | 395.86 ms | 409.98 ms | 14.12 ms |
| ee747ae | 357.79 ms | 421.84 ms | 64.05 ms |
| 27d7cf8 | 314.17 ms | 347.00 ms | 32.83 ms |
| 9fbb112 | 361.43 ms | 427.57 ms | 66.14 ms |
| 96449e8 | 361.30 ms | 423.39 ms | 62.09 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 806307f | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| ee747ae | 1.58 MiB | 2.10 MiB | 530.95 KiB |
| 17a0955 | 1.58 MiB | 2.10 MiB | 533.20 KiB |
| 2124a46 | 1.58 MiB | 2.12 MiB | 551.51 KiB |
| 9fbb112 | 1.58 MiB | 2.11 MiB | 539.18 KiB |
| b6702b0 | 1.58 MiB | 2.12 MiB | 551.79 KiB |
| ee747ae | 1.58 MiB | 2.10 MiB | 530.95 KiB |
| 27d7cf8 | 1.58 MiB | 2.12 MiB | 549.42 KiB |
| 9fbb112 | 1.58 MiB | 2.11 MiB | 539.18 KiB |
| 96449e8 | 1.58 MiB | 2.11 MiB | 539.35 KiB |
| replay.start() | ||
| replay.onScreenshotRecorded(mock<Bitmap>()) | ||
|
|
||
| replay.onConnectionStatusChanged(DISCONNECTED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Offline Screenshot Pause: Test Logic Flawed.
The test calls onConnectionStatusChanged(DISCONNECTED) after onScreenshotRecorded(), so it verifies that onConnectionStatusChanged pauses the recorder rather than verifying that checkCanRecord() within onScreenshotRecorded detects offline status and pauses. The test passes for the wrong reason and doesn't actually test the intended behavior of pausing when a screenshot is recorded while offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even reversing the order here wouldn't help, as at the time checkCanRecord() is called, we already would be in the paused state.
romtsn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Replace blocking getConnectionStatus() calls for Session Replay with using cached connection status in ReplayIntegration to prevent ANR issues.