**The bug**: When `db.close()` calls `DeleteAllSessions()`, it directly sets `session->session_ = nullptr`, causing `Session::Delete()` to return early and never call `database_ref_.Reset()`.
66
+
67
+
**How it manifested**: Reference leak, potential issues during environment teardown.
68
+
69
+
**Fix**: Call `database_ref_.Reset()` in `DeleteAllSessions()` after cleaning up each session.
70
+
71
+
**Commit**: `fb283df`
39
72
40
-
**Evidence found**:
73
+
### Bug 2c: Mutex Deadlock Causing SIGSEGV
41
74
42
-
- Upstream Node.js tracks backups via `AddBackup()`, `RemoveBackup()`, `FinalizeBackups()`
43
-
- Our implementation was **missing all backup tracking**
44
-
- See: `src/upstream/node_sqlite.cc:685-791`
75
+
**The bug**: `DeleteAllSessions()` held `sessions_mutex_` while calling `database_ref_.Reset()`. `Reset()` can trigger GC, which finalizes other Session objects, which call `Delete()` → `RemoveSession()`, which then tries to lock the already-held mutex → **undefined behavior**.
1. **"musl/glibc incompatibility"** - A previous engineer suspected this, but extension loading works fine on Alpine. The real issue was the race condition.
153
+
### Pattern: Preventing GC of Parent Objects
103
154
104
-
2. **Trying to reproduce with prebuilds** - Spent time on Task 1 (downloading CI prebuilds), but the bug reproduced even with source builds once we understood the timing.
155
+
When a child object (Session, Statement) holds a pointer to a parent (DatabaseSync), you **must** also hold a reference to prevent GC:
105
156
106
-
3. **Looking for weak_ptr issues** - Searched for `weak_ptr` patterns but found none. The codebase uses raw pointers.
1. **Compare with upstream** - The Node.js source (`src/upstream/node_sqlite.cc`) shows proper patterns. Our implementation was missing backup tracking that upstream has.
166
+
**Why both?** The ObjectReference holds the parent alive, but calling methods via `database_ref_.Value()` on every access is expensive. Keep the raw pointer for performance.
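The reference-plus-raw-pointer idea can be illustrated without N-API. The following is a minimal sketch, not the addon's code: `std::shared_ptr` stands in for the `ObjectReference` (an assumption made so the example compiles on its own), and the `Database` and `Session` types here are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical parent object (plays the role of DatabaseSync).
struct Database {
  bool open = true;
};

// Hypothetical child object (plays the role of Session or Statement).
class Session {
 public:
  explicit Session(std::shared_ptr<Database> db)
      : database_ref_(std::move(db)), database_(database_ref_.get()) {
    assert(database_ != nullptr);
  }

  // Hot path: uses the raw pointer, never touches the owning reference.
  bool DatabaseIsOpen() const {
    return database_ != nullptr && database_->open;
  }

  // Cleanup: drop the raw pointer AND the owning reference together.
  void Delete() {
    database_ = nullptr;
    database_ref_.reset();
  }

 private:
  // Declared first so it is initialized before database_ reads from it.
  std::shared_ptr<Database> database_ref_;  // keeps the parent alive
  Database* database_;                      // cheap access
};
```

In this sketch the parent stays alive for as long as any child holds `database_ref_`, even after the creator drops its own handle, and `Delete()` releases both members in one place so neither can be left dangling or leaked.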
167
+
168
+
### Pattern: Mutex and GC Don't Mix
169
+
170
+
**Never** hold a mutex while calling code that can trigger GC:
171
+
172
+
```cpp
// BAD: Reset() can trigger GC, which may try to acquire same mutex
// GOOD: Release mutex before operations that can trigger GC
std::set<Object*> copy;
{
  std::lock_guard<std::mutex> lock(mutex_);
  copy = objects_;
  objects_.clear();  // Makes RemoveObject() a no-op
}
// Now safe - no mutex held
for (auto* obj : copy) {
  obj->ref_.Reset();
}
```
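To see why the copy-then-release shape matters, here is a self-contained sketch that simulates the re-entrancy. The `Registry` and `Tracked` types are hypothetical, and `Tracked::Reset()` calls straight back into the registry to imitate a finalizer running during GC; the real addon's control flow may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

class Registry;

// Hypothetical tracked child whose Reset() re-enters the registry,
// imitating a finalizer that runs while the parent is cleaning up.
struct Tracked {
  explicit Tracked(Registry* o) : owner(o) {}
  void Reset();
  Registry* owner;
  bool released = false;
};

class Registry {
 public:
  void Add(Tracked* obj) {
    std::lock_guard<std::mutex> lock(mutex_);
    objects_.insert(obj);
  }

  // Takes the same mutex as ReleaseAll(); safe only if ReleaseAll()
  // is no longer holding it when this runs.
  void Remove(Tracked* obj) {
    std::lock_guard<std::mutex> lock(mutex_);
    objects_.erase(obj);  // no-op once the set has been emptied
    ++remove_calls_;
  }

  void ReleaseAll() {
    std::set<Tracked*> copy;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      copy.swap(objects_);  // take the contents, leave the set empty
    }
    // Mutex released: re-entrant Remove() calls can lock it safely.
    for (Tracked* obj : copy) obj->Reset();
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return objects_.size();
  }

  int remove_calls() {
    std::lock_guard<std::mutex> lock(mutex_);
    return remove_calls_;
  }

 private:
  std::mutex mutex_;
  std::set<Tracked*> objects_;
  int remove_calls_ = 0;
};

inline void Tracked::Reset() {
  assert(owner != nullptr);
  released = true;
  owner->Remove(this);  // re-enters the registry, like a finalizer would
}
```

If the loop in `ReleaseAll()` were moved inside the locked scope, each `Reset()` would re-lock a `std::mutex` already owned by the calling thread, which the C++ standard leaves undefined.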
111
191
112
-
2. **Race conditions in AsyncProgressWorker** - The worker thread can outlive the main-thread objects. Any data accessed from `Execute()` must be either:
113
-
- Copied at construction time, OR
114
-
- Protected by tracking/synchronization
192
+
### Why Only Alpine/musl?
115
193
116
-
3. **Mutex ordering matters** - `FinalizeBackups()` must release the lock before calling `Cleanup()` to avoid deadlock when the destructor calls `RemoveBackup()`.
194
+
1. **Different GC timing**: musl's allocator has different behavior
1. Check which test file crashes (may shift around due to Jest worker assignment)
277
+
2. Look for pattern: Does crash always involve Session, Statement, or Backup?
278
+
3. Check for other `ObjectReference` cleanup paths: `grep -n "\.Reset()" src/*.cpp`
144
279
145
280
**Completion checklist**:
146
281
147
282
- [ ] Push changes
148
-
- [ ] 10 CI runs complete
149
-
- [ ] No SIGSEGV/SIGTRAP crashes
283
+
- [ ] test-alpine jobs pass for all Node versions (20, 22, 23, 24)
284
+
- [ ] test-alpine jobs pass for both architectures (x64, arm64)
285
+
- [ ] No SIGSEGV/SIGTRAP crashes in 5+ consecutive runs
150
286
- [ ] Move TPP to `doc/done/`
151
287
152
-
## Notes
288
+
---
289
+
290
+
## Commits Summary
153
291
154
-
The fix is complete and tested locally. The only remaining step is CI validation to confirm the flaky crashes are resolved in the actual CI environment where they occurred.