Investigate CI jit crash #2159

Closed
pguyot wants to merge 7 commits into atomvm:main from pguyot:claude/debug-ci-crash-TM1gK

Conversation

pguyot (Collaborator) commented Mar 5, 2026

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

pguyot and others added 7 commits March 4, 2026 19:22

Running jit tests with AtomVM is now 20% faster.

Implement BEAM's `fullsweep_after` `spawn_opt/1` option and `process_flag/2`
flag. Also fix `process_flag/2` spec.

Signed-off-by: Paul Guyot <[email protected]>

Add build-helper.sh script to automate building and testing for the
jit-edge cherry-pick workflow. Also add build-nojit/, build-jit/, and
erl_crash.dump to .gitignore.

https://claude.ai/code/session_0199k5uGPhUJuGRXa4EibBmx

The test_min_heap_size test was using a 500ms timeout to receive a
DOWN message from a spawned process. Under valgrind (which slows
execution 10-50x), this timeout was insufficient, causing sporadic
failures with {badmatch,timeout}. Increase to 5000ms to match the
timeout used by similar tests (e.g., test_monitor).

https://claude.ai/code/session_01Q5dqnQiZib3Xwxe8TAEUwA

resource_type_fire_monitor and refc_binary_decrement_refcount raced on
resource lifetime: the former increments ref_count to keep the resource
alive during the down callback, but the latter may free it first.

Fix this by packing a monitor reference count and a dying flag into the
existing ref_count word (zero struct growth). For resources, the layout
is [dying:1 | monitor_refc:7/15 | ref_count:24/48] (32/64-bit). Plain
refc binaries use the full word for ref_count.
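
The 64-bit layout described above can be pictured with a handful of masks. This is only a sketch: the macro and function names are hypothetical, not taken from the AtomVM sources, and only the field widths (1, 15 and 48 bits) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. 64-bit layout from the commit message:
 * [dying:1 | monitor_refc:15 | ref_count:48] */
#define SKETCH_COUNT_BITS 48
#define SKETCH_MONITOR_BITS 15
#define SKETCH_COUNT_MASK ((UINT64_C(1) << SKETCH_COUNT_BITS) - 1)
#define SKETCH_MONITOR_ONE (UINT64_C(1) << SKETCH_COUNT_BITS)
#define SKETCH_MONITOR_MASK \
    (((UINT64_C(1) << SKETCH_MONITOR_BITS) - 1) << SKETCH_COUNT_BITS)
#define SKETCH_DYING_FLAG (UINT64_C(1) << 63)

/* The three fields tile the word exactly and do not overlap. */
static_assert((SKETCH_COUNT_MASK | SKETCH_MONITOR_MASK | SKETCH_DYING_FLAG) == UINT64_MAX,
    "fields must cover the whole word");
static_assert((SKETCH_COUNT_MASK & SKETCH_MONITOR_MASK) == 0,
    "ref_count and monitor_refc must not overlap");

/* User-visible reference count (low 48 bits). */
static inline uint64_t sketch_ref_count(uint64_t word)
{
    return word & SKETCH_COUNT_MASK;
}

/* Number of in-flight down callbacks (15 bits above the count). */
static inline uint64_t sketch_monitor_refc(uint64_t word)
{
    return (word & SKETCH_MONITOR_MASK) >> SKETCH_COUNT_BITS;
}

/* Top bit: the resource is being destroyed. */
static inline bool sketch_is_dying(uint64_t word)
{
    return (word & SKETCH_DYING_FLAG) != 0;
}
```

With these widths the largest representable monitor_refc is 32767, which matches the 64-bit monitor limit quoted further down.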

When ref_count reaches 0, destroy_resource_monitors sets the dying flag
and cancels pending monitors. If in-flight down callbacks remain
(monitor_refc > 0), destruction is deferred until the last fire_monitor
completes. The dying flag prevents calling down on a dying resource.
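
The deferred-destruction rule can be modelled as a small state machine over the packed word. The sketch below is deliberately single-threaded and uses plain arithmetic, whereas the actual change relies on atomic helpers and the monitors lock; all names are hypothetical and the 64-bit layout is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for the 64-bit layout [dying:1 | monitor_refc:15 | ref_count:48]. */
#define SKETCH_COUNT_MASK ((UINT64_C(1) << 48) - 1)
#define SKETCH_MONITOR_ONE (UINT64_C(1) << 48)
#define SKETCH_MONITOR_MASK (UINT64_C(0x7FFF) << 48)
#define SKETCH_DYING_FLAG (UINT64_C(1) << 63)

/* Drop one reference. Returns true when the caller should free the
 * resource immediately, false when it is still referenced or when
 * destruction is deferred to the last in-flight down callback. */
static bool sketch_decrement_refcount(uint64_t *word)
{
    assert((*word & SKETCH_COUNT_MASK) != 0); /* precondition: a reference is held */
    *word -= 1;
    if ((*word & SKETCH_COUNT_MASK) != 0) {
        return false;
    }
    *word |= SKETCH_DYING_FLAG;
    return (*word & SKETCH_MONITOR_MASK) == 0;
}

/* Called before invoking the down callback. Returns false when the
 * resource is dying, in which case the callback must not be called. */
static bool sketch_fire_monitor_begin(uint64_t *word)
{
    if ((*word & SKETCH_DYING_FLAG) != 0) {
        return false;
    }
    *word += SKETCH_MONITOR_ONE;
    return true;
}

/* Called after the down callback returns. Returns true when this was the
 * last in-flight callback on a dying resource, so the caller frees it. */
static bool sketch_fire_monitor_end(uint64_t *word)
{
    *word -= SKETCH_MONITOR_ONE;
    return (*word & SKETCH_DYING_FLAG) != 0 && (*word & SKETCH_MONITOR_MASK) == 0;
}
```

In this model exactly one of the two paths reports that the resource should be freed: the last decrement when no callback is in flight, or otherwise the last callback to finish.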

enif_monitor_process returns -1 when the per-resource monitor limit is
reached (127 on 32-bit, 32767 on 64-bit); the check is performed under
the monitors write-lock to prevent concurrent callers from jointly
overflowing monitor_refc.

All ref_count RMW operations go through explicit helpers
(refc_binary_add/sub/or_refcount) that dispatch to:
  1. C11 atomic_fetch_* when HAVE_ATOMIC is set;
  2. smp_atomic_fetch_*_size() in platform_atomic.h (new) on RP2040,
     which hold the existing atomic_cas_section critical section across
     the read-modify-write to cover the dual-core M0+ that has no native
     lock-free atomics;
  3. plain C operators as a single-core / no-SMP fallback.
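
A rough sketch of this three-way dispatch is shown below. Only `HAVE_ATOMIC` and the C11 `atomic_fetch_*` calls come from the text above; `SKETCH_SMP_WITHOUT_NATIVE_ATOMICS`, `sketch_critical_enter` and `sketch_critical_exit` are hypothetical stand-ins for the platform critical section, and the helper names are illustrative rather than the ones used in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The packed layout described in this commit needs at least a 32-bit word. */
static_assert(sizeof(size_t) * CHAR_BIT >= 32, "word too small for the packed layout");

#if defined(HAVE_ATOMIC)
/* 1. Native C11 atomics. */
#include <stdatomic.h>
typedef _Atomic size_t sketch_word_t;
static inline size_t sketch_refc_add(sketch_word_t *w, size_t v) { return atomic_fetch_add(w, v); }
static inline size_t sketch_refc_sub(sketch_word_t *w, size_t v) { return atomic_fetch_sub(w, v); }
static inline size_t sketch_refc_or(sketch_word_t *w, size_t v) { return atomic_fetch_or(w, v); }
#else
typedef size_t sketch_word_t;
#if defined(SKETCH_SMP_WITHOUT_NATIVE_ATOMICS)
/* 2. SMP without lock-free atomics: hold a platform critical section
 *    (hypothetical functions) across the whole read-modify-write. */
void sketch_critical_enter(void);
void sketch_critical_exit(void);
#define SKETCH_ENTER() sketch_critical_enter()
#define SKETCH_EXIT() sketch_critical_exit()
#else
/* 3. Single core / no SMP: plain C operators are sufficient. */
#define SKETCH_ENTER() ((void) 0)
#define SKETCH_EXIT() ((void) 0)
#endif

/* Each helper returns the previous value, like atomic_fetch_*. */
static inline size_t sketch_refc_add(sketch_word_t *w, size_t v)
{
    SKETCH_ENTER();
    size_t old = *w;
    *w = old + v;
    SKETCH_EXIT();
    return old;
}

static inline size_t sketch_refc_sub(sketch_word_t *w, size_t v)
{
    SKETCH_ENTER();
    size_t old = *w;
    *w = old - v;
    SKETCH_EXIT();
    return old;
}

static inline size_t sketch_refc_or(sketch_word_t *w, size_t v)
{
    SKETCH_ENTER();
    size_t old = *w;
    *w = old | v;
    SKETCH_EXIT();
    return old;
}
#endif
```

Routing every read-modify-write through helpers of this shape keeps the call sites identical on all three kinds of platform, which is presumably why the commit avoids open-coded `++` and `--` on the packed word.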

The three-step resource destroy sequence (resource_unmark_serialized +
synclist_remove + refc_binary_destroy) is extracted into
refc_binary_free_resource() to avoid duplication between
refc_binary_decrement_refcount and resource_type_fire_monitor.

refc_binary_get_refcount() is introduced to uniformly extract the
user-visible ref count from the packed word, replacing the
open-coded ternary at four call sites.
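
An accessor of this kind might look as follows. The struct, its `is_resource` field and the function name are assumptions made for illustration (the real code may distinguish resources from plain binaries differently); the 48-bit count width is the 64-bit layout quoted earlier in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 48 bits hold the user-visible count for resources (64-bit layout). */
#define SKETCH_COUNT_MASK ((UINT64_C(1) << 48) - 1)
static_assert(SKETCH_COUNT_MASK == UINT64_C(0x0000FFFFFFFFFFFF), "mask must be the low 48 bits");

/* Hypothetical stand-in for a refc binary header. */
struct SketchRefc
{
    bool is_resource;   /* assumption: how resources are told apart */
    uint64_t ref_count; /* packed word for resources, plain count otherwise */
};

/* One place for the ternary that would otherwise be repeated at each call site. */
static inline uint64_t sketch_get_refcount(const struct SketchRefc *refc)
{
    return refc->is_resource ? (refc->ref_count & SKETCH_COUNT_MASK) : refc->ref_count;
}
```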

https://claude.ai/code/session_01MMGHtEctkQS3UkVNwXVjY2

Also fix allocation size for `enif_make_resource_binary`

Signed-off-by: Paul Guyot <[email protected]>

The query also checks redundant ensure_free calls, i.e. calls
followed by another call with no allocation in between.

Fix errors found by the query:
- Fix an insufficient ensure_free in `enif_make_resource_binary`
- Add a missing ensure_free in esp32 `dac_driver.c`
- Remove nine redundant ensure_free calls followed by `enif_make_resource`
  in `otp_ssl.c` and `otp_socket.c` and esp32 drivers
- Remove a redundant ensure_free call in `nif_erlang_fun_to_list`

Signed-off-by: Paul Guyot <[email protected]>

Fix several cases where this happened in nifs. Also add NOLINT comments
to suppress the couple of false positives that the query is not smart
enough to rule out.

Signed-off-by: Paul Guyot <[email protected]>

pguyot closed this Mar 5, 2026