Skip to content

Conversation

@SkySkimmer
Copy link
Contributor

@SkySkimmer SkySkimmer commented Oct 31, 2025

Typically the reductionops "Tried to normalize ill-typed term" hack deliberately catches anomalies (including those from CErrors.anomaly) but shouldn't catch async exceptions.

Hopefully fixes the non working Fail Timeout in bedrock2, we could also revert #21216 for our own test suite.

(debugging tip: apply

@@ -32,7 +32,14 @@ let check_for_interrupt () =
 (* This function assumes it is the only function calling [setitimer] *)
 let unix_timeout n f x =
   let open Unix in
-  let timeout_handler _ = raise Timeout in
+  let timeout_handler _ =
+    let info = Exninfo.reify () in
+    let () =
+      Exninfo.get_backtrace info |> Option.iter @@ fun bt ->
+      Printf.eprintf "timeout at %s\n%!" (Exninfo.backtrace_to_string bt)
+    in
+    raise Timeout
+  in
   let old_timer = getitimer ITIMER_REAL in
   (* Here we assume that the existing timer will also interrupt us. *)
   if old_timer.it_value > 0. && old_timer.it_value <= n then Ok (f x) else

then run a buggy Timeout with Set Debug "backtrace" on until you get the bug, the function which catches the timeout exn should be in the printed backtrace. For instance I had

Called from Reductionops.infer_conv_gen in file "pretyping/reductionops.ml", line 1324, characters 10-52

when trying the Fail Timeout in bedrock2 LeakageSemantics.)

Overlays:

@SkySkimmer SkySkimmer added this to the 9.2+rc1 milestone Oct 31, 2025
@SkySkimmer SkySkimmer requested review from a team as code owners October 31, 2025 13:51
@SkySkimmer SkySkimmer added the request: full CI Use this label when you want your next push to trigger a full CI. label Oct 31, 2025
@SkySkimmer SkySkimmer requested review from a team as code owners October 31, 2025 13:51
@coqbot-app coqbot-app bot removed the request: full CI Use this label when you want your next push to trigger a full CI. label Oct 31, 2025
@SkySkimmer
Copy link
Contributor Author

Note: Fail Timeout in particular became buggy because the Control.Timeout exn was reused for the error raised outside Timeout when timing out (what is now VernacControl.CmdTimeout), had a registered printer and so was not considered an anomaly by is_anomaly and was not caught by the reductionops hack. The code was already buggy for other async exceptions without registered printers though.

@SkySkimmer SkySkimmer added the request: full CI Use this label when you want your next push to trigger a full CI. label Oct 31, 2025
@coqbot-app coqbot-app bot removed the request: full CI Use this label when you want your next push to trigger a full CI. label Oct 31, 2025
with e when is_anomaly e
with e when is_anomaly ~async:false e
(* Hack in case equality is not yet defined... one day, maybe,
known equalities will be dynamically registered *)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this hack can be removed but probably better to leave it to another PR

let ist = { ist with lfun = lfun' } in
try
Changed (interp_constr ist env sigma c)
with e when to_catch e (* Hack *) ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure WTF is going on here, maybe this hack can be removed but probably better to leave it to another PR

@SkySkimmer
Copy link
Contributor Author

The pattern of getting an exit code by looking at is_anomaly seems pretty common, maybe worth defining a function CErrors.exit_code_of_exn instead of having every executable directly call is_anomaly?

@SkySkimmer SkySkimmer added the kind: fix This fixes a bug or incorrect documentation. label Oct 31, 2025
SkySkimmer added a commit to SkySkimmer/Coq-Equations that referenced this pull request Oct 31, 2025
SkySkimmer added a commit to SkySkimmer/coq-elpi that referenced this pull request Oct 31, 2025
@SkySkimmer SkySkimmer added the request: full CI Use this label when you want your next push to trigger a full CI. label Oct 31, 2025
SkySkimmer added a commit to SkySkimmer/coq-lsp that referenced this pull request Oct 31, 2025
Not sure how we should consider async exns in this code, for now I
made them considered as anomalies.
Typically the reductionops "Tried to normalize ill-typed term" hack
deliberately catches anomalies (including those from CErrors.anomaly)
but shouldn't catch async exceptions.

Hopefully fixes the non working Fail Timeout in bedrock2, we could
also revert rocq-prover#21216 for our own test suite.

(debugging tip: apply

~~~diff
@@ -32,7 +32,14 @@ let check_for_interrupt () =
 (* This function assumes it is the only function calling [setitimer] *)
 let unix_timeout n f x =
   let open Unix in
-  let timeout_handler _ = raise Timeout in
+  let timeout_handler _ =
+    let info = Exninfo.reify () in
+    let () =
+      Exninfo.get_backtrace info |> Option.iter @@ fun bt ->
+      Printf.eprintf "timeout at %s\n%!" (Exninfo.backtrace_to_string bt)
+    in
+    raise Timeout
+  in
   let old_timer = getitimer ITIMER_REAL in
   (* Here we assume that the existing timer will also interrupt us. *)
   if old_timer.it_value > 0. && old_timer.it_value <= n then Ok (f x) else
~~~

then run a buggy Timeout with `Set Debug "backtrace"` on until you get
the bug, the function which catches the timeout exn should be in the
printed backtrace. For instance I had

~~~
Called from Reductionops.infer_conv_gen in file "pretyping/reductionops.ml", line 1324, characters 10-52
~~~
when trying the Fail Timeout in bedrock2 LeakageSemantics.)
@coqbot-app coqbot-app bot removed the request: full CI Use this label when you want your next push to trigger a full CI. label Oct 31, 2025
| Anomaly _ -> true
| exn -> not (is_handled exn)
| Control.Timeout | Sys.Break | Out_of_memory | Stack_overflow -> async
| exn -> if Memprof_coq.is_interrupted() then async else not (is_handled exn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have a different function given that the async:true flag is the exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind: fix This fixes a bug or incorrect documentation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants