Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_parAffAlt does not cancel all its parallel fibers when killed #216

Open
deniskr opened this issue Sep 29, 2023 · 1 comment
Open

_parAffAlt does not cancel all its parallel fibers when killed #216

deniskr opened this issue Sep 29, 2023 · 1 comment

Comments

@deniskr
Copy link

deniskr commented Sep 29, 2023

We have a pretty complex we app that makes heavy use of fibers that run in parallel. Lately I stumbled upon the following scenario:

  1. fiber1 runs Control.Parallel.parOneOf (which is based on Effect.Aff._parAffAlt)
  2. The argument of Control.Parallel.parOneOf contains an array of 6 fibers that run in parallel
  3. Before any of the 6 child fibers complete, fiber1 is killed

Expected behavior
All 6 child fibers are killed

Actual behavior
5 of the 6 fibers are killed, one fiber continues running

Additional context
each of the child fiber contains quite a complex sub-tree of fibers which use bracket and habe non-tirivial finilizers.

To Reproduce
The case I describe happens in a complex integration test, which is impractical to run on your side. I tried to reduce the test to something consice, but when I reduce the complexity of the child fibers to something like delay (Milliseconds 10.0) the system works as expected. So the problem happens only when the child fibers use non-trivial finilizers and brackets.

Why I can be sure that the problem lies in _parAffAlt and not in our code? Because when I replaced the library Effect.Aff._parAffAlt function with my own implementation (see below) the system started working as expected.

Question
The unit tests in test/Test/Main.purs use very small fiber trees which mostly contain only the delay function. Those unit test
pass without triggering this bug.
Would it be possible to extend the test cases with large fiber trees built by Test.QuickCheck?
Does any body else encountered a similar hard to reproduce behaviour?

altPar
  :: forall a
  .  Aff a 
  -> Aff a 
  -> Aff a
altPar a1 a2 = supervise do
  f1Ref <- liftEffect $ Ref.new Nothing
  f2Ref <- liftEffect $ Ref.new Nothing

  f1 <- suspendAff do
    r1 <- attempt a1

    case r1 of
      Right _ -> do
        mf2 <- liftEffect $ Ref.read f2Ref
        for_ mf2 \f2 -> do
          killFiber (error "CancellationException1") f2 
        
      Left _ -> pure unit
    
    pure r1

  f2 <- suspendAff do
    r2 <- attempt a2

    case r2 of
      Right _ -> do
        mf1 <- liftEffect $ Ref.read f1Ref
        for_ mf1 \f1 -> do
          killFiber (error "CancellationException2") f1 
        
      Left _ -> pure unit
    
    pure r2
  
  liftEffect $ Ref.write (Just f1) f1Ref
  liftEffect $ Ref.write (Just f2) f2Ref

  fr1 <- forkAff (joinFiber f1)
  fr2 <- forkAff (joinFiber f2)

  r1 <- catchError 
          (joinFiber fr1)
          (\err -> pure (Left err))
  r2 <- catchError 
          (joinFiber fr2)
          (\err -> pure (Left err))


  case Tuple r1 r2 of
    Tuple (Right a) _ -> pure a
    Tuple (Left _) (Right a) -> pure a
    Tuple (Left ce) (Left err) | message ce == "CancellationException2" -> throwError err
    Tuple (Left err) (Left _) -> throwError err
@natefaubion
Copy link
Collaborator

I don't think I've ever experienced this and known it, at least. Can you give me an idea of what kind of work is being done?

  • How are you observing that it isn't canceled?
  • Is it all truly asynchronous, or some synchronous code, too?
  • Same for cancelers?
  • Is the initial cancelation in the same tick of the event loop?
  • Any idea what the size and shape of the tree is (like is it relatively balanced or is it full right or left associated)?
  • Is it only ParAff alt node, or also apply/map nodes?

Would it be possible to extend the test cases with large fiber trees built by Test.QuickCheck?

I don't know. If you have thoughts on how to do this, and are able to reproduce your issue with it, then I could see adding it. It would be nice if tests didn't take an unbounded amount of time :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants