The hive core code for executing simulations and clients kind of works, but it has a huge flaw: The running state of client containers launched by hive is not tracked very well. Things generally work if the simulation behaves as expected (i.e. it has to invoke the HTTP API endpoints at the right time, in the right order). Things also work if hive is not interrupted while running. But when hive receives an interrupt, it just exits and leaves containers running.
It would be nice to fix this, and here's how: At the time of writing, the simulation API uses `common.TestManager` for tracking the running suite and tests. When the simulation launches clients, their container IDs are provided to the test manager. But the test manager doesn't have any handle on the simulation container, and doesn't check the lifecycle of client containers either. This is because the test manager doesn't have an event loop.
I propose that we rewrite the test manager such that it handles a single simulation run using a 'for / select' style loop, tracking lifecycle events and acting on them appropriately:
- When the simulation container exits abnormally, all running test suites and their tests should end in the failed state, and running client containers should be shut down.
- When hive is interrupted externally, the simulation should end as well, and all client containers should be removed.
- When the simulation run times out or an unexpected interrupt occurs, it should be visible in the suite report somehow.