
Improving the tracking of client and simulation containers #350

Open
fjl opened this issue Oct 21, 2020 · 0 comments


fjl commented Oct 21, 2020

The hive core code for executing simulations and clients mostly works, but it has a major flaw: the running state of client containers launched by hive is not tracked reliably. Things work as long as the simulation behaves as expected, i.e. it invokes the HTTP API endpoints at the right time and in the right order, and as long as hive itself is not interrupted while running. When hive receives an interrupt, however, it simply exits and leaves containers running.

It would be nice to fix this, and here's how. At the time of writing, the simulation API uses common.TestManager to track the running suite and its tests. When the simulation launches clients, their container IDs are passed to the test manager. But the test manager has no handle on the simulation container, and it doesn't monitor the lifecycle of client containers either. This is because the test manager doesn't have an event loop.

I propose that we rewrite the test manager such that it handles a single simulation run using a 'for / select' style loop, tracking lifecycle events and acting on them appropriately:

  • When the simulation container exits abnormally, all running test suites and their tests should end in the failed state, and all running client containers should be shut down.
  • When hive is interrupted externally, the simulation should end as well, and all client containers should be removed.
  • When the simulation run times out or an unexpected interrupt occurs, this should be visible in the suite report in some form.
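
To make the proposal more concrete, here is a rough sketch of what such a loop could look like. It is only an illustration under stated assumptions, not the actual hive code: the names event, result and runSimulation, the stop callback and the reason strings are all made up for this example, and container events are modelled as plain channel messages instead of real Docker notifications.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// eventKind lists the lifecycle events the loop reacts to (hypothetical names).
type eventKind int

const (
	clientStarted eventKind = iota
	clientExited
	simulationExited
)

// event is a lifecycle notification about one container.
type event struct {
	kind        eventKind
	containerID string
	exitCode    int // only meaningful for simulationExited
}

// result records how the run ended, so it could be written to the suite report.
type result struct {
	Reason  string   // "completed", "simulation failed", "interrupted" or "timeout"
	Failed  bool     // true if running suites/tests should be marked as failed
	Stopped []string // client containers shut down by the loop, sorted
}

// runSimulation tracks a single simulation run. It returns when the simulation
// container exits, when interrupt becomes readable (or is closed), or when the
// timeout elapses. A timeout <= 0 disables the time limit. Clients that are
// still running at that point are passed to stop.
func runSimulation(events <-chan event, interrupt <-chan struct{}, timeout time.Duration, stop func(id string)) result {
	clients := make(map[string]bool)

	var timeoutC <-chan time.Time // a nil channel never fires in select
	if timeout > 0 {
		t := time.NewTimer(timeout)
		defer t.Stop()
		timeoutC = t.C
	}

	var res result
loop:
	for {
		select {
		case ev := <-events:
			switch ev.kind {
			case clientStarted:
				clients[ev.containerID] = true
			case clientExited:
				delete(clients, ev.containerID)
			case simulationExited:
				res.Reason = "completed"
				if ev.exitCode != 0 {
					res.Reason, res.Failed = "simulation failed", true
				}
				break loop
			}
		case <-interrupt:
			res.Reason, res.Failed = "interrupted", true
			break loop
		case <-timeoutC:
			res.Reason, res.Failed = "timeout", true
			break loop
		}
	}

	// Whatever ended the run, no client container is left behind.
	for id := range clients {
		res.Stopped = append(res.Stopped, id)
	}
	sort.Strings(res.Stopped)
	for _, id := range res.Stopped {
		stop(id)
	}
	return res
}

func main() {
	events := make(chan event, 3)
	events <- event{kind: clientStarted, containerID: "client-1"}
	events <- event{kind: clientStarted, containerID: "client-2"}
	events <- event{kind: simulationExited, containerID: "sim", exitCode: 1}

	res := runSimulation(events, nil, time.Minute, func(id string) {
		fmt.Println("stopping", id)
	})
	fmt.Printf("%+v\n", res)
}
```

In a real implementation the interrupt channel could be the Done() channel of a context created with signal.NotifyContext, and stop would call into the container backend. The point of the sketch is that all three exit paths go through the same cleanup code, and the reason for ending the run is kept so it can be reported.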