Improving the tracking of client and simulation containers #350

The hive core code for executing simulations and clients mostly works, but it has a major flaw: the running state of client containers launched by hive is not tracked reliably. Things generally work if the simulation behaves as expected, i.e. it invokes the HTTP API endpoints at the right time and in the right order, and if hive is not interrupted while running. But when hive receives an interrupt, it just exits and leaves containers running.

It would be nice to fix this, and here's how. At the time of writing, the simulation API uses `common.TestManager` for tracking the running suite and tests. When the simulation launches clients, their container IDs are provided to the test manager. But the test manager has no handle on the simulation container, and it doesn't monitor the lifecycle of client containers either. This is because the test manager doesn't have an event loop.

I propose that we rewrite the test manager such that it handles a single simulation run using a 'for / select' style loop, tracking lifecycle events and acting on them appropriately:

- When the simulation container exits abnormally, all running test suites and their tests should end in a failed state, and running client containers should be shut down.
- When hive is interrupted externally, the simulation should end as well and all client containers should be removed.
- When the simulation run times out or an unexpected interrupt occurs, it should be visible in the suite report somehow.

