Improving the tracking of client and simulation containers #350

Open
@fjl

The hive core code for executing simulations and clients mostly works, but it has a serious flaw: the running state of client containers launched by hive is not tracked reliably. Things generally work if the simulation behaves as expected (i.e. it invokes the HTTP API endpoints at the right time, in the right order) and hive is not interrupted while running. But when hive receives an interrupt, it just exits and leaves containers running.

It would be nice to fix this. At the time of writing, the simulation API uses common.TestManager for tracking the running suite and tests. When the simulation launches clients, their container IDs are provided to the test manager. But the test manager doesn't have any handle on the simulation container, and doesn't check the lifecycle of client containers either. This is because the test manager doesn't have an event loop.

I propose that we rewrite the test manager such that it handles a single simulation run using a 'for / select' style loop, tracking lifecycle events and acting on them appropriately:

  • When the simulation container exits abnormally, all running test suites and their tests should end in a failed state, and running client containers should be shut down.
  • When hive is interrupted externally, the simulation should end as well and all client containers should be removed.
  • When the simulation run times out or an unexpected interrupt occurs, it should be visible in the suite report somehow.
