TimeoutStartUSec too low, or MicroVMs taking a little while to start #317

@pwaller

I have a machine with fewer than 10 microvms on it. The machine is rebooted daily, and after each reboot the log looks like:

Dec 31 09:33:22 foo systemd[1]: Starting MicroVM 'machine-d'...
Dec 31 09:33:22 foo systemd[1]: Starting MicroVM 'machine-b'...
Dec 31 09:33:22 foo systemd[1]: Starting MicroVM 'machine-e'...
Dec 31 09:33:22 foo systemd[1]: Starting MicroVM 'machine-a'...
Dec 31 09:33:22 foo systemd[1]: Starting MicroVM 'machine-f'...
Dec 31 09:33:22 foo systemd[1]: Starting MicroVM 'machine-c'...
Dec 31 09:34:05 foo systemd[1]: Started MicroVM 'machine-a'.
Dec 31 09:34:52 foo systemd[1]: Started MicroVM 'machine-c'.
Dec 31 09:34:53 foo systemd[1]: microvm@machine-b.service: Failed with result 'timeout'.
Dec 31 09:34:53 foo systemd[1]: Failed to start MicroVM 'machine-b'.
Dec 31 09:34:53 foo systemd[1]: microvm@machine-d.service: Failed with result 'timeout'.
Dec 31 09:34:53 foo systemd[1]: Failed to start MicroVM 'machine-d'.
Dec 31 09:34:53 foo systemd[1]: microvm@machine-f.service: Failed with result 'timeout'.
Dec 31 09:34:53 foo systemd[1]: Failed to start MicroVM 'machine-f'.
Dec 31 09:34:53 foo systemd[1]: microvm@machine-e.service: Failed with result 'timeout'.
Dec 31 09:34:53 foo systemd[1]: Failed to start MicroVM 'machine-e'.
Dec 31 09:35:00 foo systemd[1]: Starting MicroVM 'machine-b'...
Dec 31 09:35:00 foo systemd[1]: Starting MicroVM 'machine-f'...
Dec 31 09:35:00 foo systemd[1]: Starting MicroVM 'machine-d'...
Dec 31 09:35:00 foo systemd[1]: Starting MicroVM 'machine-e'...
Dec 31 09:35:44 foo systemd[1]: Started MicroVM 'machine-e'.
Dec 31 09:35:44 foo systemd[1]: Started MicroVM 'machine-f'.
Dec 31 09:35:50 foo systemd[1]: Started MicroVM 'machine-b'.
Dec 31 09:34:53 foo systemd[1]: microvm@machine-d.service: Failed with result 'timeout'.
Dec 31 09:36:31 foo systemd[1]: Failed to start MicroVM 'machine-d'.
Dec 31 09:36:39 foo systemd[1]: Starting MicroVM 'machine-d'...
Dec 31 09:37:48 foo systemd[1]: Started MicroVM 'machine-d'.

My first read is that the machines need a fair amount of CPU time before systemd considers them 'up', and if they are not up within the default time limit of 1m30s, they are terminated and restarted. Eventually this process settles down.
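
To check that reading against the log, here is a minimal Python sketch that pairs each "Starting" line with the next "Started" or "Failed to start" line for the same VM and reports how long each attempt took. The sample input is the 'machine-b' lines from the log above; the regular expression, the function name `attempt_durations`, and the placeholder year are my own assumptions, not anything microvm.nix or systemd provides.

```python
import re
from datetime import datetime

# The 'machine-b' lines copied from the log above.
LOG = """\
Dec 31 09:33:22 foo systemd[1]: Starting MicroVM 'machine-b'...
Dec 31 09:34:53 foo systemd[1]: Failed to start MicroVM 'machine-b'.
Dec 31 09:35:00 foo systemd[1]: Starting MicroVM 'machine-b'...
Dec 31 09:35:50 foo systemd[1]: Started MicroVM 'machine-b'.
"""

# Matches only the three systemd job messages seen in the log; quotes
# around the VM name are optional because some lines lack the opening one.
LINE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ systemd\[1\]: "
    r"(?P<event>Starting|Started|Failed to start) MicroVM '?(?P<vm>[^'.]+)'?"
)


def attempt_durations(log_text):
    """Return a list of (vm, outcome, seconds) for each start attempt."""
    pending = {}   # vm name -> timestamp of its latest "Starting" line
    results = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m is None:
            continue
        # Journal short timestamps carry no year; 2000 is an arbitrary
        # placeholder so that strptime has a complete date to work with.
        ts = datetime.strptime("2000 " + m["ts"], "%Y %b %d %H:%M:%S")
        if m["event"] == "Starting":
            pending[m["vm"]] = ts
            continue
        started_at = pending.pop(m["vm"], None)
        if started_at is None:
            continue  # outcome without a matching "Starting" line
        outcome = "started" if m["event"] == "Started" else "failed"
        results.append((m["vm"], outcome, int((ts - started_at).total_seconds())))
    return results


for vm, outcome, seconds in attempt_durations(LOG):
    print(f"{vm}: {outcome} after {seconds}s")
# machine-b: failed after 91s
# machine-b: started after 50s
```

The failed attempt ends at roughly 1m30s, which matches the timeout theory, while the retry of the same VM comes up in 50s once fewer machines are starting at the same time.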

What doesn't quite add up for me is that I think there should be enough CPU available for this to all happen.

Some questions:

  • When is a microvm considered started?
  • How might I determine why they are not coming up quickly?
  • Has anyone else encountered this? Are there common pitfalls?
  • Should the default start timeout be raised?
  • If it turns out the machines are contending for resources (such as I/O), is there a straightforward way to stagger their boot to reduce the contention overhead?
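
For the last question, one thing I could try is starting the VMs one at a time instead of all at once. Below is a rough Python sketch of that idea, not something microvm.nix offers as far as I know. It assumes the units are named `microvm@<name>.service`, that the VMs are not already being started automatically at boot, and that the script runs with enough privileges to call `systemctl`. The function name `start_staggered` and its `delay_s` and `run` parameters are hypothetical.

```python
import subprocess
import time


def start_staggered(names, delay_s=0.0, run=subprocess.run):
    """Start each VM unit in turn and return the names that failed.

    `run` defaults to subprocess.run and is only a parameter so the
    function can be exercised without touching systemd.
    """
    failed = []
    for index, name in enumerate(names):
        # Assumed unit naming scheme: microvm@<name>.service
        unit = f"microvm@{name}.service"
        # Without --no-block, `systemctl start` waits for the start job to
        # finish, so the next VM only begins once this one is up or failed.
        result = run(["systemctl", "start", unit])
        if result.returncode != 0:
            failed.append(name)
        # Optional extra pause between VMs, skipped after the last one.
        if delay_s > 0 and index < len(names) - 1:
            time.sleep(delay_s)
    return failed


if __name__ == "__main__":
    # VM names taken from the log above.
    vms = ["machine-a", "machine-b", "machine-c",
           "machine-d", "machine-e", "machine-f"]
    print("failed to start:", start_staggered(vms, delay_s=5.0))
```

Whether this helps depends on contention actually being the cause. If a single VM still needs more than 1m30s when started on its own, the timeout itself would be the thing to look at.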
