Time Measurement Methods for Client VM Startup #224

@7174Andy

Problem

Previously, we used remote-exec in Terraform to SSH into the client EC2 instance and measure startup time (the time for terraform apply plus the time to finish cloud-init). However, this caused the bugs described in #192, so we fixed them in this PR by removing remote-exec.

Proposed Solution

The proposed approach is to measure the cloud-init time in user_data.sh on the client VMs, and to measure the time start.sh takes inside the client VMs' Docker containers. These values will be sent to the allocator through a RESTful API and stored in the database for future runtime calculations.

1. Database Modification

These values need to be stored in the database. I recommend adding new columns to the existing vms table, as these metrics are properties of a specific VM.

  • Table: vms
  • New Columns:
    • terraform_apply_duration_seconds: FLOAT
    • cloud_init_duration_seconds: FLOAT
    • container_startup_duration_seconds: FLOAT
  • Action: Update packages/allocator/src/lablink_allocator_service/generate_init_sql.py to reflect these changes.
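As a rough illustration of the schema change above, the following sketch builds one ALTER TABLE statement per new column and applies them to an in-memory SQLite table. The column names and FLOAT type come from this issue; the helper names, the minimal vms schema with only a hostname column, and the use of SQLite as a stand-in for the real database are assumptions, and the actual change would live in generate_init_sql.py.

```python
import sqlite3

# Column names and types are taken from the issue text.
NEW_VM_COLUMNS = {
    "terraform_apply_duration_seconds": "FLOAT",
    "cloud_init_duration_seconds": "FLOAT",
    "container_startup_duration_seconds": "FLOAT",
}


def build_alter_statements(table="vms"):
    """Return one ALTER TABLE statement per new timing column (hypothetical helper)."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};"
        for name, sql_type in NEW_VM_COLUMNS.items()
    ]


def demo():
    """Apply the statements to an in-memory SQLite table and list its columns."""
    conn = sqlite3.connect(":memory:")
    # Assumed minimal schema; the real vms table has more columns.
    conn.execute("CREATE TABLE vms (hostname TEXT PRIMARY KEY)")
    for statement in build_alter_statements():
        conn.execute(statement)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(vms)")]
    conn.close()
    return columns


if __name__ == "__main__":
    print(demo())
```

The new columns are nullable, so rows for existing VMs stay valid and each metric can be filled in independently as it arrives.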

2. Measuring Terraform Apply Time (Allocator-side)

  • Logic: In the function that runs terraform apply, record the start and end time.
  • Action: Modify packages/allocator/src/lablink_allocator_service/utils/terraform_utils.py. The function that wraps the terraform apply command should be updated to time the execution and store the result in the terraform_apply_duration_seconds column for the newly created VMs.
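The timing logic described above could look roughly like the sketch below. The function name timed_run is hypothetical and not part of terraform_utils.py; the example runs a harmless stand-in command rather than Terraform itself, and it leaves out writing the result to the database.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd and return (CompletedProcess, elapsed wall-clock seconds)."""
    # perf_counter is monotonic, so the result is unaffected by clock adjustments.
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    duration = time.perf_counter() - start
    return result, duration


if __name__ == "__main__":
    # Stand-in command; the real call would be something like
    # ["terraform", "apply", "-auto-approve"] issued from terraform_utils.py.
    result, seconds = timed_run([sys.executable, "-c", "print('applied')"])
    print(f"exit={result.returncode} duration={seconds:.3f}s")
```

The measured value would then be written to terraform_apply_duration_seconds for each VM created by that apply, ideally only when the return code is zero so that failed runs do not skew the metric.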

3. Measuring on the Client VM (Client-Side)

This is the core of the proposal. Simple shell commands are enough to capture the timings.

  • In user_data.sh:
#!/bin/bash
CLOUD_INIT_START_TIME=$(date +%s)

# ... all existing user_data commands ...
# (installing docker, pulling images, etc.)

CLOUD_INIT_END_TIME=$(date +%s)
CLOUD_INIT_DURATION=$((CLOUD_INIT_END_TIME - CLOUD_INIT_START_TIME))

# Use curl to send this metric back to the allocator
# (the URL is quoted so that unexpected characters cannot split the argument)
curl -X POST -H "Content-Type: application/json" \
    -d "{\"cloud_init_duration_seconds\": $CLOUD_INIT_DURATION}" \
    "${allocator_url}/vm_metrics/${hostname}"
  • In start.sh (inside the client Docker container):
#!/bin/bash
CONTAINER_START_TIME=$(date +%s)

# ... existing container startup logic ...
# (activating venv, starting services)

CONTAINER_END_TIME=$(date +%s)
CONTAINER_DURATION=$((CONTAINER_END_TIME - CONTAINER_START_TIME))

# Use curl to send this metric back to the allocator
# (the URL is quoted so that unexpected characters cannot split the argument)
curl -X POST -H "Content-Type: application/json" \
    -d "{\"container_startup_duration_seconds\": $CONTAINER_DURATION}" \
    "$ALLOCATOR_URL/vm_metrics/$HOSTNAME"

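(Note: This assumes ALLOCATOR_URL and HOSTNAME are available as environment variables inside the container. Docker sets a container's hostname to the container ID by default, so HOSTNAME may need to be passed in explicitly to match the hostname stored in the vms table.)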

4. Add Endpoints to the Allocator Web Server

  • New Endpoint: POST /vm_metrics/<hostname>
  • Logic: This endpoint will receive a JSON payload with one or more metric fields. It will find the corresponding VM in the vms table by hostname and update the appropriate duration column.
  • Action: Add this new route and handler to packages/allocator/src/lablink_allocator_service/main.py.
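The handler logic above can be sketched independently of the web framework. In the example below, update_vm_metrics is a hypothetical helper that a POST /vm_metrics/<hostname> route could call with the parsed JSON body; sqlite3 stands in for the real database, and the allowlist, validation rules, and exception types are assumptions rather than existing behavior in main.py.

```python
import sqlite3

# Only client-reported metrics are accepted here; terraform_apply_duration_seconds
# is written by the allocator itself, so it is left out on purpose (an assumption).
ALLOWED_METRICS = {
    "cloud_init_duration_seconds",
    "container_startup_duration_seconds",
}


def update_vm_metrics(conn, hostname, payload):
    """Store the allowed metric fields for hostname and return the names updated."""
    updates = {}
    for key, value in payload.items():
        if key not in ALLOWED_METRICS:
            continue  # silently ignore unknown fields
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key} must be a number")
        if value < 0:
            raise ValueError(f"{key} must be non-negative")
        updates[key] = float(value)
    if not updates:
        raise ValueError("no recognized metric fields in payload")

    # Column names come only from ALLOWED_METRICS; values are bound as parameters.
    assignments = ", ".join(f"{name} = ?" for name in updates)
    cursor = conn.execute(
        f"UPDATE vms SET {assignments} WHERE hostname = ?",
        [*updates.values(), hostname],
    )
    if cursor.rowcount == 0:
        raise LookupError(f"unknown hostname: {hostname}")
    conn.commit()
    return sorted(updates)
```

A route handler could map ValueError to a 400 response and LookupError to a 404. Restricting column names to a fixed allowlist keeps the dynamically built UPDATE statement safe from injection through payload keys.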

Labels: enhancement