Description
Problem
Previously, we used remote-exec in Terraform to SSH into the client EC2 instance and measure startup time (time for Terraform apply plus time for cloud-init to finish). This caused the bugs described in issue #192, so this PR fixes them by removing remote-exec.
Proposed Solution
The proposal is to measure cloud-init time in user_data.sh on the client VMs, and to measure start.sh time for the client VMs' Docker containers. These values will be sent to the allocator through a RESTful API and stored in the database for future runtime calculations.
1. Database Modification
These values need to be stored in the database. I recommend adding new columns to the existing vms table, as these metrics are properties of a specific VM.
- Table: vms
- New Columns:
  - terraform_apply_duration_seconds: FLOAT
  - cloud_init_duration_seconds: FLOAT
  - container_startup_duration_seconds: FLOAT
- Action: Update packages/allocator/src/lablink_allocator_service/generate_init_sql.py to reflect these changes.
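As a rough sketch of the schema change, the three columns could be emitted as ALTER TABLE statements. The helper name `metric_column_statements` is hypothetical, and the `ADD COLUMN IF NOT EXISTS` form assumes a PostgreSQL database. Neither the SQL dialect nor the current contents of generate_init_sql.py are stated here, so treat this as an illustration rather than the actual change.

```python
from typing import List

# The table and column names come from the proposal above; everything else
# (constant name, function name) is illustrative.
METRIC_COLUMNS = (
    "terraform_apply_duration_seconds",
    "cloud_init_duration_seconds",
    "container_startup_duration_seconds",
)


def metric_column_statements(table: str = "vms") -> List[str]:
    """Return one ALTER TABLE statement per new duration column.

    Assumes PostgreSQL, where ADD COLUMN IF NOT EXISTS makes each
    statement safe to re-run against an already migrated table.
    """
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {column} FLOAT;"
        for column in METRIC_COLUMNS
    ]
```

If the table is instead created from scratch on every deployment, the same three columns could simply be added to the existing CREATE TABLE statement.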
2. Measuring Terraform Apply Time (Allocator-side)
- Logic: In the function that runs terraform apply, record the start and end time.
- Action: Modify packages/allocator/src/lablink_allocator_service/utils/terraform_utils.py. The function that wraps the terraform apply command should be updated to time the execution and store the result in the terraform_apply_duration_seconds column for the newly created VMs.
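The timing step could look roughly like the sketch below. The function name `timed_run` is hypothetical, and the real wrapper in terraform_utils.py may call Terraform differently. The sketch only shows how to record wall-clock duration around a subprocess call.

```python
import subprocess
import time
from typing import Optional, Sequence, Tuple


def timed_run(
    cmd: Sequence[str], cwd: Optional[str] = None
) -> Tuple[subprocess.CompletedProcess, float]:
    """Run a command and return its result plus the elapsed seconds."""
    # perf_counter is monotonic, so the measurement is not affected by
    # system clock adjustments while the command is running.
    start = time.perf_counter()
    result = subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, check=False
    )
    duration = time.perf_counter() - start
    return result, duration
```

Under these assumptions, the allocator would call something like `timed_run(["terraform", "apply", "-auto-approve"], cwd=terraform_dir)`, where `terraform_dir` is a placeholder, and write the returned duration to terraform_apply_duration_seconds for each VM created by that apply.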
3. Measuring on the Client VM (Client-Side)
This is the core of your proposal. We can use simple shell commands to get the timings.
- In user_data.sh:
#!/bin/bash
CLOUD_INIT_START_TIME=$(date +%s)
# ... all existing user_data commands ...
# (installing docker, pulling images, etc.)
CLOUD_INIT_END_TIME=$(date +%s)
CLOUD_INIT_DURATION=$((CLOUD_INIT_END_TIME - CLOUD_INIT_START_TIME))
# Use curl to send this metric back to the allocator
curl -X POST -H "Content-Type: application/json" \
  -d "{\"cloud_init_duration_seconds\": $CLOUD_INIT_DURATION}" \
  "${allocator_url}/vm_metrics/${hostname}"
- In start.sh (inside the client Docker container):
#!/bin/bash
CONTAINER_START_TIME=$(date +%s)
# ... existing container startup logic ...
# (activating venv, starting services)
CONTAINER_END_TIME=$(date +%s)
CONTAINER_DURATION=$((CONTAINER_END_TIME - CONTAINER_START_TIME))
# Use curl to send this metric back to the allocator
curl -X POST -H "Content-Type: application/json" \
  -d "{\"container_startup_duration_seconds\": $CONTAINER_DURATION}" \
  "$ALLOCATOR_URL/vm_metrics/$HOSTNAME"
(Note: Assumes ALLOCATOR_URL and HOSTNAME are available as environment variables inside the container, which they should be.)
4. Add Endpoints to the Allocator Web Server
- New Endpoint: POST /vm_metrics/<hostname>
- Logic: This endpoint will receive a JSON payload with one or more metric fields. It will find the corresponding VM in the vms table by hostname and update the appropriate duration column.
- Action: Add this new route and handler to packages/allocator/src/lablink_allocator_service/main.py.
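The core of the handler can be sketched independently of the web framework, which is not named here. The function `build_metrics_update` below is hypothetical. It turns a hostname and a JSON payload into a parameterized UPDATE statement, accepting only the three known metric columns. The `%s` placeholder style assumes a psycopg2-like database driver.

```python
from typing import Any, Dict, List, Tuple

# Column names cannot be bound as query parameters, so payload keys are
# checked against this fixed allowlist before being placed in the SQL text.
ALLOWED_METRICS = (
    "terraform_apply_duration_seconds",
    "cloud_init_duration_seconds",
    "container_startup_duration_seconds",
)


def build_metrics_update(
    hostname: str, payload: Dict[str, Any]
) -> Tuple[str, List[Any]]:
    """Build a parameterized UPDATE for the metric fields in the payload.

    Unknown keys are ignored. Raises ValueError if no known metric is
    present or if a metric value is not a non-negative number.
    """
    assignments: List[str] = []
    params: List[Any] = []
    for column in ALLOWED_METRICS:
        if column not in payload:
            continue
        value = payload[column]
        # bool is a subclass of int, so it is rejected explicitly.
        if (
            isinstance(value, bool)
            or not isinstance(value, (int, float))
            or value < 0
        ):
            raise ValueError(f"invalid value for {column}: {value!r}")
        assignments.append(f"{column} = %s")
        params.append(float(value))
    if not assignments:
        raise ValueError("payload contains no known metric fields")
    sql = f"UPDATE vms SET {', '.join(assignments)} WHERE hostname = %s;"
    params.append(hostname)
    return sql, params
```

The route handler would then parse the request body, call this function, execute the returned statement, and respond with an error status when ValueError is raised or when no row matches the hostname.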