Since the beginning of this week, the cml runner launch command has been failing to bring up a working runner.
The VM itself is provisioned, but the cml service on it fails to start and the VM gets terminated.
jobs:
  deploy-runner:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: iterative/setup-cml@v2
      - name: deploy runner
        env:
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.CML_AI_GCLOUD_SERVICE_ACCOUNT_KEY }}
        run: |
          cml runner launch --cloud=gcp --cloud-type=g2-standard-16+nvidia-l4*1 --name=test-name --labels="val,bk0000,L4" --token=${{ secrets.CML_GITHUB_TOKEN }} --repo=https://github.com/***/**.git --cloud-hdd-size=200 --idle-timeout=1200 --log=debug --cloud-region=us-central1-a
Some logs from the serial console:
cml.sh[39067]: {"level":"error","message":"terraform,version\n\t\n\t","stack":"Error: terraform,version\n\t\n\t\n at /snapshot/cml/src/utils.js:38:18\n at exithandler (node:child_process:406:5)\n at ChildProcess.errorhandler (node:child_process:418:5)\n at ChildProcess.emit (node:events:527:28)\n at Process.ChildProcess._handle.onexit (node:internal/child_process:289:12)\n at onErrorNT (node:internal/child_process:478:16)\n at processTicksAndRejections (node:internal/process/task_queues:83:21)"}
Jun 12 08:50:32 cml-test-name-mqmvh9j6-5383uaw2 systemd[1]: cml.service: Main process exited, code=exited, status=1/FAILURE
Jun 12 08:50:32 cml-test-name-mqmvh9j6-5383uaw2 systemd[1]: cml.service: Failed with result 'exit-code'.
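The first log line is hard to read because the stack trace is escaped inside a JSON payload. Below is a minimal Python sketch that decodes it for readability. The helper name decode_log_line and the RAW constant are only illustrative and not part of cml; the payload is copied from the cml.sh line above. If I read the result correctly, the message is just "terraform,version" followed by empty lines, which might mean the terraform version call on the VM failed without producing output, but that is a guess on my side.

```python
import json

# JSON payload copied from the cml.sh serial-console line above
# (everything after the "cml.sh[39067]: " prefix). Raw strings keep the
# \n and \t escapes intact so that json.loads can decode them.
RAW = (
    r'{"level":"error","message":"terraform,version\n\t\n\t",'
    r'"stack":"Error: terraform,version\n\t\n\t\n'
    r' at /snapshot/cml/src/utils.js:38:18\n'
    r' at exithandler (node:child_process:406:5)\n'
    r' at ChildProcess.errorhandler (node:child_process:418:5)\n'
    r' at ChildProcess.emit (node:events:527:28)\n'
    r' at Process.ChildProcess._handle.onexit (node:internal/child_process:289:12)\n'
    r' at onErrorNT (node:internal/child_process:478:16)\n'
    r' at processTicksAndRejections (node:internal/process/task_queues:83:21)"}'
)


def decode_log_line(line: str) -> dict:
    """Drop any 'name[pid]: ' prefix and parse the JSON payload (hypothetical helper)."""
    start = line.index("{")  # the payload starts at the first opening brace
    return json.loads(line[start:])


entry = decode_log_line("cml.sh[39067]: " + RAW)

print("level:  ", entry["level"])
print("message:", repr(entry["message"]))
print("stack:")
print(entry["stack"])
```

The same helper should work on other JSON lines that cml writes with --log=debug, assuming they follow the same prefix-plus-JSON shape as the line above.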
Is there a fix for this issue, or should we use our own startup scripts?
Any suggestions are much appreciated.