-
Notifications
You must be signed in to change notification settings - Fork 374
AN-751 predefinedMachineType
runtime attribute
#7817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
We write the HTML docs to disk on the CI instance and then throw them away.
gcp_machine_type
runtime attributepredefinedMachineType
runtime attribute
override def validateValue: PartialFunction[WomValue, ErrorOr[MachineType]] = { | ||
case WomString(s) => MachineType(s).validNel | ||
case other => | ||
s"Invalid '$key': String value required but got ${other.womType.friendlyName}.".invalidNel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming, in the case that the user supplies an invalid machine type, we let Batch handle that error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right now they will get an error from Batch. It might be worth adding a test case to characterize the behavior, actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was about to suggest that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thoroughly documenting the new option, with the necessary warnings as the behavior can be confusing.
I have a clarifying question about what the batch UI displays, and a request to create a backlog ticket to handle cost estimation once this is not an alpha feature anymore :)
new Exception( | ||
s"Task $jobTag failed. $returnCodeMessage GCP Batch task exited with ${errorCode}(${errorCode.code}). ${message}" | ||
) | ||
) with NoStackTrace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the no stack trace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we deliberately create exceptions in the program flow, my opinion is that they should never have a stack trace as it clutters the log and is not relevant for debugging.
A second order issue is that users often diligently copy-paste entire stack traces, rendering Slack threads and Zendesk cases unreadable.
After:
2025-10-16 14:38:12 cromwell-system-akka.dispatchers.engine-dispatcher-5 INFO -
WorkflowManagerActor: Workflow 974aa6ec-eccf-4267-8e83-65f230967dd6 failed (during ExecutingWorkflowState): cromwell.backend.google.batch.actors.GcpBatchAsyncBackendJobExecutionActor$$anon$1: Task minimal_hello_world.hello_world:NA:1 failed.
The job was stopped before the command finished. GCP Batch task exited with Success(0).
Before:
2025-10-16 16:58:09 cromwell-system-akka.dispatchers.engine-dispatcher-111 INFO -
WorkflowManagerActor: Workflow 70e6cac9-e991-48a6-92e9-da333c209e1e failed (during ExecutingWorkflowState): java.lang.Exception: Task minimal_hello_world.hello_world:NA:1 failed.
The job was stopped before the command finished. GCP Batch task exited with Success(0).
at cromwell.backend.google.batch.actors.GcpBatchAsyncBackendJobExecutionActor$.StandardException(GcpBatchAsyncBackendJobExecutionActor.scala:83)
at cromwell.backend.google.batch.actors.GcpBatchAsyncBackendJobExecutionActor.handleFailedRunStatus$1(GcpBatchAsyncBackendJobExecutionActor.scala:1152)
at cromwell.backend.google.batch.actors.GcpBatchAsyncBackendJobExecutionActor.$anonfun$handleExecutionFailure$1(GcpBatchAsyncBackendJobExecutionActor.scala:1168)
at scala.util.Try$.apply(Try.scala:210)
at cromwell.backend.google.batch.actors.GcpBatchAsyncBackendJobExecutionActor.handleExecutionFailure(GcpBatchAsyncBackendJobExecutionActor.scala:1160)
at cromwell.backend.google.batch.actors.GcpBatchAsyncBackendJobExecutionActor.handleExecutionFailure(GcpBatchAsyncBackendJobExecutionActor.scala:144)
at cromwell.backend.standard.StandardAsyncExecutionActor$$anonfun$handleExecutionResult$11.applyOrElse(StandardAsyncExecutionActor.scala:1506)
at cromwell.backend.standard.StandardAsyncExecutionActor$$anonfun$handleExecutionResult$11.applyOrElse(StandardAsyncExecutionActor.scala:1503)
at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:490)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
) | ||
|
||
/** | ||
* The "compute resource" concept is a suggestion to Batch regarding how many jobs can fit on a single VM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I 100% understand. Why do we supply the compute resource if it is not used and lead to UI confusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because otherwise Google displays default values that are even more wrong.
If we make a one-line change to develop
to omit setComputeResource()
we still get the right machine shape, we just get Google's default values in the UI:

In the future we could enhance the code to calculate a CPU and memory size for each predefinedMachineShape
and set them in the UI as well. As far as I can tell this is a nice-to-have, maybe it will happen as part of the cost enhancements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha, yeah definitely a follow up thing to do if you can add that to the cost ticket
Description
Create the
predefinedMachineType
attribute and document as an alpha feature with some limitations. Centaur tests include a CPU typen2
and a GPUg2
.To keep this PR bounded, I elected to defer any cost considerations to a future story.
We had previously discussed creating an allowlist of types
n1-standard-*
,n2-standard-*
etc. we can calculate cost for. It didn't make sense to me after finding our cost logic only works with*-custom-*
and not*-standard-*
.Closes #7535
Supersedes #7545
Release Notes Confirmation
CHANGELOG.md
CHANGELOG.md
in this PRCHANGELOG.md
because it doesn't impact community usersTerra Release Notes