submit slurm script does not allow to continue an experiment #370

Open
@tjhunter

Description

Is your feature request related to a problem? Please describe.

The private submit.py script currently expects the Slurm script passed to it to contain all the logic (train, evaluate, etc.), so it cannot be used to submit a job that only continues an existing experiment.

Which is better?

  • a single slurm script with many options
  • one slurm script for each of the train, finetune, continue, evaluate/sample

It seems to me that each of these stages has different hardware requirements, so I would be inclined to have one script per stage, but I am not an HPC expert.
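To make the second option concrete, here is a minimal sketch of how a submit helper could pick a per-stage Slurm script together with its resource request. Everything in it is an assumption for illustration: the `StageSpec` and `build_sbatch_command` names, the script paths under `slurm/`, and the node, GPU, and time values are hypothetical and do not reflect what the private submit.py actually does. Only the `sbatch` options used (`--nodes`, `--gres`, `--time`, `--job-name`) are standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """Resource request for one stage (all values are placeholders)."""
    script: str          # hypothetical path to the stage's Slurm script
    nodes: int           # number of nodes to request
    gpus_per_node: int   # GPUs per node, passed through --gres
    time_limit: str      # wall-clock limit in HH:MM:SS


# Hypothetical mapping: one Slurm script per stage, each with its own hardware needs.
STAGES = {
    "train": StageSpec("slurm/train.sh", nodes=4, gpus_per_node=4, time_limit="48:00:00"),
    "finetune": StageSpec("slurm/finetune.sh", nodes=2, gpus_per_node=4, time_limit="24:00:00"),
    "continue": StageSpec("slurm/continue.sh", nodes=4, gpus_per_node=4, time_limit="48:00:00"),
    "evaluate": StageSpec("slurm/evaluate.sh", nodes=1, gpus_per_node=1, time_limit="04:00:00"),
}


def build_sbatch_command(stage, run_id, extra_args=()):
    """Return the sbatch argument list for a stage; it does not submit anything."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {sorted(STAGES)}")
    spec = STAGES[stage]
    return [
        "sbatch",
        f"--nodes={spec.nodes}",
        f"--gres=gpu:{spec.gpus_per_node}",
        f"--time={spec.time_limit}",
        f"--job-name={stage}-{run_id}",
        spec.script,
        # Arguments after the script path are forwarded to the script itself.
        run_id,
        *extra_args,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Slurm.
    print(" ".join(build_sbatch_command("continue", "abc123")))
```

With this layout, continuing an experiment is just another entry in the table, and the resource request lives next to the script it belongs to. The single-script alternative would instead keep one script and pass the stage as an argument, at the cost of requesting the same hardware for every stage unless the options are overridden on the command line.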

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

Labels

  • enhancement: New feature or request
  • infra: Issues related to infrastructure
