Multi-experiment jobs #485

@pearce8

Let's start with single node experiments only for now. For CI, we need to launch multiple experiments as a single scheduler job.

  • Generate the scheduler command for a single node (without the launcher command)
  • Generate a launcher command for an individual experiment (without the scheduler command)
  • Use ramble -D . on --where (filtering) to get specific experiments into a single job (e.g., single node)

@scheibelp would it be possible to refactor the allocation modifier to generate the scheduler and the launcher command separately?
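
To make the split concrete, here is a minimal sketch of the two pieces as separate shell functions. Both function names, the time limit argument, and the `./saxpy -n 1024` invocation are hypothetical; nothing here is the allocation modifier's actual interface. It only shows the shape of the output we would want from each half, assuming Slurm (`#SBATCH`, `srun`).

```shell
#!/bin/bash
# Hypothetical helpers; neither function exists in Ramble or Benchpark.

# Scheduler side: directives for the whole allocation, no launcher.
scheduler_directives() {
  local n_nodes="$1" time_limit="$2"
  printf '#SBATCH -N %s\n' "$n_nodes"
  printf '#SBATCH -t %s\n' "$time_limit"
}

# Launcher side: command for one experiment inside an existing allocation.
launcher_command() {
  local n_nodes="$1" n_ranks="$2"
  shift 2
  printf 'srun -N %s -n %s %s\n' "$n_nodes" "$n_ranks" "$*"
}

# Print both pieces for a single-node experiment.
scheduler_directives 1 30
launcher_command 1 4 ./saxpy -n 1024
```

With the pieces separated like this, a wrapper job could emit the directives once and then run the launcher command of each experiment that fits the allocation.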

multi_job_submit.sh

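#!/bin/bash
# Request the node count at submit time (sbatch -N <nodes>); sbatch does not
# expand shell variables such as $1 inside #SBATCH lines.

WORKSPACES="workspace1
workspace2
workspace3"

for WRKSPC in $WORKSPACES
do
  # Double quotes let the shell expand $SLURM_JOB_NUM_NODES before ramble sees the filter.
  ramble -D "$WRKSPC" on --where "{n_nodes} == $SLURM_JOB_NUM_NODES" --executor='{execute_experiment}'
done

Usage:

sbatch -N 2 multi_job_submit.sh
sbatch -N 4 multi_job_submit.sh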
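
One quoting detail matters for the `--where` filter. The sketch below compares what the shell hands to `ramble` under single and double quotes. It assumes Ramble does not expand environment variables inside the expression itself, which I have not verified. `SLURM_JOB_NUM_NODES` is hard-coded here only so the snippet runs outside a job.

```shell
#!/bin/bash
# Slurm sets SLURM_JOB_NUM_NODES inside a job; hard-coded here for illustration.
SLURM_JOB_NUM_NODES=2

# Single quotes: the shell passes the variable name through unexpanded.
single_quoted='{n_nodes} == $SLURM_JOB_NUM_NODES'
# Double quotes: the shell substitutes the node count; {n_nodes} is left for ramble.
double_quoted="{n_nodes} == $SLURM_JOB_NUM_NODES"

echo "$single_quoted"   # prints: {n_nodes} == $SLURM_JOB_NUM_NODES
echo "$double_quoted"   # prints: {n_nodes} == 2
```

Under that assumption, only the double-quoted form gives the filter a number to compare against.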

Current template:

#!/bin/bash
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0

{allocation_directives}

cd {experiment_run_dir}

{pre_exec}
{command}
{post_exec}

No-directive template:

#!/bin/bash
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0

cd {experiment_run_dir}

{pre_exec}
{command}
{post_exec}

Directive-only template:

#!/bin/bash
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0

{allocation_directives}

Ramble exec template:

#!/bin/bash
#SBATCH -N {n_nodes}

ramble -D . on --where "\{n_nodes\} == $SLURM_JOB_NUM_NODES" --executor='\{execute_experiment\}'

Workspace config:

ramble:
  applications:
    hostname:
      workloads:
        parallel:
          experiments:
            wrapper_job_{n_nodes}:
              variables:
                n_nodes: [1, 2, 4, 8]
  ... other experiments ...
    saxpy: (1, 2, 4, 8, ... nodes)
  
ramble workspace setup
ramble on --where '"{application_name}" == "hostname"' --executor="sbatch {ramble_exec}"
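
For reference, this is roughly what I would expect each rendered wrapper script to look like, one per value of `n_nodes`. The sketch renders it by hand with a heredoc; `render_wrapper` is a hypothetical helper, and it assumes Ramble fills in `{n_nodes}` and turns the escaped `\{...\}` into literal braces when it writes the file. That is the intent of the template, not confirmed behavior.

```shell
#!/bin/bash
# Hypothetical stand-in for Ramble's template rendering; prints to stdout.
render_wrapper() {
  local n_nodes="$1"
  cat <<EOF
#!/bin/bash
#SBATCH -N ${n_nodes}

# Double quotes so the shell expands \$SLURM_JOB_NUM_NODES when the job runs.
ramble -D . on --where "{n_nodes} == \$SLURM_JOB_NUM_NODES" --executor='{execute_experiment}'
EOF
}

# One wrapper per node count listed in the workspace config.
for n in 1 2 4 8; do
  echo "--- wrapper_job_${n} ---"
  render_wrapper "$n"
done
```

Each wrapper is then submitted once by the outer `ramble on ... --executor="sbatch {ramble_exec}"` call, and inside the allocation it runs every experiment whose `n_nodes` matches.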

Labels: feature (New feature or request)
