Improve error messages and logging capabilities #14

@dennisklein

Description

Summary

Enhance error handling and logging to improve debugging and user experience.

Proposed Improvements

Better Error Messages

  • Improve plugstack.conf error message (main.cpp:341-353)

    • Add examples of correct configuration
    • Suggest fixes for common mistakes
    • Include link to documentation
  • Add context to error messages

    slurm_error("singularity-exec: Failed to set container name '%s'. "
                "The option was already set to '%s'. "
                "Check for duplicate options in job script or command line.",
                optarg, s_container_name.c_str());

Error Propagation from Wrapper Script

  • Capture singularity error messages and forward to Slurm logs
  • Add exit code mapping for common errors
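    # In wrapper script
    rc=$?  # capture the exit status immediately after the singularity exec call
    if [ "$rc" -ne 0 ]; then
      # $? would already be overwritten by the [ ... ] test here, so use the saved value
      echo "Error: Singularity exec failed with exit code $rc" >&2
      exit "$rc"
    fi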

Structured Logging Levels

  • Document current logging levels (error, debug, verbose)
  • Add more granular debug levels
  • Consider environment variable: SLURM_SINGULARITY_LOG_LEVEL
  • Log configuration summary at debug level
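
A minimal sketch of how the proposed SLURM_SINGULARITY_LOG_LEVEL variable could be read. The enum, its ordering, and the function names are hypothetical and not taken from the plugin; only the variable name and the three level names come from this issue.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string_view>

// Hypothetical levels. The issue names error, debug, and verbose;
// the numeric ordering below is an assumption.
enum class LogLevel { Error = 0, Verbose = 1, Debug = 2 };

// Map a textual level to LogLevel; returns std::nullopt for unknown names.
inline std::optional<LogLevel> parse_log_level(std::string_view name)
{
    if (name == "error")   return LogLevel::Error;
    if (name == "verbose") return LogLevel::Verbose;
    if (name == "debug")   return LogLevel::Debug;
    return std::nullopt;
}

// Read SLURM_SINGULARITY_LOG_LEVEL, falling back to `fallback` when the
// variable is unset or holds an unrecognized value.
inline LogLevel log_level_from_env(LogLevel fallback = LogLevel::Error)
{
    const char* raw = std::getenv("SLURM_SINGULARITY_LOG_LEVEL");
    if (raw == nullptr) return fallback;
    return parse_log_level(raw).value_or(fallback);
}

// True when a message at level `msg` should be emitted while `active` is set.
inline bool should_log(LogLevel msg, LogLevel active)
{
    return static_cast<int>(msg) <= static_cast<int>(active);
}
```

Falling back silently on an unrecognized value keeps a typo in the job environment from failing the job; whether the plugin should warn instead is an open design choice.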

Validation Error Messages

  • When the container file is not found, suggest checking:

    • File path spelling
    • File permissions
    • Whether path is accessible from compute nodes
  • When the wrapper script is not found, provide:

    • Expected script location
    • How to configure custom script path
    • Link to installation documentation
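
The container check could be sketched roughly as follows. The helper name describe_container_problem is hypothetical, and the wording of the message is only a suggestion. In the plugin the resulting string would presumably be passed to slurm_error, as in the snippet under "Better Error Messages".

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: build a hint-rich message for a container path that
// cannot be used. Returns an empty string when the path exists.
inline std::string describe_container_problem(const std::string& path)
{
    namespace fs = std::filesystem;
    std::error_code ec;
    const bool exists = fs::exists(path, ec);  // non-throwing overload
    if (!ec && exists) return {};

    std::string msg = "singularity-exec: container file '" + path + "' ";
    if (ec) {
        // For example, permission denied on a parent directory.
        msg += "could not be checked (" + ec.message() + ")";
    } else {
        msg += "was not found";
    }
    msg += ". Check the path spelling, the file permissions, and whether the "
           "path is accessible from the compute nodes.";
    return msg;
}
```

Note that a check run on the submit host says nothing about visibility on the compute nodes, which is why the hint is kept in the message regardless of the failure cause.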

Logging Enhancements

  • Add timestamps to debug output (optional)
  • Log plugin version on initialization
  • Log effective configuration (merged from defaults + CLI + env vars)
  • Add option to redirect debug output to separate log file
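
One way to picture the "effective configuration" item is a layered merge followed by a summary. The Config alias, the function names, and the precedence order (defaults, then environment, then command line) are all assumptions made for illustration. The issue does not state which source should win.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical representation: option name -> value.
using Config = std::map<std::string, std::string>;

// Merge layers in increasing priority: later layers override earlier ones.
// Assumed order: defaults < environment < command line.
inline Config merge_config(const Config& defaults, const Config& env, const Config& cli)
{
    Config merged = defaults;
    for (const auto& [key, value] : env) merged[key] = value;
    for (const auto& [key, value] : cli) merged[key] = value;
    return merged;
}

// Render the merged configuration, one indented key='value' pair per line,
// suitable for a single debug-level log entry.
inline std::string format_config_summary(const Config& cfg)
{
    std::string out = "singularity-exec: effective configuration:\n";
    for (const auto& [key, value] : cfg)
        out += "  " + key + "='" + value + "'\n";
    return out;
}
```

Logging the merged result rather than each layer keeps the debug output short while still showing what the plugin will actually use.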

Example Improved Error Message

Before:

singularity-exec plugin: argument in plugstack.conf is invalid: 'foo'

After:

singularity-exec plugin: Invalid argument in plugstack.conf: 'foo'

Supported arguments:
  default=<path>          Path to default container (can be empty)
  script=<path>           Path to wrapper script (default: /usr/lib/slurm/slurm-singularity-wrapper.sh)
  bind=<spec>             Default bind mounts (e.g., bind=/data,/scratch)
  global=<options>        Global singularity options (e.g., global=--silent)
  args="<args>"           Singularity exec arguments (quotes required, e.g., args="--no-home")
  args=disabled           Disable --singularity-args option

Example configuration:
  required /usr/lib64/slurm/singularity-exec.so default=/opt/containers/default.sif script=/usr/libexec/slurm-singularity-wrapper.sh bind=/data global=--silent args=""

Documentation: https://github.com/GSI-HPC/slurm-singularity-exec#configuration
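
The "After" message could be assembled from a small table so that the help text and the list of accepted arguments stay in one place. This is a sketch only: ArgHelp, kSupportedArgs, and format_invalid_argument are hypothetical names, and the descriptions are shortened from the example above.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical table entry: usage pattern plus a short description.
struct ArgHelp {
    std::string_view usage;
    std::string_view description;
};

// Arguments as listed in the example message above.
constexpr std::array<ArgHelp, 6> kSupportedArgs{{
    {"default=<path>",   "Path to default container (can be empty)"},
    {"script=<path>",    "Path to wrapper script"},
    {"bind=<spec>",      "Default bind mounts"},
    {"global=<options>", "Global singularity options"},
    {"args=\"<args>\"",  "Singularity exec arguments (quotes required)"},
    {"args=disabled",    "Disable --singularity-args option"},
}};

// Build the multi-line message for an unrecognized plugstack.conf argument.
inline std::string format_invalid_argument(std::string_view arg)
{
    std::string msg = "singularity-exec plugin: Invalid argument in plugstack.conf: '";
    msg += arg;
    msg += "'\n\nSupported arguments:\n";
    for (const auto& help : kSupportedArgs) {
        msg += "  ";
        msg += help.usage;
        msg += "  ";
        msg += help.description;
        msg += '\n';
    }
    msg += "\nDocumentation: "
           "https://github.com/GSI-HPC/slurm-singularity-exec#configuration\n";
    return msg;
}
```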

Benefits

  • Reduced time debugging configuration issues
  • Better user experience
  • Easier troubleshooting for admins
