More possible improvements #177

@tanikina

Just wanted to suggest some potential improvements based on my experience with the current template:

  • Microseconds can be added to the directory names to avoid collisions. I once had an unfortunate situation where two jobs started at the same time (to the second) on the cluster and ended up sharing a single folder for two different models.
    I think it should be enough to use ${now:%H-%M-%S-%f} instead of ${now:%H-%M-%S} in the following entries of configs/hydra/default.yaml:

    run:
      dir: ${paths.log_dir}/${pipeline_type}/runs/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}
    sweep:
      dir: ${paths.log_dir}/${pipeline_type}/multiruns/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}

    and to append -%f to the timestamp in configs/train.yaml:
    # where to save the trained model and taskmodule
    model_save_dir: ${paths.save_dir}/models/${name}/${now:%Y-%m-%d_%H-%M-%S}

  • The column names in the output md files can be sorted. This would make it easier to compare the results from different runs and experiments in the log file. I typically copy the results from job_return_value.md or job_return_value.aggregated.md, and the columns in these files often appear in a different order.
    I think the columns could be sorted in src/hydra_callbacks/save_job_return_value.py by adding something like this:

    if isinstance(result, pd.DataFrame):
        result = result.reindex(sorted(result.columns), axis=1)
    elif isinstance(result, pd.Series):
        result = result.sort_index()

    before the result is written to the file:

    with open(str(output_dir / filename), "w") as file:
        file.write(result.to_markdown())

    EDIT: This was implemented in #182 ("add parameter sort_markdown_columns to SaveJobReturnValueCallback").
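
To make the sorting suggestion concrete, here is a minimal standalone sketch of the same idea. The helper name sort_for_markdown and the metric names and values are made up for illustration; this is not the code that was merged in #182.

```python
import pandas as pd


def sort_for_markdown(result):
    """Sort DataFrame columns (or a Series index) by name; pass anything else through.

    Hypothetical helper for illustration only, not part of the template.
    """
    if isinstance(result, pd.DataFrame):
        return result.reindex(sorted(result.columns), axis=1)
    if isinstance(result, pd.Series):
        return result.sort_index()
    return result


# Two dummy results whose columns arrive in different orders (values are invented).
run_a = pd.DataFrame({"f1": [1.0], "accuracy": [2.0], "loss": [3.0]})
run_b = pd.DataFrame({"loss": [3.0], "f1": [1.0], "accuracy": [2.0]})

sorted_a = sort_for_markdown(run_a)
sorted_b = sort_for_markdown(run_b)

# After sorting, both results list their columns in the same order.
print(list(sorted_a.columns))
print(list(sorted_b.columns))
```

With the columns in a fixed order, the markdown tables from different runs line up and can be compared or pasted side by side.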

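
For the first suggestion (microseconds in the directory names), here is a small sketch of why %f helps. It assumes that Hydra's now resolver formats the current time with Python's strftime, which is my understanding but not something I have verified in the Hydra source. The helper run_dir_suffix and the two timestamps are hypothetical.

```python
from datetime import datetime


def run_dir_suffix(now, pattern):
    """Format a timestamp the way I assume the `now` resolver does (via strftime)."""
    return now.strftime(pattern)


# Two hypothetical jobs that start within the same second.
first = datetime(2024, 1, 1, 12, 0, 0, 123456)
second = datetime(2024, 1, 1, 12, 0, 0, 654321)

seconds_only = "%Y-%m-%d_%H-%M-%S"
with_micro = "%Y-%m-%d_%H-%M-%S-%f"

# With second resolution both jobs get the same directory name.
print(run_dir_suffix(first, seconds_only) == run_dir_suffix(second, seconds_only))  # True

# With %f the names differ.
print(run_dir_suffix(first, with_micro))   # 2024-01-01_12-00-00-123456
print(run_dir_suffix(second, with_micro))  # 2024-01-01_12-00-00-654321
```

Note that this only makes collisions much less likely; two jobs could in principle still start in the same microsecond.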