Just wanted to suggest some potential improvements based on my experience with the current template:
- Microseconds can be added to the directory names to avoid collisions (I once had an unfortunate situation where two jobs started at the same time on the cluster and ended up sharing a single folder for two different models).
  I think it should be enough to add `${now:%H-%M-%S-%f}` in `configs/hydra/default.yaml`:
pytorch-ie-hydra-template-1/configs/hydra/default.yaml
Lines 19 to 22 in 3b37839
    run:
      dir: ${paths.log_dir}/${pipeline_type}/runs/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}
    sweep:
      dir: ${paths.log_dir}/${pipeline_type}/multiruns/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}
and in `configs/train.yaml`:
pytorch-ie-hydra-template-1/configs/train.yaml
Lines 73 to 74 in 3b37839
    # where to save the trained model and taskmodule
    model_save_dir: ${paths.save_dir}/models/${name}/${now:%Y-%m-%d_%H-%M-%S}
- The column names in the output md files can be sorted, which would make it easier to compare the results from different runs and experiments in the log file. I typically copy the results from `job_return_value.md` or `job_return_value.aggregated.md`, and the columns in these files often appear in a different order.
I think sorting columns could be done in `src/hydra_callbacks/save_job_return_value.py` by adding something like this:
    if isinstance(result, pd.DataFrame):
        result = result.reindex(sorted(result.columns), axis=1)
    elif isinstance(result, pd.Series):
        result = result.sort_index()
before the result is written to the file:
pytorch-ie-hydra-template-1/src/hydra_callbacks/save_job_return_value.py
Lines 257 to 258 in 3b37839
    with open(str(output_dir / filename), "w") as file:
        file.write(result.to_markdown())
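To illustrate the column-sorting suggestion, here is a minimal standalone sketch. The helper name `sort_result_labels` and the two example frames are hypothetical (they are not part of the template), and the numbers are dummy values only meant to show the ordering effect:

```python
import pandas as pd


def sort_result_labels(result):
    """Sort DataFrame columns or Series index by label; leave other types untouched.

    Hypothetical helper, not part of the template.
    """
    if isinstance(result, pd.DataFrame):
        return result.reindex(sorted(result.columns), axis=1)
    if isinstance(result, pd.Series):
        return result.sort_index()
    return result


# Two runs reporting the same metrics in a different column order (dummy values).
run_a = pd.DataFrame({"f1": [1.0], "accuracy": [2.0], "recall": [3.0]})
run_b = pd.DataFrame({"recall": [3.0], "f1": [1.0], "accuracy": [2.0]})

sorted_a = sort_result_labels(run_a)
sorted_b = sort_result_labels(run_b)

# The same helper also handles a Series, sorting its index labels.
sorted_series = sort_result_labels(pd.Series({"recall": 3.0, "accuracy": 2.0}))
```

After sorting, both frames list their columns in the same alphabetical order, so the markdown tables written for different runs can be compared side by side.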
EDIT: This was implemented in "add parameter `sort_markdown_columns` to `SaveJobReturnValueCallback`" #182.
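For the first suggestion (microseconds in the directory names), a small sketch of the effect, assuming that the `now` resolver formats the current time with Python's `strftime`. The two timestamps are made up to simulate jobs starting within the same second:

```python
from datetime import datetime

# Current pattern (second resolution) and the proposed one with microseconds.
OLD_FORMAT = "%Y-%m-%d_%H-%M-%S"
NEW_FORMAT = "%Y-%m-%d_%H-%M-%S-%f"

# Two hypothetical job start times within the same second.
first = datetime(2024, 1, 1, 12, 0, 0, 123456)
second = datetime(2024, 1, 1, 12, 0, 0, 654321)

# Directory name suffixes produced by each pattern.
old_names = {t.strftime(OLD_FORMAT) for t in (first, second)}
new_names = {t.strftime(NEW_FORMAT) for t in (first, second)}

print(len(old_names), len(new_names))
```

With the old pattern both jobs map to the same name, while the `%f` variant gives two distinct names. This makes collisions far less likely, but does not strictly rule them out if two jobs start in the very same microsecond.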