fix: only trigger on_save if the output checkpoint path actually contains files #652
…ains files Signed-off-by: Harikrishnan Balagopal <[email protected]>
06f28ea to 3f2643c
@HarikrishnanBalagopal I don't think this is the right fix. If save_model is called and the files are not there, it's a problem. The Trainer should be fixed instead. @dushyantbehl @YashasviChaurasia Can you look at the situations where this occurs?
@ashokponkumar The original issue has already been fixed in the This is just an additional/sanity check on top since
@dushyantbehl Shouldn't the tuning fail if save_model was called and nothing ended up in the dir? In which use case is it expected to have no files in the dir after save_model is called?
@ashokponkumar I can send you the builds that were running into that exact scenario. It is most likely a bug, but this check allows us to catch that sort of bug early without having to spend hours debugging unrelated things.
If save_model fails, shouldn't the tuning fail? @dushyantbehl @HarikrishnanBalagopal? @HarikrishnanBalagopal Can you please bring this topic up in our next scrum?
Description of the change
Trigger the on_save event only if anything was actually saved in the output checkpoint directory. This simplifies the use of trainer_controller for triggering subsequent steps (like artifact upload).
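As a rough illustration of the check described above, here is a minimal sketch in plain Python. It is not the code from this PR: the helper names checkpoint_has_files and trigger_on_save_if_saved, and the single-argument on_save callback, are hypothetical and chosen only for the example. It assumes that "contains files" means at least one regular file anywhere under the checkpoint directory.

```python
from pathlib import Path
from typing import Callable


def checkpoint_has_files(checkpoint_dir: str) -> bool:
    """Return True if checkpoint_dir exists and holds at least one regular file.

    Hypothetical helper for illustration; not part of the PR's actual code.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return False
    # rglob walks subdirectories too, so nested artifacts also count.
    return any(p.is_file() for p in root.rglob("*"))


def trigger_on_save_if_saved(
    checkpoint_dir: str, on_save: Callable[[str], None]
) -> bool:
    """Call on_save(checkpoint_dir) only when something was written there.

    Returns True if the callback fired, False if it was skipped.
    The callback signature is an assumption made for this sketch.
    """
    if not checkpoint_has_files(checkpoint_dir):
        return False
    on_save(checkpoint_dir)
    return True
```

With this shape, a downstream step such as artifact upload would only run when the directory is non-empty. The stricter alternative raised in the conversation would be to raise an error where this sketch returns False, so that the tuning run fails instead of silently skipping the event.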
Related issue number
Internal issue in our pipeline
How to verify the PR
Was the PR tested