Training stops before epoch 0 after loading best.torch

Good day,

I am trying to train on my own dataset, as in issue #215. I opted to load the weights from the model trained on the crowdAI dataset and then continue training on my own images from there.

Using issue #160 as a reference, I loaded the weights from best.torch.
(btw, is it correct to use `self.load('.../experiments/mapping_challenge_baseline/checkpoints/unet/best.torch')`?) 
I also set `self._initialize_model_weights = None`.

However, this raised an error: `'module' object has no attribute '_rebuild_tensor_v2'`, which I was able to fix via [this thread](https://discuss.pytorch.org/t/question-about-rebuild-tensor-v2/14560).
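
For reference, the workaround for this error is, as I understand it, to add the missing `_rebuild_tensor_v2` function to `torch._utils` before loading the checkpoint. Below is a minimal sketch, not necessarily the exact code from the thread. The helper name `install_rebuild_tensor_v2_shim` is hypothetical, and it assumes the installed PyTorch still provides `torch._utils._rebuild_tensor`.

```python
def install_rebuild_tensor_v2_shim(utils_module):
    """Add `_rebuild_tensor_v2` to `utils_module` if it is missing.

    `utils_module` is assumed to be `torch._utils` from an older PyTorch
    that only provides `_rebuild_tensor`. Returns True if the shim was added.
    """
    if hasattr(utils_module, "_rebuild_tensor_v2"):
        return False  # newer PyTorch already has it: nothing to patch

    def _rebuild_tensor_v2(storage, storage_offset, size, stride,
                           requires_grad, backward_hooks):
        # Rebuild the tensor with the old function, then restore the two
        # extra fields that newer checkpoints store alongside it.
        tensor = utils_module._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor

    utils_module._rebuild_tensor_v2 = _rebuild_tensor_v2
    return True
```

It would be called once, before the checkpoint is loaded, as `import torch._utils` followed by `install_rebuild_tensor_v2_shim(torch._utils)`.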

Another error then occurred, which I fixed via [this thread](https://discuss.pytorch.org/t/unexpected-key-in-state-dict-bn1-num-batches-tracked/29454/4).
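
The title of that thread refers to an unexpected `bn1.num_batches_tracked` key in the state dict. One common workaround for that kind of mismatch is to drop those keys before calling `load_state_dict`. Below is a minimal sketch, assuming the checkpoint is a plain mapping from parameter names to tensors. The helper name `drop_num_batches_tracked` is hypothetical and not part of PyTorch or this repository.

```python
from collections import OrderedDict


def drop_num_batches_tracked(state_dict, suffix="num_batches_tracked"):
    """Return a copy of `state_dict` without keys ending in `suffix`.

    Newer PyTorch versions save a `num_batches_tracked` buffer for each
    BatchNorm layer; older versions do not expect it when loading.
    """
    return OrderedDict(
        (key, value)
        for key, value in state_dict.items()
        if not key.endswith(suffix)
    )
```

The filtered dictionary can then be passed to the model's `load_state_dict` in place of the original one. The input dictionary is left unchanged.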

Now, running `python main.py train --pipeline_name unet_weighted` no longer throws any errors, but training does not seem to start at all (nothing is printed for epoch 0).
Here is the full console output:

```
/home/USER/Developer/anaconda3/envs/mapping/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
/home/USER/Developer/ML/open-solution-mapping-challenge/src/utils.py:132: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
SHOW-325
https://ui.neptune.ml/shared/showroom/e/SHOW-325
2019-09-26 01-18-48 mapping-challenge >>> training
2019-09-26 01-18-54 steps >>> step xy_train adapting inputs
2019-09-26 01-18-54 steps >>> step xy_train transforming...
2019-09-26 01-18-54 steps >>> step xy_inference adapting inputs
2019-09-26 01-18-54 steps >>> step xy_inference transforming...
2019-09-26 01-18-54 steps >>> step loader adapting inputs
2019-09-26 01-18-54 steps >>> step loader transforming...
2019-09-26 01-18-54 steps >>> step unet unpacking inputs
2019-09-26 01-18-54 steps >>> step unet loading transformer...
2019-09-26 01-18-55 steps >>> step unet transforming...
2019-09-26 01-18-58 steps >>> step mask_resize adapting inputs
2019-09-26 01-18-58 steps >>> step mask_resize transforming...
100%|##########| 16/16 [00:01<00:00,  8.38it/s]
2019-09-26 01-18-59 steps >>> step mask_resize caching outputs...
2019-09-26 01-18-59 steps >>> step category_mapper adapting inputs
2019-09-26 01-18-59 steps >>> step category_mapper transforming...
100%|##########| 16/16 [00:00<00:00, 1761.53it/s]
2019-09-26 01-19-00 steps >>> step mask_erosion adapting inputs
2019-09-26 01-19-00 steps >>> step mask_erosion transforming...
100%|##########| 16/16 [00:00<00:00, 136956.87it/s]
2019-09-26 01-19-00 steps >>> step labeler adapting inputs
2019-09-26 01-19-00 steps >>> step labeler transforming...
100%|##########| 16/16 [00:00<00:00, 132.53it/s]
2019-09-26 01-19-00 steps >>> step mask_dilation adapting inputs
2019-09-26 01-19-00 steps >>> step mask_dilation transforming...
100%|##########| 16/16 [00:00<00:00, 92.15it/s]
2019-09-26 01-19-00 steps >>> step mask_resize loading output...
2019-09-26 01-19-00 steps >>> step score_builder adapting inputs
2019-09-26 01-19-00 steps >>> step score_builder transforming...
100%|##########| 16/16 [00:00<00:00, 18.44it/s]
2019-09-26 01-19-01 steps >>> step output adapting inputs
2019-09-26 01-19-01 steps >>> step output transforming...
(mapping) USER@debian:~/Developer/ML/open-solution-mapping-challenge$
```
No errors are reported, but training does not seem to start. Do you have any idea why this is the case? Thank you.


