This may be a Google Cloud Storage issue, but since last week training has frequently hung at the checkpoint-saving step:
I1025 08:58:19.649921 139866418210816 checkpoints.py:1191] Restoring dataset iterator from 'train_ds-027-of-032'.
E1025 08:58:19.650164 139866418210816 dataset_iterator.py:130] DatasetIterator.load() is deprecated. Please use restore().
I1025 08:58:21.356897 139866418210816 train.py:393] Initialize/restore complete (4.48 seconds).
I1025 08:58:21.358641 139866418210816 evaluation.py:370] Initializing Evaluator for 'ds'
I1025 08:58:21.358845 139866418210816 evaluation.py:72] Task ds has no metrics defined; skipping eval.
W1025 08:58:21.358932 139866418210816 evaluation.py:386] No eval task with valid split and metric fn found. Skipping eval.
I1025 08:58:21.359025 139866418210816 train.py:568] Saving checkpoint before the training loop starts.
*** It doesn't pass this step ***
The problem is intermittent: sometimes the checkpoint saves, and sometimes it doesn't.