
Is a training log like this normal? I cannot find the loss in the logs, and what does "grad norm: nan" mean? #396

Open
@alphanlp


d norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.886 | TFLOPs: 78.46 |
iteration 5426/ 250000 | consumed samples: 43408 | consumed tokens: 88899584 | elapsed time per iteration (ms): 4247.9 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.883 | TFLOPs: 78.36 |
iteration 5427/ 250000 | consumed samples: 43416 | consumed tokens: 88915968 | elapsed time per iteration (ms): 4225.8 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.893 | TFLOPs: 78.77 |
iteration 5428/ 250000 | consumed samples: 43424 | consumed tokens: 88932352 | elapsed time per iteration (ms): 4229.2 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.892 | TFLOPs: 78.71 |
iteration 5429/ 250000 | consumed samples: 43432 | consumed tokens: 88948736 | elapsed time per iteration (ms): 4233.6 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.890 | TFLOPs: 78.63 |
iteration 5430/ 250000 | consumed samples: 43440 | consumed tokens: 88965120 | elapsed time per iteration (ms): 4247.0 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.884 | TFLOPs: 78.38 |
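
For anyone checking the same symptoms, here is a minimal Python sketch that splits one of the log lines above into fields and reports whether a loss field is present and whether the gradient norm is NaN. The function names `parse_log_line` and `summarize` are hypothetical helpers, not part of any training framework. The check for a loss field (any key containing "loss" other than "loss scale") is an assumption about how such a field would be named.

```python
import math

# One line copied from the log above.
SAMPLE = (
    "iteration 5426/ 250000 | consumed samples: 43408 | consumed tokens: 88899584 | "
    "elapsed time per iteration (ms): 4247.9 | learning rate: 2.999E-04 | "
    "global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 | "
    "samples per second: 1.883 | TFLOPs: 78.36 |"
)


def parse_log_line(line):
    """Split a 'key: value | key: value |' log line into a dict of strings."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        if part.startswith("iteration"):
            # The leading field ("iteration 5426/ 250000") has no colon.
            fields["iteration"] = part[len("iteration"):].strip()
            continue
        key, sep, value = part.rpartition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(line):
    """Report the two things asked about: is a loss logged, is grad norm NaN."""
    fields = parse_log_line(line)
    grad = fields.get("grad norm")
    return {
        "iteration": fields.get("iteration"),
        # Assumption: a real loss field would contain "loss" in its name.
        "has_loss": any("loss" in k and k != "loss scale" for k in fields),
        "grad_norm_is_nan": grad is not None and math.isnan(float(grad)),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run on the sample line, this reports that no loss field is logged and that the gradient norm is NaN, even though "number of nan iterations" stays at 0.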
