Softmax in Model Output, then using CE Loss

Thank you for the interesting work here.

I've run into one issue with the code. The ConvLSTM model applies a softmax as its last layer, but the training script then passes that output to CrossEntropyLoss. CrossEntropyLoss already applies a (log-)softmax to its input internally, so the softmax ends up being applied twice. Instead, the ConvLSTM should return the raw output of the classification (Linear) layer, i.e. the logits, and those should be passed to the CE loss. The softmax probabilities can be computed later, in the test-set evaluation step, to determine the test accuracy.
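To illustrate why this matters, here is a minimal, framework-free sketch. It is not the repository's code: `softmax`, `cross_entropy`, and the example `logits` are hypothetical names and values chosen for illustration. `cross_entropy` mimics what a CE loss does internally (log-softmax followed by negative log-likelihood), and the script compares feeding it raw logits against feeding it already-softmaxed probabilities.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Mimics a CE loss: softmax the input, then take -log of the target class."""
    return -math.log(softmax(scores)[target])

# Hypothetical, very confident output of the final Linear layer (3 classes).
logits = [10.0, 0.0, 0.0]
target = 0

# Correct usage: the loss receives raw logits.
loss_on_logits = cross_entropy(logits, target)

# Double softmax: the loss receives probabilities and softmaxes them again.
loss_on_probs = cross_entropy(softmax(logits), target)

# Even a perfect one-hot "probability" input cannot push the loss below this.
num_classes = len(logits)
floor = cross_entropy([1.0] + [0.0] * (num_classes - 1), 0)

print(f"loss on logits:        {loss_on_logits:.6f}")
print(f"loss on probabilities: {loss_on_probs:.6f}")
print(f"floor with softmax in: {floor:.6f}")
```

With the extra softmax, the inputs to the loss are squeezed into [0, 1], so for three classes the loss can never drop below log(1 + 2/e), about 0.55, no matter how confident and correct the model is. On the raw logits the same prediction gives a loss close to zero. This compressed range also shrinks the gradients, which is one reason training with the double softmax tends to be slower.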

Please let me know if others agree with this small change to the code.

Also, what type of attention is being used? Is it dot-product attention?


Softmax in Model Output, then using CE Loss #23
