Environment
Software Environment:
- MindSpore version (source or binary): 2.2.14
- OS / toolchain: Ubuntu 20.04, Python 3.8, CUDA 11.6, cuDNN 8
Describe the current behavior
The two models below (one in MindSpore, one in PaddlePaddle) share the same architecture, receive identical input, and use the same initialization. Nevertheless, their gradients differ significantly after backpropagation. When each API is tested individually, the discrepancy does not appear; so far the issue is observed only when the LSTM API is involved.
Describe the expected behavior
The gradients of the two models should be the same.
Steps to reproduce the issue

```python
import mindspore

class Model_cDf5CgkzsFaikw3L4C7KgcmSun0p4zhi(mindspore.nn.Cell):
    def __init__(self):
        super(Model_cDf5CgkzsFaikw3L4C7KgcmSun0p4zhi, self).__init__()
        # RNN -> LSTM -> LSTM -> Dense, all batch-first
        self.rnn1 = mindspore.nn.RNN(input_size=2048, hidden_size=2031, batch_first=True)
        self.rnn2 = mindspore.nn.LSTM(input_size=2031, hidden_size=1779, batch_first=True)
        self.rnn3 = mindspore.nn.LSTM(input_size=1779, hidden_size=1236, batch_first=True)
        self.linear = mindspore.nn.Dense(in_channels=1236, out_channels=10)

    def construct(self, x):
        x, _ = self.rnn1(x)
        x, _ = self.rnn2(x)
        _, (x, _) = self.rnn3(x)  # keep the final hidden state of the last LSTM
        x = self.linear(x[-1])    # last layer/direction of the hidden state
        return x
```
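For reference, a minimal sketch of how the MindSpore gradients can be obtained. The input shape (batch=2, seq_len=4) and the sum-based scalar loss are illustrative assumptions, not taken from the original report:

```python
import numpy as np
from mindspore import Tensor

# Illustrative input shape (assumption): batch=2, seq_len=4, feature=2048.
np.random.seed(0)
x_np = np.random.randn(2, 4, 2048).astype(np.float32)

net_ms = Model_cDf5CgkzsFaikw3L4C7KgcmSun0p4zhi()

def forward_fn(inp):
    # Reduce the output to a scalar so gradients are well defined.
    return net_ms(inp).sum()

# Gradients with respect to the trainable parameters only.
grad_fn = mindspore.grad(forward_fn, grad_position=None,
                         weights=net_ms.trainable_params())
ms_grads = grad_fn(Tensor(x_np))
```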
```python
import paddle
from paddle import nn

class Model_1748971508(nn.Layer):
    def __init__(self):
        super(Model_1748971508, self).__init__()
        # Same architecture as the MindSpore model above
        self.rnn1 = paddle.nn.SimpleRNN(input_size=2048, hidden_size=2031, time_major=False)
        self.rnn2 = paddle.nn.LSTM(input_size=2031, hidden_size=1779, time_major=False)
        self.rnn3 = paddle.nn.LSTM(input_size=1779, hidden_size=1236, time_major=False)
        self.linear = paddle.nn.Linear(in_features=1236, out_features=10)

    def forward(self, x):
        x, _ = self.rnn1(x)
        x, _ = self.rnn2(x)
        _, (x, _) = self.rnn3(x)  # keep the final hidden state of the last LSTM
        x = self.linear(x[-1])
        return x
```
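And a corresponding sketch for the PaddlePaddle side, reusing the same NumPy input. Copying the MindSpore weights into the Paddle model (so that "same initialization" actually holds) is assumed and omitted here, and the per-parameter comparison assumes the two frameworks enumerate parameters in matching order:

```python
net_pd = Model_1748971508()
# Assumption: weights have been copied from net_ms so both models start identical.

x_pd = paddle.to_tensor(x_np)  # same input as the MindSpore run
loss = net_pd(x_pd).sum()
loss.backward()

# Compare gradients parameter by parameter (assumes matching order/layout).
for (ms_p, ms_g), pd_p in zip(zip(net_ms.trainable_params(), ms_grads),
                              net_pd.parameters()):
    diff = float(np.max(np.abs(ms_g.asnumpy() - pd_p.grad.numpy())))
    print(f"{ms_p.name} vs {pd_p.name}: max |grad diff| = {diff:.6e}")
```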