PyTorch's RNN layer obscures important details of how an RNN works, while Karpathy's classic implementation, written in pure NumPy, requires some knowledge of the math and of backprop to understand.
If you would like to understand Karpathy's code, Eli Bendersky provides an excellent explanation of the math behind it. He also provides an updated, better-commented version of Karpathy's original code here.
My implementation here finds a middle ground: it relies on PyTorch's autograd to handle the backprop while keeping the low-level details of how an RNN works explicit. Most of the code is modified from here.
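To make the "middle ground" concrete, here is a minimal sketch of the idea, not the code from this repo. The forward pass is written out by hand on raw tensors, and `loss.backward()` replaces the hand-derived gradients. The parameter names (`Wxh`, `Whh`, `Why`, `bh`, `by`) follow Karpathy's convention; the sizes, the learning rate, the toy index sequences, and the plain SGD update are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, hidden_size = 5, 8  # hypothetical sizes, for illustration only

# Raw parameter tensors instead of nn.RNN, so every step stays visible.
Wxh = (torch.randn(hidden_size, vocab_size) * 0.01).requires_grad_()   # input -> hidden
Whh = (torch.randn(hidden_size, hidden_size) * 0.01).requires_grad_()  # hidden -> hidden
Why = (torch.randn(vocab_size, hidden_size) * 0.01).requires_grad_()   # hidden -> output
bh = torch.zeros(hidden_size, requires_grad=True)
by = torch.zeros(vocab_size, requires_grad=True)
params = [Wxh, Whh, Why, bh, by]


def forward_loss(inputs, targets, h):
    """Unroll the RNN over one sequence and sum the per-step cross-entropy."""
    loss = torch.zeros(())
    for x_idx, y_idx in zip(inputs, targets):
        x = F.one_hot(torch.tensor(x_idx), vocab_size).float()  # one-hot input char
        h = torch.tanh(Wxh @ x + Whh @ h + bh)                  # recurrent update
        logits = Why @ h + by                                   # unnormalized scores
        loss = loss + F.cross_entropy(logits.unsqueeze(0), torch.tensor([y_idx]))
    # Detach so the next chunk does not backprop into this one.
    return loss, h.detach()


# Toy sequence of character indices; targets are the inputs shifted by one.
inputs, targets = [0, 1, 2, 3], [1, 2, 3, 4]
h0 = torch.zeros(hidden_size)

loss, h_last = forward_loss(inputs, targets, h0)
loss.backward()  # autograd computes every gradient; no manual backprop

with torch.no_grad():
    for p in params:
        p -= 0.1 * p.grad  # plain SGD step (assumed; other optimizers work too)
        p.grad.zero_()
```

The only part that differs in spirit from a pure NumPy version is the backward pass: the loop over time steps, the `tanh` update, and the softmax cross-entropy are all still spelled out.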
The RNN is a minimal character-level language model that trains on any given text, in this case some Shakespeare.
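As a rough illustration of what "character-level" means for the data, the sketch below builds a vocabulary from a text and cuts out one training example whose targets are the inputs shifted by one character. The sample string, `seq_length`, and the helper `make_example` are hypothetical stand-ins, not taken from this repo.

```python
text = "to be, or not to be"  # stand-in for the real training text

# The vocabulary is simply the set of distinct characters in the text.
chars = sorted(set(text))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for ch, i in char_to_ix.items()}

seq_length = 8  # hypothetical number of time steps per training chunk


def make_example(data, start, seq_length):
    """Return (inputs, targets) index lists; each target is the next character."""
    inputs = [char_to_ix[ch] for ch in data[start:start + seq_length]]
    targets = [char_to_ix[ch] for ch in data[start + 1:start + seq_length + 1]]
    return inputs, targets


inputs, targets = make_example(text, 0, seq_length)
```

At each time step the model sees one character index and is trained to predict the index that follows it, which is why the two lists overlap in all but one position.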
Loss for Karpathy's implementation:
Loss for my implementation: