Very intuitive and simple, @karpathy! I was coding in parallel while watching your video and took a slightly different approach.
For computing gradients of + and *, I used the fundamental derivative formula:
$$ \frac{df}{da} = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} $$
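As a rough illustration of how that formula applies to + and *, here is a minimal sketch that approximates the limit with a small finite `h`. The helper name `numerical_grad` and the sample inputs are my own assumptions for the example, not code from the video.

```python
def numerical_grad(f, a, b, h=1e-6):
    """Forward-difference estimates of df/da and df/db at (a, b)."""
    base = f(a, b)
    da = (f(a + h, b) - base) / h  # nudge a, hold b fixed
    db = (f(a, b + h) - base) / h  # nudge b, hold a fixed
    return da, db


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


# For a + b both partial derivatives should be close to 1.
add_da, add_db = numerical_grad(add, 2.0, -3.0)

# For a * b the partials should be close to b and a respectively.
mul_da, mul_db = numerical_grad(mul, 2.0, -3.0)

print(add_da, add_db)
print(mul_da, mul_db)
```

The estimates are only approximate because `h` is finite, but they agree with the analytic rules: the gradient passes through + unchanged, and for * each input receives the other input's value.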
Instead of using topological sorting for backpropagation, I implemented a recursive approach in which each parent node visits its child nodes and calculates their gradients. This is probably less efficient, since a child node that is reachable along multiple paths gets revisited once per path, but it is still a valid alternative that produces the same results.
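For anyone curious, the recursive idea could look roughly like the sketch below. This is a simplified reconstruction, not my exact code: the `Value` class is modeled loosely on micrograd, and the `_children` / `_local_grads` attributes and the `upstream` parameter are names I am assuming for the example.

```python
class Value:
    """Scalar node that remembers its inputs and the local derivatives."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # input nodes of this operation
        self._local_grads = local_grads  # d(self)/d(child), one per child

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self, upstream=1.0):
        # Accumulate the gradient arriving along the current path, then
        # recurse into each child with the chain-rule product.
        # No topological sort: shared nodes are visited once per path.
        self.grad += upstream
        for child, local in zip(self._children, self._local_grads):
            child.backward(upstream * local)


a = Value(2.0)
b = Value(-3.0)
c = a * b   # c = a * b
d = c + c   # c feeds into d along two paths
e = d * a   # e = 2 * a^2 * b
e.backward()

print(a.grad, b.grad)  # analytically: de/da = 4ab, de/db = 2a^2
```

Because every path from the output down to a leaf is walked separately, the contributions add up to the same totals a topological-sort pass would give, but the work can grow quickly in graphs with many shared nodes.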