Description
Title
Circuit Tracing in Transformers: Peeking Inside the Black Box
Describe your Talk
I will explain what circuit tracing is in large language models like Gemma. It is a new way to understand how a model arrives at an answer by looking inside it and checking which neurons activate, and when. Anthropic open-sourced a library called circuit-tracer, and the website Neuronpedia lets us explore the results and find neurons linked to real-world concepts like "Texas" or "capital".
I'll show how this works with a live demo based on the circuit-tracer Jupyter notebook. We'll see what nodes and supernodes are, and how they connect to form reasoning paths in the model. Tracing these paths can help us debug models, understand their behavior, and make them safer.
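As a rough preview of the nodes-and-supernodes idea, here is a small toy sketch in plain Python. It does not use the circuit-tracer API: the node names, the edge weights, and the helper functions `collapse` and `strongest_path` are all made up for illustration, and real attribution graphs are far larger and can contain negative edges. The sketch groups related nodes into supernodes, sums the edge weights between groups, and then follows the strongest path from an input token to an output.

```python
from collections import defaultdict

# Toy attribution graph: (source, target) -> weight.
# All names and numbers are invented for illustration only.
EDGES = {
    ("tok:Dallas", "feat:Texas_a"): 0.5,
    ("tok:Dallas", "feat:Texas_b"): 0.25,
    ("tok:capital", "feat:capital_a"): 0.5,
    ("feat:Texas_a", "feat:say_Austin"): 0.5,
    ("feat:Texas_b", "feat:say_Austin"): 0.25,
    ("feat:capital_a", "feat:say_Austin"): 0.5,
    ("feat:say_Austin", "logit:Austin"): 0.75,
}

# A supernode is a hand-picked group of nodes that seem to mean the same thing.
SUPERNODES = {
    "Texas": ["feat:Texas_a", "feat:Texas_b"],
    "capital": ["feat:capital_a"],
    "say Austin": ["feat:say_Austin"],
}


def collapse(edges, supernodes):
    """Merge member nodes into their supernode and sum the edge weights."""
    owner = {m: name for name, members in supernodes.items() for m in members}
    merged = defaultdict(float)
    for (src, dst), weight in edges.items():
        s, d = owner.get(src, src), owner.get(dst, dst)
        if s != d:  # drop edges that fall inside a single supernode
            merged[(s, d)] += weight
    return dict(merged)


def strongest_path(edges, start, goal):
    """Return (score, path) maximizing the product of positive edge weights.

    Assumes the graph is acyclic, as in this toy example.
    """
    if start == goal:
        return 1.0, [goal]
    best_score, best_path = 0.0, []
    for (src, dst), weight in edges.items():
        if src != start:
            continue
        score, path = strongest_path(edges, dst, goal)
        if path and weight * score > best_score:
            best_score, best_path = weight * score, [start] + path
    return best_score, best_path


collapsed = collapse(EDGES, SUPERNODES)
score, path = strongest_path(collapsed, "tok:Dallas", "logit:Austin")
print(" -> ".join(path), f"(score {score:.3f})")
```

In the live demo the graph comes from the real model rather than from hand-written numbers, but the picture is the same: many small nodes, a few named supernodes, and paths between them.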
Pre-requisites & reading material
Read about transformers, activations, and circuits.
Resources
https://www.anthropic.com/research/open-source-circuit-tracing
Time required for the talk
25 minutes
Link to slides/demos
https://docs.google.com/presentation/d/1FNd37jW3nB95lko2imfk6A7VGVG0S0H53hUgWocYJ1g/edit?usp=sharing
About you
Viraj
Availability
21/06/2025
Any comments
I'll try to make it engaging, and I will post a demo video of the talk.