
Circuit Tracing in Transformers: Peeking Inside the Black Box #345

@virajsharma2000

Description


Title

Circuit Tracing in Transformers: Peeking Inside the Black Box

Describe your Talk

I will explain what circuit tracing is in large language models like Gemma. It is a new way to understand how a model answers questions by looking inside it and checking which neurons activate, and when. Anthropic has open-sourced a library called circuit-tracer, which works alongside the website Neuronpedia to help us find neurons linked to real-world concepts like "Texas" or "capital".
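
To make "which neurons activate, and when" concrete, here is a minimal, hand-rolled sketch of the underlying idea using a plain PyTorch forward hook on a small Hugging Face model. This is not the circuit-tracer API; the model (gpt2), the layer index, and the prompt are illustrative assumptions only.

```python
# Minimal sketch: capture MLP activations with a forward hook and list the
# neurons that fire most strongly on a prompt. Not the circuit-tracer API;
# gpt2 and layer 5 are arbitrary choices for a lightweight example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a larger model like Gemma
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

captured = {}

def hook(module, inputs, output):
    # output: (batch, seq_len, d_mlp) activations after the MLP nonlinearity
    captured["acts"] = output.detach()

layer = 5  # arbitrary middle layer for illustration
handle = model.transformer.h[layer].mlp.act.register_forward_hook(hook)

prompt = "The capital of Texas is"
with torch.no_grad():
    model(**tok(prompt, return_tensors="pt"))
handle.remove()

# Which neurons activate most strongly on the final token?
acts = captured["acts"][0, -1]  # shape: (d_mlp,)
top = torch.topk(acts, k=10)
for idx, val in zip(top.indices.tolist(), top.values.tolist()):
    print(f"layer {layer}, neuron {idx}: activation {val:.2f}")
```

Tools like Neuronpedia take this a step further by attaching human-readable labels to such neurons or features, so a highly activating unit can be read as "Texas" or "capital" rather than as a bare index.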

I'll show how this works with a live demo from their Jupyter notebook. We'll see what nodes and supernodes are, and how they connect to form reasoning paths in the model. This can help us debug, understand, and make models safer.
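
As a toy picture of how nodes, supernodes, and reasoning paths fit together, the sketch below builds a tiny attribution-style graph by hand with networkx. The feature names, layer/neuron indices, and edge weights are invented for illustration and are not real circuit-tracer output.

```python
# Toy illustration of "nodes -> supernodes -> reasoning path".
# All node names and weights below are made up for the example.
import networkx as nx

g = nx.DiGraph()
# Hypothetical feature nodes for the prompt "The capital of Texas is"
g.add_edge('emb:"Texas"', "feat:L5/1234 (Texas)", weight=0.8)
g.add_edge('emb:"capital"', "feat:L7/567 (capital-of)", weight=0.6)
g.add_edge("feat:L5/1234 (Texas)", "feat:L12/89 (say Austin)", weight=0.7)
g.add_edge("feat:L7/567 (capital-of)", "feat:L12/89 (say Austin)", weight=0.5)
g.add_edge("feat:L12/89 (say Austin)", 'logit:"Austin"', weight=0.9)

# A "supernode" is simply a group of related feature nodes treated as one unit
supernodes = {
    "Texas": ["feat:L5/1234 (Texas)"],
    "say Austin": ["feat:L12/89 (say Austin)"],
}

# Trace one reasoning path from an input embedding to the output logit
path = nx.shortest_path(g, 'emb:"Texas"', 'logit:"Austin"')
print(" -> ".join(path))
```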

Pre-requisites & reading material

Read about transformers, activations, and circuits.

Resources

https://www.anthropic.com/research/open-source-circuit-tracing

Time required for the talk

25 minutes

Link to slides/demos

https://docs.google.com/presentation/d/1FNd37jW3nB95lko2imfk6A7VGVG0S0H53hUgWocYJ1g/edit?usp=sharing

About you

Viraj

Availability

21/06/2025

Any comments

I'll try to make it engaging, and I will post a demo video of the talk.

Metadata

Labels

- on hold: This proposal is on hold for organisational reasons, as requested by the author, or for other reasons.
- proposal: Wish to present at PyDelhi? This label is added automatically on choosing the "Talk Proposal" option.
- review in progress: This proposal is currently under review.
