Extract neuron activations to identify which neurons encode a given state (here, uncertainty vs. certainty).
```bash
uv pip install torch transformers numpy matplotlib tqdm
uv run src/find_activations.py
```

The script performs the following steps (sketched end-to-end below):
- Load the model
- Register a forward hook on a model layer
- Extract activations for selected prompts (e.g., 8 uncertain + 8 certain)
- Identify selective neurons using statistical analysis
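A minimal sketch of those steps, assuming GPT-2 Small loaded via `transformers`; the layer index and prompts are illustrative placeholders, not the script's actual values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

captured = {}

def hook_fn(module, inputs, output):
    # output[0] holds the layer's hidden states: (batch, seq, hidden_dim)
    captured["acts"] = output[0].mean(dim=1)  # mean-pool over the sequence (pad positions included)

layer_idx = 6  # placeholder; GPT-2 Small has layers 0-11
handle = model.transformer.h[layer_idx].register_forward_hook(hook_fn)

prompts = ["I'm not sure, but...", "It is certain that..."]  # placeholders
batch = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    model(**batch)
handle.remove()

print(captured["acts"].shape)  # torch.Size([2, 768])
```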
The following plots are generated for visual inspection of the results:
- State-specific activations across all neurons
- Top selective neurons
- Effect size distribution (Cohen's d; see the sketch after this list)
- Highest activation distribution
- Activation pattern heatmap
- Correlation map (top selective neurons)
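The effect sizes behind the Cohen's d plot can be computed per neuron. A hedged sketch, assuming a standard pooled-variance Cohen's d over the two prompt groups (the script's exact statistic may differ); `uncertain_acts` and `certain_acts` are illustrative names:

```python
import numpy as np

def cohens_d(group_a: np.ndarray, group_b: np.ndarray) -> np.ndarray:
    # group_a, group_b: (n_prompts, 768) mean-pooled activations per group
    n_a, n_b = len(group_a), len(group_b)
    var_a = group_a.var(axis=0, ddof=1)
    var_b = group_b.var(axis=0, ddof=1)
    pooled = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (group_a.mean(axis=0) - group_b.mean(axis=0)) / pooled

# d = cohens_d(uncertain_acts, certain_acts)  # (768,) effect size per neuron
# top = np.argsort(-np.abs(d))[:10]           # indices of most selective neurons
```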
```python
class ActivationExtractor:
    def __init__(self):
        self.activations = None  # populated by the hook

    def hook_fn(self, module, input, output):
        # Capture hidden states during the forward pass
        hidden_states = output[0]  # (batch, seq, hidden_dim)
        self.activations = hidden_states.mean(dim=1)  # average over sequence
```

This hooks into a transformer layer and collects that layer's activations during inference.
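Usage looks like the following, reusing the `model`, `tokenizer`, and `layer_idx` from the sketch above (the prompt is a placeholder):

```python
extractor = ActivationExtractor()
handle = model.transformer.h[layer_idx].register_forward_hook(extractor.hook_fn)

with torch.no_grad():
    model(**tokenizer("I might be wrong, but...", return_tensors="pt"))
handle.remove()  # always detach hooks when done

print(extractor.activations.shape)  # torch.Size([1, 768])
```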
Model: GPT-2 Small
- 12 transformer layers
- 768 hidden dimensions (neurons per layer)
- 124M total parameters
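These figures can be read straight off the loaded model; a quick verification sketch (the count reflects `model.parameters()`, which counts tied weights once):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
print(model.config.n_layer)  # 12 transformer layers
print(model.config.n_embd)   # 768 hidden dimensions
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")  # ~124.4M
```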
Activation Source:
- Hook registered on `model.transformer.h[layer_idx]`
- Extracts hidden states: `(batch_size, sequence_length, 768)`
- Averages over sequence: `(batch_size, 768)`
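A toy-tensor sketch of that shape flow, independent of the model:

```python
import torch

hidden_states = torch.randn(16, 12, 768)  # (batch_size, sequence_length, 768)
pooled = hidden_states.mean(dim=1)        # collapse the sequence dimension
print(pooled.shape)                       # torch.Size([16, 768])
```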