Releases: TransformerLensOrg/TransformerLens
v3.0.0b2
What's Changed
- Release 2.16 by @bryce13950 in #945
- Release 2.16.1 by @bryce13950 in #952
- Update README.md by @jmole in #957
- improve model properties table in docs by @mivanit in #769
- Release v2.16.2 by @bryce13950 in #958
- Add Gemma 3 and MedGemma model support by @huseyincavusbi in #1149
- Add timestamp for 2.0 announcement [docs] by @MattAlp in #983
- Add support for Qwen/Qwen3-0.6B-Base model by @mtaran in #1075
- Repairing tests that were failing due to recent contributions by @jlarson4 in #1157
- Fix 934 by @kapedalex in #1155
- Fix 1130 and 1102 by @kapedalex in #1154
- Fix key and value heads patching for models with different n_heads from n_key_value_heads by @nikolaystanishev in #981
- updating the compatibility notebook by @jlarson4 in #1158
- New Release – v2.17.0 by @jlarson4 in #1159
- Integrate v2.17.0 phase1 by @jlarson4 in #1166
- transformers v5 support by @jlarson4 in #1167
- Improve TransformerBridge optimizer compatibility via dual PyTorch/TransformerLens parameter access API by @speediedan in #1143
- Add HuggingFace ModelOutput support to TransformerLens generation API by @speediedan in #1144
- Testing R1 Distills to confirm functional in TransformerLens by @jlarson4 in #1168
- StableLM Architecture Adapter by @jlarson4 in #1171
- Complete type checking for OLMo support (builds on #816) by @taziksh in #1081
- Olmo3 support by @etomoscow in #1170
- Setup and tested OLMo architecture adapters by @jlarson4 in #1174
- Isolate demo dependencies and pin orjson for CVE-2025-67221 mitigation by @evcyen in #1173
- feat: Add LIT integration for interactive model analysis (#121) by @HetanshWaghela in #1163
- OpenELM Architecture Adapter by @jlarson4 in #1172
- fix: set n_ctx=512 for TinyStories models by @puranikyashaswin in #1162
- Architecture Benchmarks – Review & Extension by @jlarson4 in #1176
- created initial model registry tool by @bryce13950 in #1151
- Initial Verification Run by @jlarson4 in #1181
- Additional Verification by @jlarson4 in #1184
- Prepping for v3.0.0b2 by @jlarson4 in #1185
New Contributors
- @jmole made their first contribution in #957
- @huseyincavusbi made their first contribution in #1149
- @MattAlp made their first contribution in #983
- @mtaran made their first contribution in #1075
- @kapedalex made their first contribution in #1155
- @nikolaystanishev made their first contribution in #981
- @taziksh made their first contribution in #1081
- @etomoscow made their first contribution in #1170
- @HetanshWaghela made their first contribution in #1163
- @puranikyashaswin made their first contribution in #1162
Full Changelog: v3.0.0b1...v3.0.0b2
v2.17.0
We've got an exciting new release that adds several new models! Gemma 3, MedGemma, and Qwen3-0.6B-Base are now available as model options. Alongside the new models, this release fixes a handful of bugs and includes other small, non-breaking changes.
What's Changed
- Update README.md by @jmole in #957
- improve model properties table in docs by @mivanit in #769
- Release v2.16.2 by @bryce13950 in #958
- Add Gemma 3 and MedGemma model support by @huseyincavusbi in #1149
- Add timestamp for 2.0 announcement [docs] by @MattAlp in #983
- Add support for Qwen/Qwen3-0.6B-Base model by @mtaran in #1075
- Repairing tests that were failing due to recent contributions by @jlarson4 in #1157
- Fix 934 by @kapedalex in #1155
- Fix 1130 and 1102 by @kapedalex in #1154
- Fix key and value heads patching for models with different n_heads from n_key_value_heads by @nikolaystanishev in #981
- updating the compatibility notebook by @jlarson4 in #1158
- New Release – v2.17.0 by @jlarson4 in #1159
New Contributors
- @jmole made their first contribution in #957
- @huseyincavusbi made their first contribution in #1149
- @MattAlp made their first contribution in #983
- @mtaran made their first contribution in #1075
- @kapedalex made their first contribution in #1155
- @nikolaystanishev made their first contribution in #981
Full Changelog: v2.16.1...v2.17.0
v3.0.0b1
What's Changed
- registered hook correctly by @bryce13950 in #1051
- optimized QKV bridge a bit by @bryce13950 in #1046
- Add support for layer norm and bias folding by @degenfabian in #1044
- updated get params to fill zeroes when needed by @bryce13950 in #1049
- Match device selection of TransformerBridge to HookedTransformer by @degenfabian in #1047
- Improve TransformerBridge hook compatibility with HookedTransformers by @degenfabian in #1054
- Enable setting cached hooks by @degenfabian in #1048
- Create bridge for every module in Phi 1 by @degenfabian in #1055
- Rename Neo bridges to be in line with new naming scheme by @degenfabian in #1056
- Rename Mixtral bridges to be in line with new naming scheme by @degenfabian in #1057
- added test and made sure backwards hooks are working by @bryce13950 in #1058
- Remove second layer norm from phi component mapping by @degenfabian in #1059
- Create bridge for every module in pythia by @degenfabian in #1060
- Create bridge for every module in Qwen 2 by @degenfabian in #1061
- Processing functions by @bryce13950 in #1053
- Attempted Processing match by @bryce13950 in #1063
- Process restoration by @bryce13950 in #1064
- Add missing configuration parameters by @degenfabian in #1065
- Properly set up normalization_type and layer_norm_folding attributes in initialized components by @degenfabian in #1066
- Process accuracy by @bryce13950 in #1067
- Ablation hugging face weights by @bryce13950 in #1070
- Ci fixes by @bryce13950 in #1072
- Revision extra forwards by @bryce13950 in #1073
- Test coverage by @bryce13950 in #1074
- Attention hooks full coverage for folding by @bryce13950 in #1078
- Ci job splitting by @bryce13950 in #1079
- fixed batch dimension by @bryce13950 in #1082
- fixed cache hooks by @bryce13950 in #1083
- fixed bias displaying by @bryce13950 in #1084
- fixed return type none by @bryce13950 in #1085
- Create pass through for hooks in compatibility mode by @bryce13950 in #1086
- fixed alias hook props by @bryce13950 in #1087
- made all hooks show properly by @bryce13950 in #1088
- updated loading in main demo to use transformers bridge by @bryce13950 in #1010
- switch from poetry to uv by @mivanit in #1037
- added full kv cache by @bryce13950 in #1089
- Added full hook coverage for previous keys by @bryce13950 in #988
- updated loading in arena content demo to use transformer bridge by @degenfabian in #1012
- regenerated with new hooks by @bryce13950 in #1091
- added test coverage for ensuring compatibility by @bryce13950 in #989
- Test hook shape coverage by @bryce13950 in #1000
- Hook compatibility by @bryce13950 in #1092
- Final compatibility coverage by @bryce13950 in #1090
- tested llama 3.1 by @bryce13950 in #1096
- fixed stop at layer by @bryce13950 in #1100
- Duplicate hook fix by @bryce13950 in #1098
- Gemma2 fix by @bryce13950 in #1099
- Fix gpt oss by @bryce13950 in #1101
- created benchmark suite by @bryce13950 in #1104
- finalized t5 adapter by @bryce13950 in #1095
- Model improvements by @bryce13950 in #1105
- decoupling weight processing completely from hooked transformer by @bryce13950 in #1103
- removed invalid comparison by @bryce13950 in #1107
- Revert "decoupling weight processing completely from hooked transformer" by @bryce13950 in #1108
- finalized benchmark logic by @bryce13950 in #1109
- Fix opt by @bryce13950 in #1106
- Benchmarking and compatibility only by @bryce13950 in #1112
- Decouple weight processing by @bryce13950 in #1114
- optimized benchmarks a bit by @bryce13950 in #1115
- fixed tensor storing by @bryce13950 in #1116
- added skip condition by @bryce13950 in #1117
- Gpt2 weight match by @bryce13950 in #1118
- Gemma3 match by @bryce13950 in #1119
- setup real aliases by @bryce13950 in #1121
- Gpt oss match by @bryce13950 in #1120
- trimmed memory a bit by @bryce13950 in #1122
- created benchmark suite for unsupported models in hooked transformer by @bryce13950 in #1123
- fixed remaining gemma 3 benchmarks by @bryce13950 in #1124
- Gated MLP bridge by @bryce13950 in #1110
- setup benchmark suite and trimmed out extra tests by @bryce13950 in #1125
- Attention cleanup by @bryce13950 in #1126
- Benchmarking cross comparison revision by @bryce13950 in #1127
- Oss match by @bryce13950 in #1128
- Cleanup by @bryce13950 in #1129
- Weight processing generalization by @bryce13950 in #1131
- Processing cleanup by @bryce13950 in #1132
- Final cleanup by @bryce13950 in #1135
- Supported Architectures – code artifact cleanup by @jlarson4 in #1136
- Qwen3 adapter by @bryce13950 in #1138
- Model Bridge – Source Keys Cleanup by @jlarson4 in #1137
- cleaned up a lot of things by @bryce13950 in #1113
- Transformer bridge layer norm folding by @bryce13950 in #1071
- Updated release workflow by @bryce13950 in #1146
Full Changelog: v3.0.0a8...v3.0.0b1
v3.0.0a8
Another update that rounds out the API for our new module.
What's Changed
- created new base config class by @bryce13950 in #1042
- made sure to check for nested hooks by @bryce13950 in #1035
- Fix warning for aliases when compatibility mode is turned off by @degenfabian in #1041
- Feature kv cache by @bryce13950 in #1045
- Split weights instead of logits for models with joint QKV matrix by @degenfabian in #1043
Full Changelog: v3.0.0a7...v3.0.0a8
v3.0.0a7
What's Changed
- map hook_pos_embed to rotary_emb, allow hook_aliases to be a list by @hijohnnylin in #1034
Full Changelog: v3.0.0a6...v3.0.0a7
v3.0.0a6
Big release! This update includes a whole bunch of optimizations and second passes on parts of the TransformerBridge, bringing us closer to beta.
What's Changed
- added setters and hook utils to bridge by @bryce13950 in #1009
- updated property access by @bryce13950 in #1026
- feat: Bridge.boot should allow using alias model names, but show a deprecation warning by @hijohnnylin in #1028
- Move QKV separation into bridge that wraps QKV matrix by @degenfabian in #1027
- removed unnecessary import by @bryce13950 in #1030
- Attn pattern shape by @bryce13950 in #1029
- added cache layer for hook collection by @bryce13950 in #1032
- Bridge unit test compatibility coverage by @bryce13950 in #1031
- updated loading in interactive neuroscope demo to use transformer bridge by @degenfabian in #1017
New Contributors
- @hijohnnylin made their first contribution in #1028
Full Changelog: v3.0.0a5...v3.0.0a6
v3.0.0a5
First new architecture for the TransformerBridge, and a whole lot closer to beta!
What's Changed
- Weight conversion renaming by @bryce13950 in #996
- Attention shape normalization by @bryce13950 in #997
- Joint hook handling by @bryce13950 in #1001
- Add compatibility_mode feature by @degenfabian in #998
- Add support for GPT-OSS by @degenfabian in #1004
- Fix GPT-OSS initialization error by @degenfabian in #1007
Full Changelog: v3.0.0a4...v3.0.0a5
v3.0.0a4
Big update that brings us a lot closer to beta! It adds a compatibility layer for many legacy properties of the old hooked root modules.
What's Changed
- Unified aliases by @bryce13950 in #991
- fixed hook alias positions by @bryce13950 in #992
- Create bridge for every module in Mixtral by @degenfabian in #984
- removed numpy ceiling by @bryce13950 in #994
- Ensure hook and property backwards compatibility with HookedTransformer by @degenfabian in #990
- Create bridge for every module in neox by @degenfabian in #995
- Create bridges for every module in neo by @degenfabian in #987
Full Changelog: v3.0.0a3...v3.0.0a4
v3.0.0a3
New alpha release! A whole bunch of changes have landed: more HookedTransformer functionality has been ported over, and many architectures have been improved to give more options in our new module. Together, these changes noticeably improve compatibility with old HookedTransformer-based code.
What's Changed
- Setup deprecated hook aliases and got the majority of the main demo running properly by @bryce13950 in #976
- Linear test coverage by @bryce13950 in #977
- Create Bridge for every Gemma 3 module by @degenfabian in #966
- Add Bridges for every module in GPT2 by @degenfabian in #967
- Cache hook aliases & stop at layer by @bryce13950 in #978
- Create Bridges for every module in Bloom models by @degenfabian in #970
- Create Bridges for every module in Gemma 2 by @degenfabian in #971
- Create bridges for every module in Gemma 1 by @degenfabian in #972
- Create bridges for every module in Mistral by @degenfabian in #979
- Remove that output_attention flag defaults to true in boot function by @degenfabian in #982
- Create bridge for every module in GPT-J by @degenfabian in #974
- Create bridge for every module in Llama by @degenfabian in #975
Full Changelog: v3.0.0a2...v3.0.0a3
v3.0.0a2
This release contains no functional changes. The first alpha release revealed that our CI could not publish to PyPI with PEP 440-style alpha tags; this release makes that possible. Please consult the release notes for v3.0.0a1 for full information on the 3.x alpha.
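For context on the tagging issue: PEP 440 pre-release versions use suffixes like `a` (alpha), `b` (beta), and `rc`, which pip skips by default unless `--pre` is passed. A quick sketch using the third-party `packaging` library (assumed installed; it is what pip itself uses for version parsing):

```python
# PEP 440 pre-release identifiers: "3.0.0a1" parses as a pre-release and
# sorts before the final "3.0.0" release, which is why pip ignores it
# unless --pre is given.
from packaging.version import Version

alpha = Version("3.0.0a1")
print(alpha.is_prerelease)        # True: "a1" marks an alpha pre-release
print(alpha < Version("3.0.0"))   # True: pre-releases sort before the final
```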
What's Changed
- Pre release version publishing by @bryce13950 in #973
Full Changelog: v3.0.0a1...v3.0.0a2