nanowell

🎯 Focusing

optimizer.step() carefully

34 followers · 5 following

World

Achievements

Highlights

Developer Program Member

Pinned

Q-Sparse-LLM (Public)

My implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Python · 30 stars · 1 fork

AdEMAMix-Optimizer-Pytorch (Public)

The AdEMAMix Optimizer: Better, Faster, Older.

Python · 172 stars · 9 forks

Differential-Transformer-PyTorch (Public)

PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture in…

Python · 38 stars · 5 forks

Brainstorm-science (Public)

Sample from uniform distribution towards automation of math.

C · 149 stars · 21 forks