Pinned
- Q-Sparse-LLM (Public)
  My implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
- AdEMAMix-Optimizer-Pytorch (Public)
  The AdEMAMix Optimizer: Better, Faster, Older.
- Differential-Transformer-PyTorch (Public)
  PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture in…
- Brainstorm-science (Public)
  Sample from uniform distribution towards automation of math.