# eeLLaMA


Implementation of Meta AI's LLaMA model, modified with early-exiting output networks. The ultimate goal is to deploy a head model on an edge device for fast inference. This particular implementation is optimized to run on MPS. Future iterations, in which tail models are designed to run in the cloud, should be modified for CUDA support.
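
The core of an early-exit head/tail split is the exit rule: if an intermediate output head is confident enough, the edge device returns its prediction; otherwise the hidden state is handed off to the tail model. As a minimal, dependency-free sketch of that decision (the `should_exit` helper and the confidence threshold are illustrative assumptions, not part of this repo's API):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def should_exit(logits, threshold=0.9):
    # Hypothetical early-exit rule: stop at this intermediate head if its
    # top-class probability clears the confidence threshold; otherwise the
    # hidden state would be forwarded to the tail model (e.g. in the cloud).
    return max(softmax(logits)) >= threshold

print(should_exit([8.0, 0.1, 0.2]))  # confident head -> exit on-device
print(should_exit([1.0, 0.9, 1.1]))  # uncertain -> defer to the tail model
```

In a real deployment the threshold trades latency against accuracy: a lower threshold exits more tokens on the edge device, at the cost of more mistakes the tail model never gets to correct.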