chaseklvk/eellama

Implementation of Meta AI's LLaMA model, modified to support early-exiting output networks. The ultimate goal is to deploy a head model on an edge device for fast inference. This implementation is optimized to run on MPS; future iterations, in which tail models are designed to run in the cloud, will need to be modified for CUDA support.
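Since no code from the repository appears here, the following is a minimal sketch of the early-exit idea described above, assuming a PyTorch-style decoder stack. All names (`pick_device`, `ExitHead`, `forward_with_early_exit`, `threshold`) are hypothetical and not taken from this repository; LLaMA proper uses RMSNorm, and a plain `LayerNorm` stands in here for brevity.

```python
import torch
import torch.nn as nn

# Hypothetical device selection: prefer MPS (the stated edge target),
# fall back to CUDA (the stated cloud target), then CPU.
def pick_device() -> torch.device:
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

class ExitHead(nn.Module):
    """Hypothetical lightweight output head attached to an intermediate layer."""
    def __init__(self, dim: int, vocab_size: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.proj(self.norm(h))

def forward_with_early_exit(layers, exit_heads, h, threshold: float = 0.9):
    """Run decoder layers in order; stop once an exit head is confident enough.

    `layers` and `exit_heads` are parallel lists; `h` is the embedded input
    of shape (batch, seq_len, dim). Returns next-token logits.
    """
    logits = None
    for layer, head in zip(layers, exit_heads):
        h = layer(h)
        logits = head(h[:, -1, :])                # next-token logits at this depth
        confidence = logits.softmax(dim=-1).max(dim=-1).values
        if bool((confidence > threshold).all()):  # whole batch is confident
            return logits                         # exit early, skip deeper layers
    return logits                                 # fell through to the final head
```

Under this reading, the layers up to the chosen exit form the head model that runs on the edge device, while the remaining layers (the tail) would run in the cloud whenever confidence stays below the threshold.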