- `./models`: contains your rkllm models.
- `./lib`: C++ `rkllm` library used for inference and `fix_freqence_platform`.
- `./app.py`: REST API server.
- `./client.py`: Client to interact with the server.
- Python 3.8 to 3.12
- Running models on NPU.
- Pull models directly from Hugging Face.
- REST API with documentation.
- Listing available models.
- Dynamic loading and unloading of models.
- Inference requests.
- Streaming and non-streaming modes.
- Message history.
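The REST API above can be exercised with any plain HTTP client. A minimal sketch follows; note that the port, the `/generate` endpoint, and the payload field names are assumptions for illustration, not RKLLama's documented API:

```shell
# Sketch only: the port, endpoint name, and JSON fields below are assumptions,
# not RKLLama's documented API.
RKLLAMA_URL="${RKLLAMA_URL:-http://localhost:8080}"
# Example JSON payload for a non-streaming inference request (illustrative fields).
PAYLOAD='{"model":"TinyLlama-1.1B-Chat-v1.0","prompt":"Hello","stream":false}'
# Only send the request if the server is actually reachable,
# so the sketch is safe to run anywhere.
if curl -fs -o /dev/null --max-time 2 "$RKLLAMA_URL" 2>/dev/null; then
  curl -s -X POST "$RKLLAMA_URL/generate" \
       -H "Content-Type: application/json" \
       -d "$PAYLOAD"
else
  echo "rkllama server not reachable at $RKLLAMA_URL"
fi
```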
- Download RKLLama:

```bash
git clone
cd rkllama
```

- Install RKLLama:

```bash
chmod +x setup.sh
sudo ./setup.sh
```

Virtualization with conda is started automatically, as well as the NPU frequency setting.
- Start the server:

```bash
rkllama serve
```

- Command to start the client:

```bash
rkllama
```

or

```bash
rkllama help
```

- See the available models:

```bash
rkllama list
```

- Run a model:

```bash
rkllama run <model_name>
```

You can download and install a model from the Hugging Face platform with the following command:

```bash
rkllama pull username/repo_id/model_file.rkllm
```

Alternatively, you can run the command interactively:

```bash
rkllama pull
Repo ID ( example: punchnox/Tinnyllama-1.1B-rk3588-rkllm-1.1.4): <your response>
File ( example: TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm): <your response>
```

This will automatically download the specified model file and prepare it for use with RKLLAMA.
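The `username/repo_id/model_file.rkllm` argument bundles the Hugging Face repository ID and the file inside it. A minimal sketch of how such an argument can be split in a shell script (the variable names are illustrative, not RKLLama's internals):

```shell
# Example pull argument in user/repo/file form, as accepted by `rkllama pull`.
ARG="punchnox/Tinnyllama-1.1B-rk3588-rkllm-1.1.4/TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm"
# The model file name is everything after the last slash...
MODEL_FILE="${ARG##*/}"
# ...and the Hugging Face repo ID is everything before it (user/repo).
REPO_ID="${ARG%/*}"
echo "repo: $REPO_ID"
echo "file: $MODEL_FILE"
```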
- Download the Model
  - Download `.rkllm` models directly from Hugging Face.
- Place the Model
  - Navigate to the `~/RKLLAMA/models` directory on your system.
  - Place the `.rkllm` files in this directory.

Example directory structure:

```
~/RKLLAMA/models/
└── TinyLlama-1.1B-Chat-v1.0.rkllm
```
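The placement step can also be scripted. A minimal sketch that creates the models directory and copies a downloaded file into it (the source path is a placeholder to replace with your own download location):

```shell
# Where RKLLAMA looks for models, per the directory layout above.
MODELS_DIR="$HOME/RKLLAMA/models"
mkdir -p "$MODELS_DIR"
# Placeholder path: replace with wherever you downloaded the .rkllm file.
SRC="./TinyLlama-1.1B-Chat-v1.0.rkllm"
# Copy only if the file actually exists, so the sketch is safe to run as-is.
if [ -f "$SRC" ]; then
  cp "$SRC" "$MODELS_DIR/"
fi
ls "$MODELS_DIR"
```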
- Go to the `~/RKLLAMA/` folder:

```bash
cd ~/RKLLAMA/
cp ./uninstall.sh ../
cd ../ && chmod +x ./uninstall.sh && ./uninstall.sh
```
- Add multimodal models
- Add embedding models
- Add RKNN support for ONNX models (TTS, image classification/segmentation, etc.)
- GGUF/HF to RKLLM conversion software