A Rust client library for llama.cpp's OpenAI-compatible API server.

## Features

- Async/await support with Tokio
- Chat completions (streaming and non-streaming)
- Text completions
- Embeddings generation
- API key authentication support
- Type-safe request/response handling
- Builder pattern for easy request construction

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
lancor = "0.1.0"
tokio = { version = "1.0", features = ["full"] }
```

The quick-start and streaming examples below also use the `anyhow` and `futures` crates.

## Quick Start

```rust
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Create a client
    let client = LlamaCppClient::new("http://localhost:8080")?;

    // Build a chat completion request
    let request = ChatCompletionRequest::new("your-model-name")
        .message(Message::system("You are a helpful assistant."))
        .message(Message::user("What is Rust?"))
        .max_tokens(100);

    // Send the request
    let response = client.chat_completion(request).await?;
    println!("{}", response.choices[0].message.content);

    Ok(())
}
```

## Usage

### Chat Completion

```rust
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};

let client = LlamaCppClient::new("http://localhost:8080")?;

let request = ChatCompletionRequest::new("model-name")
    .message(Message::system("You are a helpful assistant."))
    .message(Message::user("Explain quantum computing"))
    .temperature(0.7)
    .max_tokens(200);

let response = client.chat_completion(request).await?;
println!("{}", response.choices[0].message.content);
```

### Streaming Chat Completion

```rust
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};
use futures::stream::StreamExt;

let client = LlamaCppClient::new("http://localhost:8080")?;

let request = ChatCompletionRequest::new("model-name")
    .message(Message::user("Write a short poem"))
    .stream(true)
    .max_tokens(100);

let mut stream = client.chat_completion_stream(request).await?;
while let Some(chunk_result) = stream.next().await {
    if let Ok(chunk) = chunk_result {
        if let Some(content) = &chunk.choices[0].delta.content {
            print!("{}", content);
        }
    }
}
```

### Text Completion

```rust
use lancor::{LlamaCppClient, CompletionRequest};

let client = LlamaCppClient::new("http://localhost:8080")?;

let request = CompletionRequest::new("model-name", "Once upon a time")
    .max_tokens(50)
    .temperature(0.8);

let response = client.completion(request).await?;
println!("{}", response.content);
```

### Embeddings

```rust
use lancor::{LlamaCppClient, EmbeddingRequest};

let client = LlamaCppClient::new("http://localhost:8080")?;

let request = EmbeddingRequest::new("model-name", "Hello, world!");
let response = client.embedding(request).await?;

let embedding_vector = &response.data[0].embedding;
println!("Embedding dimension: {}", embedding_vector.len());
```

### Authentication

```rust
use lancor::LlamaCppClient;

// With API key
let client = LlamaCppClient::with_api_key(
    "http://localhost:8080",
    "your-api-key"
)?;
```

## API Reference

### `LlamaCppClient`

The main client for interacting with the llama.cpp server.

- `new(base_url)` - Create a new client
- `with_api_key(base_url, api_key)` - Create a client with API key authentication
- `default()` - Create a client connecting to `http://localhost:8080`
- `chat_completion(request)` - Send a chat completion request
- `chat_completion_stream(request)` - Send a streaming chat completion request
- `completion(request)` - Send a text completion request
- `embedding(request)` - Send an embedding request
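
For example, a minimal sketch using `default()` (assuming it follows the usual `Default` convention and returns the client directly rather than a `Result`):

```rust
use lancor::LlamaCppClient;

// Connects to the default address, http://localhost:8080.
let client = LlamaCppClient::default();
```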

### Request Builders

All request types support a fluent builder pattern:

```rust
ChatCompletionRequest::new("model")
    .message(Message::user("Hello"))
    .temperature(0.7)
    .max_tokens(100)
    .top_p(0.9)
    .stream(true);
```

## Requirements

- Rust 1.70 or later
- A running llama.cpp server with OpenAI-compatible API enabled

## Server Setup

To use this client, run the llama.cpp server with its OpenAI-compatible endpoints enabled; the `--api-key` flag is optional:

```bash
./server -m your-model.gguf --port 8080
```
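
If you enable authentication on the server, pass the same key to `with_api_key` on the client side. A sketch using the `--api-key` flag mentioned above:

```bash
./server -m your-model.gguf --port 8080 --api-key your-api-key
```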

## Examples

Check out the examples directory for more usage examples:

```bash
cargo run --example basic_usage
```

## License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

- llama.cpp - The amazing llama.cpp project
- OpenAI API - API specification reference