Rust Inference Engine for Visual Language Models
A high-performance Rust library for running inference on AI models with built-in telemetry and production-ready features. Kornia Infernum provides a flexible, threaded inference engine that decouples model implementation from API design while delivering rich monitoring capabilities.
- Flexible Model Integration - Support for any model through trait-based design
- Asynchronous Processing - Non-blocking inference with background thread execution
- Built-in Telemetry - Request tracking, timing, and metadata collection
- Production Ready - State management, error handling, and monitoring
- Lightweight Metadata - Avoid cloning heavy data while preserving essential information
- Type-Safe API - Fully generic with compile-time guarantees
Add Kornia Infernum to your `Cargo.toml`:
```toml
[dependencies]
kornia-infernum = "0.1.0"
```
Define your request, response, and metadata types, then implement `InfernumModel` for your model:

```rust
use kornia_infernum::{InfernumModel, RequestMetadata};
use kornia_image::{Image, ImageSize, allocator::CpuAllocator};

// Define your request and response types
#[derive(Clone)]
struct MyRequest {
    image: Image<u8, 3, CpuAllocator>,
    prompt: String,
}

#[derive(Clone)]
struct MyResponse {
    result: String,
}

// Define lightweight metadata to avoid cloning heavy data
#[derive(Clone)]
struct MyMetadata {
    prompt: String,
    image_size: ImageSize,
}

impl RequestMetadata for MyRequest {
    type Metadata = MyMetadata;

    fn metadata(&self) -> Self::Metadata {
        MyMetadata {
            prompt: self.prompt.clone(),
            image_size: self.image.size(), // Only the size, not the full image
        }
    }
}

// Implement your model
struct MyModel;

impl InfernumModel for MyModel {
    type Request = MyRequest;
    type Response = MyResponse;
    type Error = Box<dyn std::error::Error + Send + Sync>;

    fn run(&mut self, request: Self::Request) -> Result<Self::Response, Self::Error> {
        // Your inference logic here
        Ok(MyResponse {
            result: format!("Processed: {}", request.prompt),
        })
    }
}
```
Then create the engine, schedule a request, and poll for the result:

```rust
use kornia_infernum::{InfernumEngine, InfernumEngineResult};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the engine with your model
    let engine = InfernumEngine::new(MyModel);

    // Load an image
    let image = kornia_io::jpeg::read_image_jpeg_rgb8("path/to/image.jpg")?;

    // Create a request
    let request = MyRequest {
        image,
        prompt: "Describe this image".to_string(),
    };

    // Schedule inference (non-blocking)
    engine.schedule_inference(request);

    // Poll for results
    loop {
        match engine.try_poll_response() {
            InfernumEngineResult::Success(response) => {
                println!("Response: {}", response.response.result);
                println!("Duration: {:?}", response.duration);
                println!("Original prompt: {}", response.request_metadata.prompt);
                break;
            }
            InfernumEngineResult::Empty(state) => {
                println!("Engine state: {}", state.as_str());
                std::thread::sleep(std::time::Duration::from_millis(100));
            }
            InfernumEngineResult::Error(e) => {
                eprintln!("Error: {}", e);
                break;
            }
        }
    }

    Ok(())
}
```
Kornia Infernum includes a production-ready HTTP server example using PaliGemma:
```bash
cargo run --example infernum --features cuda  # With CUDA support
# or
cargo run --example infernum                  # CPU only
```
The server provides two REST endpoints:
- `POST /inference` - Submit inference requests
- `GET /results` - Retrieve results with telemetry
You can interact with the server using the included client:
```bash
# Submit an inference request
cargo run --example client -- inference \
    --image-path path/to/your/image.jpg \
    --prompt "What do you see in this image?"

# Check for results
cargo run --example client -- results
```
Or use curl directly:
```bash
# Submit an inference request
curl -X POST http://localhost:3000/inference \
    -H "Content-Type: application/json" \
    -d '{
        "image_path": "path/to/your/image.jpg",
        "prompt": "What do you see in this image?"
    }'

# Check for results
curl http://localhost:3000/results
```
Example request:
```json
{
  "image_path": "path/to/image.jpg",
  "prompt": "What do you see in this image?"
}
```
Example response:
```json
{
  "status": "success",
  "response": {
    "prompt": "What do you see in this image?",
    "start_time": 1234567890,
    "duration": "250ms",
    "response": "I can see a beautiful landscape with mountains..."
  }
}
```
- `InfernumModel` - Trait for implementing custom models
- `RequestMetadata` - Trait for extracting lightweight telemetry data
- `InfernumEngine` - High-performance inference engine with background processing
- `InfernumEngineResponse` - Rich response with telemetry and original request metadata
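Errors returned from a model's `run` surface through the same polling loop as successful results. Below is a minimal sketch of that path; it reuses only the API shown in the quick start, and the `TextRequest` and `FailingModel` types are hypothetical stand-ins, not part of the library:

```rust
use kornia_infernum::{InfernumEngine, InfernumEngineResult, InfernumModel, RequestMetadata};

// Hypothetical request type; the metadata is just a copy of the prompt.
#[derive(Clone)]
struct TextRequest {
    prompt: String,
}

impl RequestMetadata for TextRequest {
    type Metadata = String;
    fn metadata(&self) -> Self::Metadata {
        self.prompt.clone()
    }
}

// Hypothetical model that always fails, to show how errors reach the caller.
struct FailingModel;

impl InfernumModel for FailingModel {
    type Request = TextRequest;
    type Response = String;
    type Error = Box<dyn std::error::Error + Send + Sync>;

    fn run(&mut self, _request: Self::Request) -> Result<Self::Response, Self::Error> {
        Err("model weights not loaded".into())
    }
}

fn main() {
    let engine = InfernumEngine::new(FailingModel);
    engine.schedule_inference(TextRequest { prompt: "hello".into() });

    loop {
        match engine.try_poll_response() {
            // The model error is reported here instead of crashing the engine.
            InfernumEngineResult::Error(e) => {
                eprintln!("inference failed: {e}");
                break;
            }
            InfernumEngineResult::Success(_) => break,
            InfernumEngineResult::Empty(_) => {
                std::thread::sleep(std::time::Duration::from_millis(10));
            }
        }
    }
}
```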
- Performance First - Avoid unnecessary cloning of heavy data like images
- Type Safety - Fully generic design with compile-time guarantees
- Production Ready - Built-in monitoring, error handling, and state management
- Flexibility - Support any model through trait-based design
- Rust 2024 edition
- Optional: CUDA support for GPU acceleration
- Timing - Precise inference duration tracking
- Request IDs - Unique tracking for each inference
- Metadata - Lightweight request information without heavy data
- State Management - Real-time engine state monitoring
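As a rough illustration of reading these fields, the sketch below logs telemetry from a polled response. It assumes the `duration` and `request_metadata` fields and the engine API exactly as used in the quick start; `MyModel` and `MyMetadata` are the quick-start types, and the generic `InfernumEngine<MyModel>` signature is an assumption that may differ from the actual crate:

```rust
use kornia_infernum::{InfernumEngine, InfernumEngineResult};

// `MyModel` and `MyMetadata` are the quick-start types defined above.
fn log_telemetry(engine: &InfernumEngine<MyModel>) {
    match engine.try_poll_response() {
        InfernumEngineResult::Success(response) => {
            // Inference duration measured by the engine.
            println!("duration: {:?}", response.duration);
            // Lightweight metadata captured when the request was scheduled.
            println!("prompt: {}", response.request_metadata.prompt);
        }
        // While no response is ready, the engine reports its current state.
        InfernumEngineResult::Empty(state) => println!("engine state: {}", state.as_str()),
        InfernumEngineResult::Error(e) => eprintln!("error: {e}"),
    }
}
```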
Licensed under the Apache License, Version 2.0.
- Part of the Kornia ecosystem for computer vision in Rust
- Designed for production AI workloads with performance and monitoring in mind