usls

usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).

πŸ“š Documentation

πŸš€ Quick Start

Run the YOLO demo to explore the YOLO-series models across different tasks, versions, scales, precisions, and execution providers:

  • Tasks: detect, segment, pose, classify, obb
  • Versions: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLOv12, YOLOv13
  • Scales: n, s, m, l, x
  • Precision: fp32, fp16, q8, q4, q4f16, bnb4
  • Execution Providers: CPU, CUDA, TensorRT, CoreML, OpenVINO, and more
# CPU: Object detection, YOLOv8n, FP16
cargo run -r --example yolo -- --task detect --ver 8 --scale n --dtype fp16

# NVIDIA CUDA: Instance segmentation, YOLO11m
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0

# NVIDIA TensorRT
cargo run -r -F tensorrt --example yolo -- --device tensorrt:0

# Apple Silicon CoreML
cargo run -r -F coreml --example yolo -- --device coreml

# Intel OpenVINO: CPU/GPU/VPU acceleration
cargo run -r -F openvino -F ort-load-dynamic --example yolo -- --device openvino:CPU

# Show all available options
cargo run -r --example yolo -- --help

See YOLO Examples for more details and use cases.

βš™οΈ Installation

Add the following to your Cargo.toml:

[dependencies]
# Use GitHub version
usls = { git = "https://github.com/jamjamjon/usls", features = [ "cuda" ] }

# Alternative: Use crates.io version
usls = { version = "latest-version", features = [ "cuda" ] }
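
Once the dependency is in place, a typical program builds a model, loads inputs, runs a forward pass, and renders the results. The sketch below is illustrative only: the `YOLO`, `Options`, `DataLoader`, and `Annotator` names come from the project's examples, but the exact builder and method names (`with_model`, `try_read`, `run`, `annotate`) are assumptions that vary between releases, so check the πŸ“š Documentation and the YOLO examples for the current API.

```rust
// Minimal sketch, assuming an Options/DataLoader/Annotator-style API as used
// in the repository's examples; method names may differ in your usls version.
// Error handling uses the `anyhow` crate.
use usls::{models::YOLO, Annotator, DataLoader, Options};

fn main() -> anyhow::Result<()> {
    // Build a YOLO detector from a local ONNX file (assumed builder method).
    let options = Options::default().with_model("yolo11n.onnx")?;
    let mut model = YOLO::new(options)?;

    // Load an input image; DataLoader can also iterate folders and, with the
    // `video` feature, video streams.
    let xs = vec![DataLoader::try_read("./assets/bus.jpg")?];

    // Run inference and draw/save the detections.
    let ys = model.run(&xs)?;
    Annotator::default().annotate(&xs, &ys);

    Ok(())
}
```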

πŸ“¦ Cargo Features

❕ Features in italics are enabled by default.

  • Runtime & Utilities

    • ort-download-binaries: Auto-download ONNX Runtime binaries from pyke.
    • ort-load-dynamic: Link ONNX Runtime yourself. Use this if pyke doesn't provide prebuilt binaries for your platform or you want to link against your local ONNX Runtime library. See Linking Guide for more details.
    • viewer: Image/video visualization (minifb). Similar to OpenCV imshow(). See example.
    • video: Video I/O support (video-rs). Enable this to read/write video streams. See example.
    • hf-hub: Hugging Face Hub support for downloading models from Hugging Face repositories.
    • tokenizers: Tokenizer support for vision-language models. Automatically enabled when using vision-language model features (blip, clip, florence2, grounding-dino, fastvlm, moondream2, owl, smolvlm, trocr, yoloe).
    • slsl: SLSL tensor library support. Automatically enabled when using yolo or clip features.
  • Execution Providers

    Hardware acceleration for inference.

    • cuda, tensorrt: NVIDIA GPU acceleration
    • coreml: Apple Silicon acceleration
    • openvino: Intel CPU/GPU/VPU acceleration
    • onednn, directml, xnnpack, rocm, cann, rknpu, acl, nnapi, armnn, tvm, qnn, migraphx, vitis, azure: Various hardware/platform support

    See ONNX Runtime docs and ORT performance guide for details. A short device-selection sketch follows this feature list.

  • Model Selection

    Almost every model is gated behind its own Cargo feature. Enable only what you need to reduce compile time and binary size.

    • yolo, sam, clip, image-classifier, dino, rtmpose, rtdetr, db, ...
    • All models: all-models (enables all model features)

    See Supported Models for the complete list with feature names.
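
Because execution providers and models are ordinary Cargo features, an application can mirror them in its own feature list and pick a sensible default `--device`-style string at compile time. The sketch below uses only standard Rust (`cfg!`) and assumes your crate declares cuda, tensorrt, and coreml features that forward to the usls features of the same name; nothing in it is usls-specific API.

```rust
/// Pick a default device string matching the `--device` values shown in the
/// Quick Start ("tensorrt:0", "cuda:0", "coreml"), based on which Cargo
/// features this binary was compiled with. "cpu" is assumed as the fallback;
/// the Quick Start examples simply omit `--device` for CPU.
fn default_device() -> &'static str {
    if cfg!(feature = "tensorrt") {
        "tensorrt:0"
    } else if cfg!(feature = "cuda") {
        "cuda:0"
    } else if cfg!(feature = "coreml") {
        "coreml"
    } else {
        "cpu"
    }
}

fn main() {
    println!("selected device: {}", default_device());
}
```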

⚑ Supported Models

| Model | Task / Description | Feature | Example |
|---|---|---|---|
| BEiT | Image Classification | image-classifier | demo |
| ConvNeXt | Image Classification | image-classifier | demo |
| FastViT | Image Classification | image-classifier | demo |
| MobileOne | Image Classification | image-classifier | demo |
| DeiT | Image Classification | image-classifier | demo |
| DINOv2 | Vision Embedding | dino | demo |
| DINOv3 | Vision Embedding | dino | demo |
| YOLOv5 | Image Classification, Object Detection, Instance Segmentation | yolo | demo |
| YOLOv6 | Object Detection | yolo | demo |
| YOLOv7 | Object Detection | yolo | demo |
| YOLOv8 / YOLO11 | Object Detection, Instance Segmentation, Image Classification, Oriented Object Detection, Keypoint Detection | yolo | demo |
| YOLOv9 | Object Detection | yolo | demo |
| YOLOv10 | Object Detection | yolo | demo |
| YOLOv12 | Image Classification, Object Detection, Instance Segmentation | yolo | demo |
| YOLOv13 | Object Detection | yolo | demo |
| RT-DETR | Object Detection | rtdetr | demo |
| RF-DETR | Object Detection | rfdetr | demo |
| PP-PicoDet | Object Detection | picodet | demo |
| DocLayout-YOLO | Object Detection | picodet | demo |
| D-FINE | Object Detection | rtdetr | demo |
| DEIM | Object Detection | rtdetr | demo |
| DEIMv2 | Object Detection | rtdetr | demo |
| RTMPose | Keypoint Detection | rtmpose | demo |
| DWPose | Keypoint Detection | rtmpose | demo |
| RTMW | Keypoint Detection | rtmpose | demo |
| RTMO | Keypoint Detection | rtmo | demo |
| SAM | Segment Anything | sam | demo |
| SAM2 | Segment Anything | sam | demo |
| MobileSAM | Segment Anything | sam | demo |
| EdgeSAM | Segment Anything | sam | demo |
| SAM-HQ | Segment Anything | sam | demo |
| FastSAM | Instance Segmentation | yolo | demo |
| YOLO-World | Open-Set Detection With Language | yolo | demo |
| YOLOE | Open-Set Detection And Segmentation | yoloe | demo-prompt-free, demo-prompt (visual & textual) |
| GroundingDINO | Open-Set Detection With Language | grounding-dino | demo |
| CLIP | Vision-Language Embedding | clip | demo |
| jina-clip-v1 | Vision-Language Embedding | clip | demo |
| jina-clip-v2 | Vision-Language Embedding | clip | demo |
| mobileclip & mobileclip2 | Vision-Language Embedding | clip | demo |
| BLIP | Image Captioning | blip | demo |
| DB (PaddleOCR-Det) | Text Detection | db | demo |
| FAST | Text Detection | db | demo |
| LinkNet | Text Detection | db | demo |
| SVTR (PaddleOCR-Rec) | Text Recognition | svtr | demo |
| SLANet | Table Recognition | slanet | demo |
| TrOCR | Text Recognition | trocr | demo |
| YOLOPv2 | Panoptic Driving Perception | yolop | demo |
| DepthAnything v1 & v2 | Monocular Depth Estimation | depth-anything | demo |
| DepthPro | Monocular Depth Estimation | depth-pro | demo |
| MODNet | Image Matting | modnet | demo |
| Sapiens | Foundation for Human Vision Models | sapiens | demo |
| Florence2 | A Variety of Vision Tasks | florence2 | demo |
| Moondream2 | Open-Set Object Detection, Open-Set Keypoint Detection, Image Captioning, Visual Question Answering | moondream2 | demo |
| OWLv2 | Open-Set Object Detection | owl | demo |
| SmolVLM (256M, 500M) | Visual Question Answering | smolvlm | demo |
| FastVLM (0.5B) | Vision-Language Model | fastvlm | demo |
| RMBG (1.4, 2.0) | Image Segmentation, Background Removal | rmbg | demo |
| BEN2 | Image Segmentation, Background Removal | ben2 | demo |
| MediaPipe Selfie-Segmentation | Image Segmentation | mediapipe-segmenter | demo |
| Swin2SR | Image Super-Resolution and Restoration | swin2sr | demo |
| APISR | Real-World Anime Super-Resolution | apisr | demo |
| RAM & RAM++ | Image Tagging | ram | demo |

❓ FAQ

See issues or open a new discussion.

🀝 Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.

πŸ™ Acknowledgments

This project is built on top of ort (ONNX Runtime for Rust), which provides seamless Rust bindings for ONNX Runtime. Special thanks to the ort maintainers.

Thanks to all the open-source libraries and their maintainers that make this project possible. See Cargo.toml for a complete list of dependencies.

πŸ“œ License

This project is licensed under the terms of the LICENSE file.
