
v0.1.0a3

Pre-release


@talmo released this 19 Jan 05:24 (commit df72b51)

Summary

This pre-release adds powerful new capabilities for high-performance inference and post-processing:

  • ONNX/TensorRT Export: Export trained models to optimized formats for 3-6x faster inference
  • Post-Inference Filtering: Remove overlapping/duplicate predictions using IOU or OKS similarity
  • Improved WandB Logging: Better metrics organization and run naming

For the full list of major features, breaking changes, and improvements introduced in the v0.1.0 series, see the v0.1.0a0 release notes.


What's New in v0.1.0a3

Features

ONNX/TensorRT Export Module (#418)

A complete model export system for high-performance inference:

```bash
# Export to ONNX
sleap-nn export /path/to/model -o exports/my_model --format onnx

# Export to both ONNX and TensorRT FP16
sleap-nn export /path/to/model -o exports/my_model --format both

# Run inference on the exported model
sleap-nn predict exports/my_model video.mp4 -o predictions.slp
```

Performance Benchmarks (NVIDIA RTX A6000):

Batch size 1 (latency-optimized):

| Model | Resolution | PyTorch | ONNX-GPU | TensorRT FP16 | Speedup |
|---|---|---|---|---|---|
| single_instance | 192×192 | 1.8 ms | 1.3 ms | 0.31 ms | 5.9x |
| centroid | 1024×1024 | 2.5 ms | 2.7 ms | 0.77 ms | 3.2x |
| topdown | 1024×1024 | 11.4 ms | 9.7 ms | 2.31 ms | 4.9x |
| bottomup | 1024×1280 | 12.3 ms | 9.6 ms | 2.52 ms | 4.9x |
| multiclass_topdown | 1024×1024 | 8.3 ms | 9.1 ms | 1.84 ms | 4.5x |
| multiclass_bottomup | 1024×1024 | 9.4 ms | 9.4 ms | 2.64 ms | 3.6x |

Batch size 8 (throughput-optimized):

| Model | Resolution | PyTorch | ONNX-GPU | TensorRT FP16 | Speedup |
|---|---|---|---|---|---|
| single_instance | 192×192 | 3,111 FPS | 3,165 FPS | 11,039 FPS | 3.5x |
| centroid | 1024×1024 | 453 FPS | 474 FPS | 1,829 FPS | 4.0x |
| topdown | 1024×1024 | 94 FPS | 122 FPS | 525 FPS | 5.6x |
| bottomup | 1024×1280 | 113 FPS | 121 FPS | 524 FPS | 4.6x |
| multiclass_topdown | 1024×1024 | 127 FPS | 145 FPS | 735 FPS | 5.8x |
| multiclass_bottomup | 1024×1024 | 116 FPS | 120 FPS | 470 FPS | 4.1x |

Speedup is relative to PyTorch baseline.
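As a sanity check on what the Speedup column means, it is simply the TensorRT FP16 number divided by the PyTorch baseline (throughput ratio for FPS, or equivalently baseline latency over optimized latency). Using the centroid row of the batch-size-8 table:

```python
# Speedup = optimized throughput / baseline throughput.
# Values from the batch-size-8 table, centroid model row.
pytorch_fps = 453
tensorrt_fps = 1829
speedup = tensorrt_fps / pytorch_fps
print(f"{speedup:.1f}x")  # → 4.0x
```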

Supported model types:

  • Single Instance, Centroid, Centered Instance
  • Top-Down (combined centroid + instance)
  • Bottom-Up (multi-instance with PAF grouping)
  • Multi-class Top-Down and Bottom-Up (with identity classification)

New CLI commands:

  • sleap-nn export - Export models to ONNX/TensorRT
  • sleap-nn predict - Run inference on exported models

New optional dependencies:

```bash
uv pip install "sleap-nn[export]"      # ONNX CPU inference
uv pip install "sleap-nn[export-gpu]"  # ONNX GPU inference
uv pip install "sleap-nn[tensorrt]"    # TensorRT support
```

See the Export Guide for full documentation.

Post-Inference Filtering for Overlapping Instances (#420)

New capability to remove duplicate/overlapping pose predictions after model inference:

```bash
# Filter with IOU method (default)
sleap-nn track -i video.mp4 -m model/ --filter_overlapping

# Use OKS method with custom threshold
sleap-nn track -i video.mp4 -m model/ \
    --filter_overlapping \
    --filter_overlapping_method oks \
    --filter_overlapping_threshold 0.5
```

New CLI options for sleap-nn track:

| Option | Default | Description |
|---|---|---|
| `--filter_overlapping` | `False` | Enable filtering using greedy NMS |
| `--filter_overlapping_method` | `iou` | Similarity method: `iou` (bounding box) or `oks` (keypoints) |
| `--filter_overlapping_threshold` | `0.8` | Similarity threshold (lower = more aggressive) |
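For reference, the `oks` method scores keypoint proximity rather than box overlap. Below is a minimal COCO-style OKS sketch, not the sleap-nn implementation: COCO uses per-keypoint falloff constants, while this uses a single scalar `k` for brevity, and the helper name is illustrative.

```python
import numpy as np

def oks(pred, gt, scale, k=0.1):
    """COCO-style Object Keypoint Similarity between two poses.

    pred, gt: (n_keypoints, 2) coordinate arrays.
    scale:    object scale, e.g. sqrt of the instance bounding-box area.
    k:        falloff constant (per-keypoint in COCO; scalar here).
    """
    d2 = np.sum((np.asarray(pred) - np.asarray(gt)) ** 2, axis=1)
    return float(np.mean(np.exp(-d2 / (2 * scale**2 * k**2))))

pose = np.array([[10.0, 10.0], [20.0, 15.0]])
print(oks(pose, pose, scale=30.0))  # identical poses → 1.0
```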

Programmatic API:

```python
from sleap_nn.inference.postprocessing import filter_overlapping_instances

labels = filter_overlapping_instances(labels, threshold=0.5, method="oks")
```

Why use this? Previously, IOU-based filtering only existed in the tracking pipeline. This feature allows filtering overlapping predictions without requiring --tracking.
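Conceptually, the greedy NMS pass visits instances in descending score order and drops any instance whose similarity to an already-kept instance exceeds the threshold. A standalone sketch using bounding-box IOU (hypothetical helper names, not the sleap-nn internals):

```python
def bbox_iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_nms(instances, threshold=0.8):
    """instances: list of (score, box); returns the kept subset.

    Lower thresholds drop more instances (more aggressive filtering).
    """
    kept = []
    for score, box in sorted(instances, key=lambda t: -t[0]):
        if all(bbox_iou(box, kept_box) <= threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept

preds = [(0.9, (0, 0, 10, 10)),    # kept (highest score)
         (0.5, (0, 0, 10, 9)),     # IOU 0.9 with the first → dropped
         (0.7, (20, 20, 30, 30))]  # disjoint → kept
print(len(greedy_nms(preds)))  # → 2
```

Note how raising the threshold makes filtering more permissive: with `threshold=0.95`, all three instances above survive.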

Improvements

WandB Run Naming and Metrics Logging (#417)

  • Fixed run naming: WandB runs now correctly use auto-generated run names
  • Improved metrics organization: All metrics use a / separator for automatic panel grouping in the WandB UI:
    • train/loss, train/lr - Training metrics (epoch x-axis)
    • val/loss - Validation metrics (epoch x-axis)
    • eval/val/ - Epoch-end evaluation metrics
    • eval/test.X/ - Post-training test set metrics
  • New metrics logged:
    • train/lr - Learning rate (useful for monitoring LR schedulers)
    • PCK@5, PCK@10 - PCK at 5px and 10px thresholds
    • distance/p95, distance/p99 - Additional distance percentiles
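For reference, PCK@N is the fraction of predicted keypoints within N pixels of their ground-truth locations, and the distance percentiles summarize the tail of the localization-error distribution. A minimal sketch with illustrative values (not the sleap-nn implementation):

```python
import numpy as np

def pck(distances, threshold_px):
    """Fraction of keypoints with localization error <= threshold_px."""
    d = np.asarray(distances, dtype=float)
    d = d[~np.isnan(d)]  # skip keypoints with no ground truth
    return float((d <= threshold_px).mean())

errors = [1.2, 3.5, 7.9, 12.0, 4.4]  # per-keypoint distances in pixels
print(pck(errors, 5))             # → 0.6  (3 of 5 within 5 px)
print(pck(errors, 10))            # → 0.8  (4 of 5 within 10 px)
print(np.percentile(errors, 95))  # analogous to distance/p95
```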

Documentation

  • Exporting Guide (#419): Added comprehensive export documentation to How-to guides navigation

Installation

This is an alpha pre-release. Per PEP 440, pre-releases are excluded from installation by default; you must explicitly opt in.

Install with uv (Recommended)

```bash
# With --prerelease flag (requires uv 0.9.20+)
uv tool install sleap-nn[torch] --torch-backend auto --prerelease=allow

# Or pin to the exact version
uv tool install "sleap-nn[torch]==0.1.0a3" --torch-backend auto
```

Run with uvx (One-off execution)

```bash
uvx --from "sleap-nn[torch]" --prerelease=allow --torch-backend auto sleap-nn system
```

Verify Installation

```bash
sleap-nn --version
# Expected output: 0.1.0a3

sleap-nn system
# Shows full system diagnostics including GPU info
```

Upgrading from v0.1.0a2

If you already have v0.1.0a2 installed with --prerelease=allow:

```bash
# Simple upgrade (retains original settings like --prerelease=allow)
uv tool upgrade sleap-nn
```

To force a complete reinstall:

```bash
uv tool install sleap-nn[torch] --torch-backend auto --prerelease=allow --force
```

Changelog

| PR | Category | Title |
|---|---|---|
| #417 | Improvement | Fix wandb run naming and improve metrics logging |
| #418 | Feature | Add ONNX/TensorRT export module |
| #419 | Documentation | Add Exporting guide to How-to guides section |
| #420 | Feature | Add post-inference filtering for overlapping instances |

Full Changelog: v0.1.0a2...v0.1.0a3