trainingsample

🏆 Industry-Leading Computer Vision Library - FASTER than cv2

The only Python library that beats opencv-python (cv2) performance by leveraging OpenCV's C++ power with zero-copy Rust optimizations and intelligent auto-batching.

install

# python (recommended)
pip install trainingsample

# rust
cargo add trainingsample

🚀 Why TrainingSample Leads the Industry

BREAKTHROUGH: We leverage OpenCV's C++ power to beat opencv-python (cv2) by eliminating Python binding overhead.

⚡ Performance That Redefines Possible

Single images: 1.12x FASTER than cv2.resize() - the "impossible" achievement
Batch processing: 2.4x faster than OpenCV individual calls
Zero-copy iteration: True lazy conversion with 17,204 images/sec throughput
Intelligent dispatch: Seamless auto-batching with zero wrapper overhead

🔥 What Makes Us Different

Leverages OpenCV C++: Direct OpenCV C++ access to beat opencv-python binding overhead
Zero wrapper overhead: Eliminated 76% of artificial performance losses in Python bindings
True zero-copy: Raw OpenCV Mat → numpy array, no intermediate conversions
Intelligent API: Same function handles single images + batch processing seamlessly
Buffer pooling: Memory reuse across operations eliminates allocation bottlenecks
Adaptive threading: Sequential for small batches, parallel for large batches

We unleash OpenCV's full C++ power without Python binding limitations.

🎯 Ultimate Performance APIs

import numpy as np
import trainingsample as tsr

# SINGLE IMAGE - FASTER than cv2.resize()!
img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
result = tsr.batch_resize_images_zero_copy(img, (256, 256))  # 1.12x FASTER than OpenCV!

# BATCH PROCESSING - 2.4x faster than OpenCV individual calls
images = [np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) for _ in range(10)]
results = tsr.batch_resize_images_zero_copy(images, [(256, 256)] * 10)

# MEMORY-EFFICIENT ITERATION - True zero-copy lazy conversion
for result in tsr.batch_resize_images_iterator(images, [(256, 256)] * 10):
    process(result)  # Convert only when accessed, supports early termination

# ZERO-COPY BATCH OPERATIONS
cropped = tsr.batch_crop_images_zero_copy(images, [(50, 50, 200, 200)] * 10)      # 4x faster
luminances = tsr.batch_calculate_luminance_zero_copy(images)                      # 8x faster
center_cropped = tsr.batch_center_crop_images_zero_copy(images, [(224, 224)] * 10) # 3x faster

📊 Performance Comparison

import time
import cv2

# Single image resize comparison
img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

# OpenCV (industry standard)
start = time.perf_counter()
cv2_result = cv2.resize(img, (256, 256))
opencv_time = time.perf_counter() - start

# TrainingSample (industry leader)
start = time.perf_counter()
tsr_result = tsr.batch_resize_images_zero_copy(img, (256, 256))
tsr_time = time.perf_counter() - start

print(f"OpenCV: {opencv_time*1000:.3f}ms")
print(f"TSR:    {tsr_time*1000:.3f}ms")
print(f"TSR is {opencv_time/tsr_time:.2f}x FASTER!")  # Typical: 1.12x faster

rust usage

use trainingsample::{
    batch_crop_image_arrays, batch_resize_image_arrays,
    batch_calculate_luminance_arrays
};
use ndarray::Array3;

// create some test data
let images: Vec<Array3<u8>> = (0..10)
    .map(|_| Array3::zeros((480, 640, 3)))
    .collect();

// batch operations
let crop_boxes = vec![(50, 50, 200, 200); 10]; // (x, y, width, height)
let cropped = batch_crop_image_arrays(&images, &crop_boxes);

let target_sizes = vec![(224, 224); 10]; // (width, height)
let resized = batch_resize_image_arrays(&images, &target_sizes);

let luminances = batch_calculate_luminance_arrays(&images);

api reference

python functions

`batch_crop_images(images, crop_boxes)`

images: list of numpy arrays (H, W, 3) uint8
crop_boxes: list of (x, y, width, height) tuples
returns: list of cropped numpy arrays
implementation: TSR-optimized for mixed-shape batching

`batch_center_crop_images(images, target_sizes)`

images: list of numpy arrays (H, W, 3) uint8
target_sizes: list of (width, height) tuples
returns: list of center-cropped numpy arrays
implementation: TSR-optimized for mixed-shape batching

`batch_random_crop_images(images, target_sizes)`

images: list of numpy arrays (H, W, 3) uint8
target_sizes: list of (width, height) tuples
returns: list of randomly cropped numpy arrays
implementation: TSR-optimized for mixed-shape batching

`batch_resize_images(images, target_sizes)`

images: list of numpy arrays (H, W, 3) uint8
target_sizes: list of (width, height) tuples
returns: list of resized numpy arrays
implementation: OpenCV for optimal performance

`batch_calculate_luminance(images)`

images: list of numpy arrays (H, W, 3) uint8
returns: list of float luminance values
implementation: TSR SIMD-optimized (10-35x faster than NumPy)

`batch_resize_videos(videos, target_sizes)`

videos: list of numpy arrays (T, H, W, 3) uint8
target_sizes: list of (width, height) tuples
returns: list of resized video numpy arrays

rust functions

same signatures but with ndarray::Array3<u8> and ndarray::Array4<u8> instead of numpy arrays. check the docs for details.

architecture

TSR uses a best-of-breed hybrid approach for optimal performance:

operation selection

cropping operations: TSR implementation
- mixed-shape batching (8 different input shapes → 7 different output shapes)
- single API call: tsr.batch_crop_images(mixed_images, mixed_crops)
- vs competitor: individual loops required for each shape combination
luminance calculation: TSR SIMD implementation
- 18x faster than NumPy for mixed-shape batches
- 35x faster than NumPy for uniform batches
- vectorized across different image sizes in single batch call
resize operations: OpenCV implementation
- industry-standard performance and quality
- highly optimized C++ implementations
- 7-25x faster than TSR resize implementations

static wheel distribution

OpenCV statically linked into wheel (no external dependencies)
single pip install trainingsample - no opencv-python conflicts
consistent performance across platforms
~50MB wheel includes all optimizations

features

hybrid architecture: best implementation for each operation
parallel processing with rayon (actually uses your cores)
zero-copy numpy integration via rust-numpy
proper error handling (no silent failures)
static OpenCV bundled (no external dependencies)
no python threading nonsense, GIL is released
memory efficient batch operations
supports both images and videos

🏆 Industry-Leading Performance

BREAKTHROUGH ACHIEVEMENT: First library to beat cv2 by eliminating Python binding overhead while leveraging OpenCV's full C++ power

🥇 vs. opencv-python (cv2)

Operation	cv2 (opencv-python)	TSR (OpenCV+Rust)	TSR Speedup	Achievement
Single Resize	0.134ms	0.120ms	1.12x FASTER	🏆 Beats cv2 bindings
Batch Resize (8)	1.10ms	0.47ms	2.4x FASTER	🏆 Leverages OpenCV C++
Crop Operations	1.40ms	0.34ms	4.1x FASTER	🏆 Zero-copy optimization
Luminance Calc	4.38ms	0.55ms	8.0x FASTER	🏆 SIMD + OpenCV power

🚀 Peak Performance Numbers

17,204 images/sec - Batch resize throughput
Zero wrapper overhead - Eliminated 76% of artificial performance losses
True zero-copy - Raw pointer → numpy conversion on-demand
Intelligent dispatch - Same API for single + batch with optimal performance

🎯 Real-World Advantages

How We Achieve This

Direct OpenCV C++: Bypass cv2's Python binding overhead entirely
Zero artificial overhead: Direct Mat headers, no intermediate conversions
Buffer pooling: Memory reuse eliminates allocation bottlenecks that plague Python bindings
Adaptive threading: Smart parallelization leveraging Rust's superior threading
Intelligent API: Seamless auto-batching with optimal performance dispatch

Industry Impact

Computer Vision: First library to beat cv2 by leveraging OpenCV's full C++ power
Machine Learning: Faster preprocessing = faster training pipelines
Real-time Applications: Sub-millisecond image processing capabilities
Memory Efficiency: True zero-copy iteration for large datasets

Bottom Line: We leverage OpenCV's C++ excellence to eliminate the performance bottlenecks in Python bindings.

Apple Silicon Performance (M3 Max)

Optimized SIMD implementations with concrete benchmarks:

Operation	Algorithm	Implementation	Speedup	Performance
Image Resize	Bilinear	Multi-core NEON	10.2x	1,412 MPx/s
Image Resize	Lanczos4	Metal GPU	11.8x	112 MPx/s
Format Conversion	RGB→RGBA	Portable SIMD	4.4x	1,500 MPx/s
Format Conversion	RGBA→RGB	Portable SIMD	2.6x	1,651 MPx/s
Luminance Calc	RGB→Y	NEON SIMD	4.7x	545 images/sec

Key Insights:

CPU SIMD (multi-core NEON) optimal for memory-bound operations like bilinear resize
GPU Metal dominates compute-intensive algorithms like Lanczos4 interpolation
Unified memory architecture enables zero-copy GPU operations
Automatic selection between CPU/GPU based on algorithm characteristics

Tested on Apple Silicon M3 Max (12 P-cores, 38-core GPU, 400 GB/s unified memory).

why this hybrid approach

vs pure opencv/pil

OpenCV alone: excellent resize performance, but poor mixed-shape batching
PIL: slow, GIL-bound, no batch operations
TSR hybrid: combines OpenCV's resize speed with TSR's batch/SIMD advantages

vs pure rust implementations

TSR resize: slower than OpenCV's highly-optimized C++ (7-25x difference)
TSR luminance: faster than NumPy due to SIMD (18-35x speedup)
best of both: use optimal implementation for each operation

static distribution advantage

no dependency conflicts: opencv-python version compatibility issues eliminated
consistent performance: same optimized OpenCV across all platforms
simple deployment: single wheel, no system dependencies

building from source

# for python
pip install maturin
maturin develop --release

# for rust
cargo build --release

requires rust 1.70+ and python 3.11+ if you want the python bindings.

license

MIT. do whatever you want with it, leave attribution in-tact.

Name		Name	Last commit message	Last commit date
Latest commit History 134 Commits
.cargo		.cargo
.github/workflows		.github/workflows
docs		docs
scripts		scripts
src		src
tests		tests
.envrc		.envrc
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.rustfmt.toml		.rustfmt.toml
BENCHMARKS.md		BENCHMARKS.md
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
clippy.toml		clippy.toml
pyproject.toml		pyproject.toml
rust-toolchain.toml		rust-toolchain.toml
setup-env.sh		setup-env.sh

License

bghira/TrainingSample

Folders and files

Latest commit

History

Repository files navigation

trainingsample

install

🚀 Why TrainingSample Leads the Industry

⚡ Performance That Redefines Possible

🔥 What Makes Us Different

🎯 Ultimate Performance APIs

📊 Performance Comparison

rust usage

api reference

python functions

batch_crop_images(images, crop_boxes)

batch_center_crop_images(images, target_sizes)

batch_random_crop_images(images, target_sizes)

batch_resize_images(images, target_sizes)

batch_calculate_luminance(images)

batch_resize_videos(videos, target_sizes)