
quality: Optimize SSIM computation for ~2x performance improvement #4065

Open
cl445 wants to merge 1 commit into opencv:4.x from cl445:optimize-ssim-performance

cl445 commented Jan 8, 2026

Summary

This PR optimizes the SSIM (Structural Similarity Index) computation in the quality module, achieving an approximately 2x speedup while also reducing memory allocations.

Related: #3797, an earlier PR that proposed buffer reduction for SSIM. This PR pursues similar memory-optimization goals and adds significant performance improvements through algorithmic changes.

Key Optimizations

  1. Separable Gaussian Filter (sepFilter2D instead of GaussianBlur)

    • Reduces per-pixel cost from O(k²) to O(2k) multiply-adds for kernel size k (121 vs. 22 for k=11)
    • Uses cached 1D kernel to avoid repeated allocation
  2. CV_32F Precision (instead of CV_64F)

    • Significantly faster on GPU/OpenCL
    • Maintains sufficient accuracy for SSIM computation (values typically range 0-1)
  3. OpenCL Kernel for Fused SSIM Computation

    • New ssim_map kernel fuses the final SSIM formula into a single GPU kernel
    • Supports 1-4 channel images via compile-time specialization
    • Reduces kernel launch overhead
  4. Memory Optimization

    • Added need_quality_map parameter to avoid unnecessary allocations
    • SSIM map is released immediately when not requested by caller
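To make the complexity claim in item 1 concrete, here is a minimal stdlib-only sketch (not the PR's code, and not OpenCV's implementation) comparing a direct 2D convolution with the equivalent two-pass separable version. The function names, the replicate-border handling, and sigma = 1.5 are assumptions chosen for illustration; only the kernel size k=11 comes from the description above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized 1D Gaussian kernel of odd length ksize (weights sum to 1).
std::vector<double> gaussianKernel1D(int ksize, double sigma) {
    assert(ksize > 0 && ksize % 2 == 1 && sigma > 0.0);
    std::vector<double> k(ksize);
    const int r = ksize / 2;
    double sum = 0.0;
    for (int i = 0; i < ksize; ++i) {
        const double x = static_cast<double>(i - r);
        k[i] = std::exp(-(x * x) / (2.0 * sigma * sigma));
        sum += k[i];
    }
    for (double& w : k) w /= sum;
    return k;
}

// Replicate-border indexing, chosen here only for brevity (an assumption).
inline int clampIndex(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

// Direct 2D convolution with the outer product of k with itself:
// ksize * ksize multiply-adds per pixel.
std::vector<double> blurDirect2D(const std::vector<double>& img, int w, int h,
                                 const std::vector<double>& k) {
    assert(static_cast<int>(img.size()) == w * h);
    const int r = static_cast<int>(k.size()) / 2;
    std::vector<double> out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx)
                    acc += k[dy + r] * k[dx + r] *
                           img[clampIndex(y + dy, h) * w + clampIndex(x + dx, w)];
            out[y * w + x] = acc;
        }
    return out;
}

// Row pass followed by column pass: 2 * ksize multiply-adds per pixel.
std::vector<double> blurSeparable(const std::vector<double>& img, int w, int h,
                                  const std::vector<double>& k) {
    assert(static_cast<int>(img.size()) == w * h);
    const int r = static_cast<int>(k.size()) / 2;
    std::vector<double> tmp(img.size()), out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int d = -r; d <= r; ++d)
                acc += k[d + r] * img[y * w + clampIndex(x + d, w)];
            tmp[y * w + x] = acc;
        }
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int d = -r; d <= r; ++d)
                acc += k[d + r] * tmp[clampIndex(y + d, h) * w + x];
            out[y * w + x] = acc;
        }
    return out;
}
```

Both functions should produce the same blurred image up to floating-point rounding, because the 2D Gaussian is the outer product of the 1D kernel with itself. Building the 1D kernel once and reusing it across calls corresponds to the "cached 1D kernel" point.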

Performance Results

Tested on Apple M4 Pro (OpenCL), median times over 10 samples:

| Image Size | Channels | Before | After | Speedup |
|---|---|---|---|---|
| VGA (640×480) | 1 | ~9ms | 4.8ms | 1.9x |
| VGA (640×480) | 3 | ~18ms | 9.2ms | 2.0x |
| 720p (1280×720) | 1 | ~18ms | 8.9ms | 2.0x |
| 720p (1280×720) | 3 | ~35ms | 17.5ms | 2.0x |
| 1080p (1920×1080) | 1 | ~28ms | 13.7ms | 2.0x |
| 1080p (1920×1080) | 3 | ~56ms | 30.7ms | 1.8x |
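For readers who want to reproduce the speedup column, the arithmetic can be sketched as follows. The helper names `median` and `speedup` are hypothetical and are not part of the PR's benchmark code. Since the "Before" timings are approximate, the resulting ratios are approximate as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of timing samples (taken by value so the caller's data is not reordered).
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Speedup factor: how many times faster the "after" timing is than the "before" timing.
double speedup(double beforeMs, double afterMs) {
    assert(afterMs > 0.0);
    return beforeMs / afterMs;
}
```

For example, `speedup(28.0, 13.7)` is a little above 2, matching the 1080p single-channel row.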

Precomputed reference (when using QualitySSIM::create() + compute()):

  • An additional 25-30% faster than the static compute() method
  • Recommended for comparing multiple images against a single reference
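The idea behind the precomputed-reference path can be illustrated with a simplified stdlib-only sketch: everything that depends only on the reference image is computed once and reused for each comparison. This is not the QualitySSIM implementation. For brevity it uses a single global window instead of a sliding Gaussian window; the `ReferenceCache` type and both function names are hypothetical; and C1 and C2 are the conventional constants for 8-bit data, (0.01·255)² and (0.03·255)².

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache of the terms that depend only on the reference image.
struct ReferenceCache {
    std::vector<double> pixels;  // kept because the covariance needs the raw values
    double mean;
    double variance;
};

// Done once per reference image.
ReferenceCache precomputeReference(const std::vector<double>& ref) {
    assert(!ref.empty());
    double sum = 0.0, sumSq = 0.0;
    for (double v : ref) { sum += v; sumSq += v * v; }
    const double n = static_cast<double>(ref.size());
    const double mean = sum / n;
    return ReferenceCache{ref, mean, sumSq / n - mean * mean};
}

// Done once per compared image; only comparison-side statistics are recomputed.
double compareToReference(const ReferenceCache& ref, const std::vector<double>& cmp) {
    assert(cmp.size() == ref.pixels.size());
    const double C1 = 6.5025, C2 = 58.5225;  // conventional 8-bit SSIM constants
    double sum = 0.0, sumSq = 0.0, sumXY = 0.0;
    for (std::size_t i = 0; i < cmp.size(); ++i) {
        sum += cmp[i];
        sumSq += cmp[i] * cmp[i];
        sumXY += ref.pixels[i] * cmp[i];
    }
    const double n = static_cast<double>(cmp.size());
    const double mean = sum / n;
    const double variance = sumSq / n - mean * mean;
    const double covariance = sumXY / n - ref.mean * mean;
    return ((2.0 * ref.mean * mean + C1) * (2.0 * covariance + C2)) /
           ((ref.mean * ref.mean + mean * mean + C1) * (ref.variance + variance + C2));
}
```

With many images compared against one reference, `precomputeReference` runs once and `compareToReference` runs per image, which is the usage pattern the recommendation above describes.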

Changes

  • qualityssim.hpp: Added need_quality_map parameter to internal compute function
  • qualityssim.cpp: Refactored with optimized blur, OpenCL integration, and memory management
  • src/opencl/ssim.cl: New OpenCL kernel for SSIM map computation
  • perf/perf_ssim.cpp: Performance benchmarks for regression testing
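As an illustration of what "fusing" the final SSIM formula means, the following CPU sketch evaluates the whole per-pixel expression in a single loop instead of a chain of element-wise array operations. It is not the contents of ssim.cl: the function name, the argument layout, and the default constants (the conventional 8-bit values) are assumptions. The inputs are assumed to be already-blurred maps of I1, I2, I1·I1, I2·I2 and I1·I2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// mu1 = blur(I1), mu2 = blur(I2), i1Sq = blur(I1*I1), i2Sq = blur(I2*I2), i12 = blur(I1*I2).
// All five maps are assumed to have the same length (one value per pixel and channel).
std::vector<float> ssimMapFused(const std::vector<float>& mu1, const std::vector<float>& mu2,
                                const std::vector<float>& i1Sq, const std::vector<float>& i2Sq,
                                const std::vector<float>& i12,
                                float C1 = 6.5025f, float C2 = 58.5225f) {
    const std::size_t n = mu1.size();
    assert(mu2.size() == n && i1Sq.size() == n && i2Sq.size() == n && i12.size() == n);
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float mu1Sq = mu1[i] * mu1[i];
        const float mu2Sq = mu2[i] * mu2[i];
        const float mu12 = mu1[i] * mu2[i];
        const float sigma1Sq = i1Sq[i] - mu1Sq;  // local variance of I1
        const float sigma2Sq = i2Sq[i] - mu2Sq;  // local variance of I2
        const float sigma12 = i12[i] - mu12;     // local covariance
        const float num = (2.0f * mu12 + C1) * (2.0f * sigma12 + C2);
        const float den = (mu1Sq + mu2Sq + C1) * (sigma1Sq + sigma2Sq + C2);
        out[i] = num / den;
    }
    return out;
}
```

On a GPU, the loop body would correspond to one work-item per element, so the intermediate maps never need to be written back to memory between steps.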

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch (4.x)
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
  • The feature is well documented and sample code can be built with the project CMake

Test Plan

  • All existing opencv_test_quality SSIM tests pass
  • New performance benchmarks added (opencv_perf_quality)
  • Tested on macOS with Apple Silicon (OpenCL)
  • CI validation on other platforms

API Compatibility

This change is fully backward compatible:

  • No changes to public API signatures
  • Same SSIM values computed (within floating-point precision)
  • Only internal implementation optimized
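One way to sanity-check the "within floating-point precision" point is to evaluate the same SSIM expression in single and double precision and compare the results. The template below is a hypothetical helper, not code from the PR. It takes local statistics as scalars and uses the conventional 8-bit constants. The subtraction E[x²] − μ² can amplify rounding error in single precision, so the observed gap depends on the inputs.

```cpp
#include <cassert>

// SSIM for one window from its statistics: means mu1, mu2 and the
// second moments e11 = E[x*x], e22 = E[y*y], e12 = E[x*y].
template <typename T>
T ssimFromStats(T mu1, T mu2, T e11, T e22, T e12) {
    const T C1 = static_cast<T>(6.5025);
    const T C2 = static_cast<T>(58.5225);
    const T mu1Sq = mu1 * mu1;
    const T mu2Sq = mu2 * mu2;
    const T mu12 = mu1 * mu2;
    const T num = (T(2) * mu12 + C1) * (T(2) * (e12 - mu12) + C2);
    const T den = (mu1Sq + mu2Sq + C1) * ((e11 - mu1Sq) + (e22 - mu2Sq) + C2);
    assert(den > T(0));
    return num / den;
}
```

Calling `ssimFromStats<float>` and `ssimFromStats<double>` with the same statistics and comparing the two results gives a quick estimate of the precision lost by moving from CV_64F to CV_32F for a given input range.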

Key optimizations:
- Use separable Gaussian filter (sepFilter2D) instead of 2D GaussianBlur
  Reduces complexity from O(k^2) to O(2k) for kernel size k
- Use CV_32F instead of CV_64F precision for computations
  Faster on GPU while maintaining sufficient accuracy for SSIM
- Add OpenCL kernel for fused SSIM formula computation
  Reduces multiple kernel launches to one for final SSIM calculation

Performance improvement (Apple M4 Pro, median times):
- 1080p grayscale: ~2x faster (was ~28ms, now ~14ms)
- VGA grayscale: ~2x faster
- Precomputed reference path: additional 25-30% faster

Memory optimization:
- Optional quality map allocation via need_quality_map parameter
- Releases SSIM map immediately when not needed

Also adds:
- Performance benchmarks (perf_ssim.cpp) for regression testing
- OpenCL kernel supporting 1-4 channel images

cl445 commented Jan 12, 2026

The macOS-X64 CI failure appears to be unrelated to this PR:

  • Failed step: Performance:imgcodecs (exit code 139 = SIGSEGV)
  • This PR changes: modules/quality/ (SSIM optimization)

Other recent PRs (e.g., #4057) also show macOS failures in different modules, suggesting infrastructure issues.

Could a maintainer please re-run the failed CI job? Thank you!

cl445 force-pushed the optimize-ssim-performance branch from c7425ce to 2bb4e32 on January 12, 2026 21:05