quality: Optimize SSIM computation for ~2x performance improvement #4065
Open
cl445 wants to merge 1 commit into opencv:4.x
Key optimizations:
- Use a separable Gaussian filter (sepFilter2D) instead of a 2D GaussianBlur. This reduces the per-pixel cost from O(k^2) to O(k) (2k taps instead of k^2) for kernel size k.
- Use CV_32F instead of CV_64F precision for computations. This is faster on GPU while maintaining sufficient accuracy for SSIM.
- Add an OpenCL kernel for the fused SSIM formula computation. This replaces multiple kernel launches with one for the final SSIM calculation.

Performance improvement (Apple M4 Pro, median times):
- 1080p grayscale: ~2x faster (was ~28 ms, now ~14 ms)
- VGA grayscale: ~2x faster
- Precomputed reference path: an additional 25-30% faster

Memory optimization:
- Optional quality map allocation via the need_quality_map parameter
- The SSIM map is released immediately when it is not needed

Also adds:
- Performance benchmarks (perf_ssim.cpp) for regression testing
- OpenCL kernel supporting 1-4 channel images
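The k^2 versus 2k claim rests on the Gaussian kernel being separable: a 2D Gaussian is the outer product of two 1D Gaussians, so filtering rows and then columns gives the same result as one 2D convolution. The standalone C++ sketch below checks that equivalence and counts taps per pixel. It is an illustration under stated assumptions, not the PR's code: it uses plain std::vector images instead of cv::Mat, the helper names (gaussianKernel1D, blur2D, blurSeparable) are hypothetical, borders are clamped to the edge (OpenCV's filters default to BORDER_REFLECT_101), and it compares against a naive direct 2D convolution rather than modeling how cv::GaussianBlur is implemented internally.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<double>>;

// Normalized 1D Gaussian kernel of odd length k (weights sum to 1).
std::vector<double> gaussianKernel1D(int k, double sigma) {
    assert(k > 0 && k % 2 == 1);
    std::vector<double> g(k);
    const int r = k / 2;
    double sum = 0.0;
    for (int i = 0; i < k; ++i) {
        const double x = i - r;
        g[i] = std::exp(-(x * x) / (2.0 * sigma * sigma));
        sum += g[i];
    }
    for (double& v : g) v /= sum;
    return g;
}

// Pixel read with clamp-to-edge borders (a simplifying assumption).
double at(const Image& img, int y, int x) {
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    y = y < 0 ? 0 : (y >= h ? h - 1 : y);
    x = x < 0 ? 0 : (x >= w ? w - 1 : x);
    return img[y][x];
}

// Direct 2D convolution with the outer-product kernel g[i] * g[j]: k*k taps per pixel.
Image blur2D(const Image& img, const std::vector<double>& g, long& taps) {
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    const int k = static_cast<int>(g.size()), r = k / 2;
    Image out(h, std::vector<double>(w, 0.0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int i = 0; i < k; ++i)
                for (int j = 0; j < k; ++j) {
                    acc += g[i] * g[j] * at(img, y + i - r, x + j - r);
                    ++taps;
                }
            out[y][x] = acc;
        }
    return out;
}

// Separable version: a horizontal 1D pass, then a vertical 1D pass: 2*k taps per pixel.
Image blurSeparable(const Image& img, const std::vector<double>& g, long& taps) {
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    const int k = static_cast<int>(g.size()), r = k / 2;
    Image tmp(h, std::vector<double>(w, 0.0));
    Image out(h, std::vector<double>(w, 0.0));
    for (int y = 0; y < h; ++y)          // rows
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int j = 0; j < k; ++j) { acc += g[j] * at(img, y, x + j - r); ++taps; }
            tmp[y][x] = acc;
        }
    for (int y = 0; y < h; ++y)          // columns of the row-filtered image
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int i = 0; i < k; ++i) { acc += g[i] * at(tmp, y + i - r, x); ++taps; }
            out[y][x] = acc;
        }
    return out;
}
```

With an 11-tap window, for example, the direct version performs 121 taps per pixel and the separable version 22, while producing the same output up to floating-point rounding.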
Author
The macOS-X64 CI failure appears to be unrelated to this PR. Other recent PRs (e.g., #4057) also show macOS failures in different modules, which suggests an infrastructure issue. Could a maintainer please re-run the failed CI job? Thank you!
Updated from c7425ce to 2bb4e32
Summary
This PR optimizes the SSIM (Structural Similarity Index) computation in the quality module, making it roughly 2x faster while also reducing memory allocations.
Related: #3797, an earlier PR that proposed reducing SSIM buffer usage. This PR addresses the same memory goals and additionally speeds up the computation through algorithmic changes (separable filtering, single-precision math, and a fused OpenCL kernel).
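To make the fused-formula and need_quality_map ideas concrete, here is a CPU-side C++ sketch of the per-pixel arithmetic a fused SSIM kernel performs once the Gaussian-blurred moments are available. It is not the ssim.cl kernel from this PR: LocalMoments and fusedMeanSSIM are hypothetical names, the inputs are assumed to come from an earlier blur pass, and C1/C2 are the standard SSIM constants for 8-bit data. Templating on T allows the same code to be run in float and double to compare CV_32F-style and CV_64F-style precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gaussian-weighted local moments for every pixel, assumed to come from an
// earlier blur pass: mu1 = blur(I1), e11 = blur(I1*I1), e12 = blur(I1*I2), etc.
template <typename T>
struct LocalMoments {
    std::vector<T> mu1, mu2, e11, e22, e12;
};

// Hypothetical helper (not an OpenCV API): evaluates the SSIM formula for each
// pixel in one loop, as a fused kernel would, and accumulates the mean.
// The per-pixel map is written only when ssimMap is non-null, mirroring the
// idea behind need_quality_map.
template <typename T>
double fusedMeanSSIM(const LocalMoments<T>& m, std::vector<T>* ssimMap = nullptr) {
    const T C1 = T(6.5025);   // (0.01 * 255)^2
    const T C2 = T(58.5225);  // (0.03 * 255)^2
    const std::size_t n = m.mu1.size();
    assert(m.mu2.size() == n && m.e11.size() == n && m.e22.size() == n && m.e12.size() == n);
    if (ssimMap != nullptr) ssimMap->assign(n, T(0));
    double sum = 0.0;  // accumulate in double even when T is float
    for (std::size_t i = 0; i < n; ++i) {
        const T mu1 = m.mu1[i], mu2 = m.mu2[i];
        const T mu1mu2 = mu1 * mu2;
        const T mu1sq = mu1 * mu1, mu2sq = mu2 * mu2;
        const T var1 = m.e11[i] - mu1sq;  // sigma1^2
        const T var2 = m.e22[i] - mu2sq;  // sigma2^2
        const T cov = m.e12[i] - mu1mu2;  // sigma12
        const T num = (T(2) * mu1mu2 + C1) * (T(2) * cov + C2);
        const T den = (mu1sq + mu2sq + C1) * (var1 + var2 + C2);
        const T v = num / den;
        if (ssimMap != nullptr) (*ssimMap)[i] = v;
        sum += static_cast<double>(v);
    }
    return n == 0 ? 0.0 : sum / static_cast<double>(n);
}
```

Because the map is optional, a caller that only needs the mean score (the common case) never allocates the per-pixel buffer, and the whole formula is evaluated in a single pass instead of a chain of element-wise operations that each produce a temporary.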
Key Optimizations
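1. Separable Gaussian Filter (sepFilter2D instead of GaussianBlur)
   - Reduces the per-pixel cost from k^2 to 2k taps for kernel size k
2. CV_32F Precision (instead of CV_64F)
   - Faster on GPU while maintaining sufficient accuracy for SSIM
3. OpenCL Kernel for Fused SSIM Computation
   - The ssim_map kernel fuses the final SSIM formula into a single GPU kernel
4. Memory Optimization
   - need_quality_map parameter to avoid unnecessary allocations

Performance Results
Tested on Apple M4 Pro (OpenCL), median times over 10 samples:
- 1080p grayscale: ~28 ms before, ~14 ms after (~2x)
- VGA grayscale: ~2x faster
- Precomputed reference (when using QualitySSIM::create() + compute()): the compute() method is an additional 25-30% faster

Changes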
- qualityssim.hpp: Added the need_quality_map parameter to the internal compute function
- qualityssim.cpp: Refactored with optimized blur, OpenCL integration, and memory management
- src/opencl/ssim.cl: New OpenCL kernel for SSIM map computation
- perf/perf_ssim.cpp: Performance benchmarks for regression testing

Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Test Plan
- opencv_test_quality: SSIM tests pass
- Performance benchmarks (opencv_perf_quality)

API Compatibility
This change is fully backward compatible: