Gaussian blur #2496

Open
wants to merge 11 commits into
base: main

Contributor

@awxkee awxkee commented Jun 18, 2025

Closes #986

Benchmarks

My work laptop has 10-15% noise, so these numbers are only meant to show the magnitude of the difference. I also include a comparison to libblur, because it is actually in a completely different execution class.

fast blur: sigma 3.0    time:   [22.242 ms 22.345 ms 22.460 ms]
                        change: [-6.7313% -5.5870% -4.4410%] (p = 0.00 < 0.05)

fast blur: sigma 7.0    time:   [22.404 ms 22.427 ms 22.453 ms]
                        change: [-9.3688% -8.5638% -7.7808%] (p = 0.00 < 0.05)

fast blur: sigma 50.0   time:   [23.251 ms 23.279 ms 23.313 ms]

gaussian blur: sigma 3.0
                        time:   [8.8081 ms 8.8672 ms 8.9305 ms]

gaussian blur: sigma 7.0
                        time:   [20.795 ms 20.959 ms 21.141 ms]

gaussian blur: sigma 50.0
                        time:   [174.65 ms 175.70 ms 176.80 ms]

libblur gaussian blur: sigma 3.0
                        time:   [1.5107 ms 1.5178 ms 1.5262 ms]

libblur gaussian blur: sigma 7.0
                        time:   [3.2371 ms 3.2688 ms 3.3020 ms]

libblur gaussian blur: sigma 50.0
                        time:   [25.141 ms 25.244 ms 25.373 ms]

libblur fast_blur exact alternative: sigma 3.0
                        time:   [2.9289 ms 2.9360 ms 2.9436 ms]

libblur fast_blur exact alternative: sigma 7.0
                        time:   [3.1053 ms 3.1424 ms 3.1801 ms]

libblur fast_blur exact alternative: sigma 50.0
                        time:   [3.1044 ms 3.1450 ms 3.1875 ms]

awxkee added 6 commits June 18, 2025 16:53
# Conflicts:
#	src/imageops/sample.rs
#	src/images/dynimage.rs
# Conflicts:
#	benches/blur.rs
#	src/imageops/sample.rs
#	src/images/dynimage.rs
@Shnatsel
Member

Thank you!

I wonder, are the benchmark numbers for libblur from single-threaded or multi-threaded execution?

@awxkee
Contributor Author

awxkee commented Jun 18, 2025

Single-threaded, as far as I can tell.

@@ -854,8 +855,8 @@ impl DynamicImage {
     /// This method typically assumes that the input is scene-linear light.
     /// If it is not, color distortion may occur.
     #[must_use]
-    pub fn blur(&self, sigma: f32) -> DynamicImage {
-        dynamic_map!(*self, ref p => imageops::blur(p, sigma))
+    pub fn blur(&self, kernel_size: usize, sigma: f32) -> DynamicImage {
Contributor

We'll need to make a decision on whether we want to do this API change. If so, this PR cannot be merged until we start working on the next major release.

Do other libraries generally require that you specify the kernel size alongside the blur amount?

Contributor Author

@awxkee awxkee Jun 19, 2025

A pure analytical Gaussian filter often assumes that you want full control over it. libblur and OpenCV require a kernel size and also support asymmetry.

See here.

I'm OK with removing the kernel size and computing the correct kernel size from sigma if you think that's a better fit. But I'm not OK with returning to the old behaviour, which is wrong.
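
Deriving the kernel size from sigma can be sketched as below. This is a minimal illustration that assumes the common "three sigma" rule of thumb; the helper names `kernel_size_for_sigma` and `gaussian_kernel` are hypothetical and are not taken from this PR, libblur, or OpenCV.

```rust
/// Hypothetical helper: derive an odd kernel size from sigma using the
/// common "three sigma" rule of thumb (an assumption, not this PR's code).
fn kernel_size_for_sigma(sigma: f32) -> usize {
    // Cover roughly three standard deviations on each side of the center tap.
    let radius = (3.0 * sigma).ceil().max(0.0) as usize;
    2 * radius + 1
}

/// Build a normalized 1D Gaussian kernel of the given odd size.
fn gaussian_kernel(size: usize, sigma: f32) -> Vec<f32> {
    assert!(size % 2 == 1, "kernel size must be odd");
    assert!(sigma > 0.0, "sigma must be positive");
    let center = (size / 2) as f32;
    let mut kernel: Vec<f32> = (0..size)
        .map(|i| {
            let x = i as f32 - center;
            (-(x * x) / (2.0 * sigma * sigma)).exp()
        })
        .collect();
    // Normalize so the weights sum to 1 and overall brightness is preserved.
    let sum: f32 = kernel.iter().sum();
    for w in &mut kernel {
        *w /= sum;
    }
    kernel
}
```

Under this assumed rule the kernel length grows linearly with sigma, so `kernel_size_for_sigma(3.0)` gives 19 taps and `kernel_size_for_sigma(50.0)` gives 301.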

Contributor Author

If it needs to wait for something, I'd rather go ahead and close this PR. I don't have any plans to maintain rotting PRs.

Member

An easy-to-use wrapper that only requires the user to specify the sigma and computes the kernel size by itself would be nice. So it'd be two functions, e.g. blur() and blur_advanced().
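
The two-function split could look roughly like the sketch below. It works on a 1D `f32` signal to stay self-contained; `BlurParams`, the clamp-to-edge border handling, and the three-sigma default are all assumptions made for illustration, not the API of the image crate.

```rust
/// Hypothetical parameter bundle for the advanced entry point.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlurParams {
    kernel_size: usize, // must be odd
    sigma: f32,         // must be positive
}

/// Advanced variant: the caller controls both kernel size and sigma.
fn blur_advanced(signal: &[f32], params: BlurParams) -> Vec<f32> {
    assert!(params.kernel_size % 2 == 1 && params.sigma > 0.0);
    let half = (params.kernel_size / 2) as isize;
    // Unnormalized Gaussian weights; the sum is divided out below.
    let weights: Vec<f32> = (-half..=half)
        .map(|x| (-((x * x) as f32) / (2.0 * params.sigma * params.sigma)).exp())
        .collect();
    let total: f32 = weights.iter().sum();
    let last = signal.len() as isize - 1;
    (0..signal.len() as isize)
        .map(|i| {
            let mut acc = 0.0f32;
            for (k, w) in weights.iter().enumerate() {
                // Clamp-to-edge: repeat border samples outside the signal.
                let j = (i + k as isize - half).clamp(0, last);
                acc += signal[j as usize] * w;
            }
            acc / total
        })
        .collect()
}

/// Simple variant: only sigma is required; the kernel size is derived.
fn blur(signal: &[f32], sigma: f32) -> Vec<f32> {
    // Assumed default: cover about three sigma on each side.
    let radius = (3.0 * sigma).ceil() as usize;
    blur_advanced(signal, BlurParams { kernel_size: 2 * radius + 1, sigma })
}
```

With this shape, `blur(&data, 2.0)` is just shorthand for `blur_advanced(&data, BlurParams { kernel_size: 13, sigma: 2.0 })`.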

Member

We're already preparing a major version bump on the main branch so I'd do the change now. However, another idea is given by your question regarding kernel sizes. Small kernel sizes are faster (just see our cutoff point for ring queue in this). Considering most of our users we should at least suggest the common choices.

What if we were to introduce a 'BlurKernel' type that wraps those choices and gives a few constructors / static constants of common cases? Then indeed blur_{by,with}(BlurKernel) may make sense with blur(f32) yielding a somewhat common default.

Member

I was actually confused by the sigma vs kernel size distinction.

I just want a high-level API that I can call that roughly matches what I'd get from "blur radius" in GIMP, but I also recognize that there are use cases for more direct control of the parameters. So I'd prefer an easy-to-use version as blur() and a more advanced API for people who need it.

Contributor Author

I built such a parameters builder in libblur. It might still be overkill there, but I don't expect that someone plugging in 70K lines of SIMD code wouldn't at least do a bit of investigation into how to use the API.

For a more general-purpose implementation, I agree that a GIMP-style "blur radius" is preferred.

Contributor

I'm in favor of a single argument that behaves like blur radius. That's what I've always assumed sigma did.

Contributor Author

> What if we were to introduce a 'BlurKernel' type that wraps those choices and gives a few constructors / static constants of common cases? Then indeed blur_{by,with}(BlurKernel) may make sense with blur(f32) yielding a somewhat common default.

I think providing something like image.blur(3) is enough for general-purpose use.
Adding methods like image.blur3, image.blur5, image.blur7 gives a strong impression of API bloat.
It might make some sense if we were doing something truly interesting in these implementations, just to signal that they may yield results different from what you'd expect. But for now all implementations are the same: even when you hit the ring queue path you get the same result, just computed a different way.

Member

The argument radius/sigma makes perfect sense to me, too. So agreed, the simple function might as well accept a simple integer argument. I'm still not sure we should remove the interaction entirely for an advanced function, though. Deriving sigma from the radius doesn't make much sense either. For small values (3, 5, 7) the influence of normalization is quite high, and people will have different preferences for it.

But also importantly, ImageMagick has it as an option (https://imagemagick.org/script/command-line-options.php#blur). I'm thinking if we had a struct and dispatched internally:

struct BlurKernel {
    size: (u32, u32),
    sigma: (f32, f32),
}

impl BlurKernel {
    // Corresponds to calling the simple function with (3).
    pub const THREE: … // Due to float-const this must be filled manually..

    /// The isotropic case.
    pub fn from_radius(sz: u32) -> Self { … }

    /// Document the (1.0, 1.0) default and what anisotropy refers to.
    pub fn with_sigma(self, (x, y): (f32, f32)) -> Self { … }
}

I think that is straightforward enough, but please do argue if you think this is API bloat. The anisotropy case in particular, I think there's enough reason to support it if it is one additional parameter to an intermediate/expert-level function call.
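
One possible way to fill in that outline is sketched below. Everything beyond the names in the outline is an assumption made for illustration: that `from_radius` takes the per-axis kernel size, that the default sigma is `(1.0, 1.0)`, and the `size`, `sigma`, and `is_isotropic` accessors are hypothetical additions.

```rust
/// Hypothetical completion of the `BlurKernel` outline; defaults are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlurKernel {
    size: (u32, u32),
    sigma: (f32, f32),
}

impl BlurKernel {
    /// Corresponds to calling the simple function with 3.
    pub const THREE: BlurKernel = BlurKernel { size: (3, 3), sigma: (1.0, 1.0) };

    /// The isotropic case: same size on both axes, assumed default sigma.
    pub fn from_radius(sz: u32) -> Self {
        BlurKernel { size: (sz, sz), sigma: (1.0, 1.0) }
    }

    /// Override sigma per axis; unequal values give an anisotropic blur.
    pub fn with_sigma(self, (x, y): (f32, f32)) -> Self {
        BlurKernel { sigma: (x, y), ..self }
    }

    /// Kernel size as (horizontal, vertical).
    pub fn size(&self) -> (u32, u32) {
        self.size
    }

    /// Sigma as (horizontal, vertical).
    pub fn sigma(&self) -> (f32, f32) {
        self.sigma
    }

    /// True when both axes use the same size and sigma.
    pub fn is_isotropic(&self) -> bool {
        self.size.0 == self.size.1 && self.sigma.0 == self.sigma.1
    }
}
```

A caller who wants anisotropy would then write something like `BlurKernel::from_radius(5).with_sigma((2.0, 0.5))`, while the simple path stays at `BlurKernel::THREE` or `from_radius`.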

Member

@197g 197g left a comment

> Do you want fixed-point implementations for u8 or u16 instead of f32? Or is pure f32 convolution all that's needed?

You matched the style as discussed in the codec case, and to me that is neat enough.

But also this seems to be an implementation choice, rather than an API choice. At least when we cast back to the underlying storage type I'd expect that what you're referring to as fixed point is actually exact with regards to rounding ("as-if floating point"). Then it should not matter to the user and the choice should be as fast as possible. However, that is then also a discussion point for the future instead as we can dispatch on those specific types (I::Pixel: 'static allows TypeId).

> If special cases are wanted, do you expect them to be bit-exact?

Not sure. It doesn't seem necessary but appealing for the special casing of types / fixed-point. Less so for special casing for kernel sizes but also see discussion on those below.
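
The remark that a `'static` bound allows `TypeId` can be illustrated with a small dispatch sketch. The `Path` enum and `select_path` function are hypothetical names; only `std::any::TypeId` is a real API here, and the sketch dispatches on a plain sample type rather than on the crate's pixel traits.

```rust
use std::any::TypeId;

/// Which arithmetic path a blur might take (hypothetical names).
#[derive(Debug, PartialEq)]
enum Path {
    FixedPointU8,
    FixedPointU16,
    Float,
}

/// Pick an implementation from the concrete sample type.
/// The `'static` bound is what makes `TypeId::of::<T>()` available.
fn select_path<T: 'static>() -> Path {
    let id = TypeId::of::<T>();
    if id == TypeId::of::<u8>() {
        Path::FixedPointU8
    } else if id == TypeId::of::<u16>() {
        Path::FixedPointU16
    } else {
        // Anything else falls back to the generic floating-point path.
        Path::Float
    }
}
```

Because the choice is made inside the function, callers see one generic entry point and the specialised paths remain an implementation detail.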


let mut start_ky = column_kernel_len / 2 + 1;

start_ky %= column_kernel_len;
Member

This one is odd. It just catches the case of column_kernel_len == 1 but that rather seems like a very special case on its own which doesn't need column buffers at all to be honest.

Contributor Author

@awxkee awxkee Jun 19, 2025

Currently, the implementation assumes that anisotropy is an acceptable condition, and there is no special implementation case that handles the column as identity and the row as convolution. So the implementation assumes that column_kernel_len == 1 and row_kernel_len == 5 is just fine.

I could technically drop any possibility of anisotropy if you see that as a better fit.
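
The anisotropic case described here, a length-1 column kernel combined with a longer row kernel, can be sketched with a plain separable convolution. This is an illustrative implementation with clamp-to-edge borders and no ring queue or column buffers; `separable_convolve` is a hypothetical name and not the PR's code.

```rust
/// Convolve each row of a row-major `width * height` grid with `row_kernel`,
/// then each column with `column_kernel`, clamping at the borders.
fn separable_convolve(
    src: &[f32],
    width: usize,
    height: usize,
    row_kernel: &[f32],
    column_kernel: &[f32],
) -> Vec<f32> {
    assert_eq!(src.len(), width * height, "buffer size mismatch");
    assert!(
        row_kernel.len() % 2 == 1 && column_kernel.len() % 2 == 1,
        "kernels must have odd length"
    );
    let (row_half, col_half) = (row_kernel.len() / 2, column_kernel.len() / 2);

    // Horizontal pass into a temporary buffer.
    let mut tmp = vec![0.0f32; src.len()];
    for y in 0..height {
        for x in 0..width {
            let mut acc = 0.0f32;
            for (k, w) in row_kernel.iter().enumerate() {
                let sx = (x + k).saturating_sub(row_half).min(width - 1);
                acc += src[y * width + sx] * w;
            }
            tmp[y * width + x] = acc;
        }
    }

    // Vertical pass; with a `[1.0]` kernel this simply copies `tmp`.
    let mut dst = vec![0.0f32; src.len()];
    for y in 0..height {
        for x in 0..width {
            let mut acc = 0.0f32;
            for (k, w) in column_kernel.iter().enumerate() {
                let sy = (y + k).saturating_sub(col_half).min(height - 1);
                acc += tmp[sy * width + x] * w;
            }
            dst[y * width + x] = acc;
        }
    }
    dst
}
```

Passing `&[1.0]` as the column kernel blurs only along rows, which is the kind of input the generic path has to accept if anisotropy stays supported.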

Member

I was just confused by the way those lines are written. It suggests that column_kernel_len == 1 is an even more special case than anisotropy itself, since no other case requires the modulo operation and a length of 1 does not really require any intermediate buffers.

Member

That said, a comment is a fine resolution to this to me. There are bigger fish to fry.
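
The effect of the modulo can be checked directly. The sketch below copies the two reviewed lines into a standalone function and lists the lengths for which the wrap changes the value; `lengths_changed_by_modulo` is a hypothetical helper added for illustration.

```rust
/// Mirrors the two reviewed lines: start at `len / 2 + 1`, wrapped by `len`.
fn start_ky(column_kernel_len: usize) -> usize {
    let mut start_ky = column_kernel_len / 2 + 1;
    start_ky %= column_kernel_len;
    start_ky
}

/// Lengths in `1..=max_len` for which the modulo actually changes the value.
fn lengths_changed_by_modulo(max_len: usize) -> Vec<usize> {
    (1..=max_len)
        .filter(|&len| start_ky(len) != len / 2 + 1)
        .collect()
}
```

Only lengths 1 and 2 are affected, so if kernel lengths are always odd (an assumption), the modulo matters only for `column_kernel_len == 1`, as noted in the review.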

Comment on lines 506 to 511
if scanned_row_kernel.is_empty() || scanned_column_kernel.is_empty() {
    for (dst, src) in destination.iter_mut().zip(image.iter()) {
        *dst = *src;
    }
    return Ok(());
}
Member

An empty kernel is an error for convolution, the no-op case is a [1.0] kernel. So this is a leniency contract, right? Should be documented in the function signature.

Contributor Author

There is a mistake here — this is a no-op only when both kernels are [1.0]. Currently, the implementation assumes that anisotropy is an acceptable condition.
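
The distinction between an empty kernel and the identity kernel `[1.0]` can be made concrete with a strict 1D convolution. This sketch shows the stricter contract raised in the review, not the lenient behaviour in the snippet above; `convolve_1d` and `ConvolveError` are hypothetical names.

```rust
/// Error type for the strict contract (hypothetical).
#[derive(Debug, PartialEq)]
enum ConvolveError {
    EmptyKernel,
}

/// 1D convolution with clamp-to-edge borders that rejects empty kernels.
fn convolve_1d(signal: &[f32], kernel: &[f32]) -> Result<Vec<f32>, ConvolveError> {
    if kernel.is_empty() {
        // Strict contract: an empty kernel is an error, not a silent copy.
        return Err(ConvolveError::EmptyKernel);
    }
    let half = kernel.len() / 2;
    let last = signal.len().saturating_sub(1);
    let out = (0..signal.len())
        .map(|i| {
            kernel
                .iter()
                .enumerate()
                .map(|(k, w)| signal[(i + k).saturating_sub(half).min(last)] * w)
                .sum::<f32>()
        })
        .collect();
    Ok(out)
}
```

Here `convolve_1d(&data, &[1.0])` returns the input unchanged, while `convolve_1d(&data, &[])` returns `Err(ConvolveError::EmptyKernel)`.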

@awxkee awxkee requested a review from 197g June 19, 2025 19:44
@awxkee awxkee force-pushed the gauss_b branch 3 times, most recently from 6f1ac04 to 6dc369e Compare June 19, 2025 21:05
@awxkee
Contributor Author

awxkee commented Jun 19, 2025

A ton of WebP tests started failing today. I'm not sure if this PR is related to that.

@Shnatsel
Member

That should be the fixes shipped in https://crates.io/crates/image-webp v0.2.3 altering the enshrined hashes.

In the long run something like image-webp's pixel difference threshold should be implemented, or image-rs/image-webp#146 should be fixed at which point we could enshrine hashes again.

But for now they should probably just be regenerated. I'll open a PR with that against the main branch.
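
A pixel difference threshold, as opposed to an exact hash, could be checked along the lines of the sketch below. It is a generic illustration over raw `u8` buffers with hypothetical function names, and does not reproduce how image-webp implements its own comparison.

```rust
/// Largest absolute per-channel difference between two buffers,
/// or `None` if their lengths differ.
fn max_abs_diff(a: &[u8], b: &[u8]) -> Option<u8> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(&x, &y)| x.abs_diff(y)).max().unwrap_or(0))
}

/// True when the buffers have equal length and no channel differs by more
/// than `threshold`.
fn within_threshold(a: &[u8], b: &[u8], threshold: u8) -> bool {
    matches!(max_abs_diff(a, b), Some(d) if d <= threshold)
}
```

A reference test would then assert `within_threshold(&decoded, &expected, 1)` instead of comparing hashes, so small rounding changes in a decoder no longer invalidate the stored references.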

@awxkee awxkee force-pushed the gauss_b branch 3 times, most recently from 338e71a to b564431 Compare June 20, 2025 07:34
@awxkee
Contributor Author

awxkee commented Jun 20, 2025

I realized that I benchmarked the GenericImageView implementation instead of DynamicImage, so here are the updated numbers.

Benchmark

fast blur: sigma 3.0    time:   [24.121 ms 24.321 ms 24.525 ms]
                        change: [+4.5411% +5.6756% +6.8759%] (p = 0.00 < 0.05)
                        Performance has regressed.

fast blur: sigma 7.0    time:   [23.648 ms 23.834 ms 24.030 ms]
                        change: [-12.527% -5.4979% -0.7447%] (p = 0.07 > 0.05)
                        No change in performance detected.

fast blur: sigma 50.0   time:   [25.422 ms 26.036 ms 26.734 ms]
                        change: [+6.1714% +8.8062% +12.002%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe

gaussian blur: sigma 3.0
                        time:   [9.1108 ms 9.1780 ms 9.2486 ms]
                        change: [-2.8948% -1.7866% -0.6687%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

gaussian blur: sigma 7.0
                        time:   [21.107 ms 21.351 ms 21.645 ms]
                        change: [-2.4164% -1.0423% +0.5379%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  13 (13.00%) high mild
  1 (1.00%) high severe

Benchmarking gaussian blur: sigma 50.0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.7s, or reduce sample count to 20.
gaussian blur: sigma 50.0
                        time:   [172.45 ms 173.46 ms 174.53 ms]
                        change: [-3.2896% -2.4175% -1.5235%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

gaussian blur (dynamic image): sigma 3.0
                        time:   [5.2648 ms 5.3129 ms 5.3641 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

gaussian blur (dynamic image): sigma 7.0
                        time:   [12.699 ms 12.860 ms 13.099 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

Benchmarking gaussian blur (dynamic image): sigma 50.0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.7s, or reduce sample count to 50.
gaussian blur (dynamic image): sigma 50.0
                        time:   [99.216 ms 100.11 ms 101.01 ms]

Successfully merging this pull request may close these issues.

blur function is too slow
4 participants