Skip to content

Fix UB in JpegCompressionDistortion: use data() for past-the-end pointer#6251

Open
JanuszL wants to merge 3 commits intoNVIDIA:mainfrom
JanuszL:oob_JpegCompressionDistortionCPU
Open

Fix UB in JpegCompressionDistortion: use data() for past-the-end pointer#6251
JanuszL wants to merge 3 commits intoNVIDIA:mainfrom
JanuszL:oob_JpegCompressionDistortionCPU

Conversation

@JanuszL
Copy link
Contributor

@JanuszL JanuszL commented Mar 10, 2026

  • Replace &shape[ndim] and &shape[0] with shape.data() + ndim and
    shape.data() in JpegCompressionDistortionCPU::RunImpl.
  • &shape[ndim] calls operator[] with an out-of-bounds index before taking
    the address, which is undefined behavior and triggers assertion failures in
    debug builds. Using shape.data() + ndim computes the past-the-end pointer
    via pointer arithmetic, which is explicitly permitted by the C++ standard
    ([expr.add]) and is the correct way to form a one-past-the-end pointer
    without dereferencing it.

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

  • Replace &shape[ndim] and &shape[0] with shape.data() + ndim and
    shape.data() in JpegCompressionDistortionCPU::RunImpl.
  • &shape[ndim] calls operator[] with an out-of-bounds index before taking
    the address, which is undefined behavior and triggers assertion failures in
    debug builds. Using shape.data() + ndim computes the past-the-end pointer
    via pointer arithmetic, which is explicitly permitted by the C++ standard
    ([expr.add]) and is the correct way to form a one-past-the-end pointer
    without dereferencing it.

Additional information:

Affected modules and functionalities:

  • JPEG CPU distortion

Key points relevant for the review:

  • NA

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

…ter.

- Replace `&shape[ndim]` and `&shape[0]` with `shape.data() + ndim` and
  `shape.data()` in JpegCompressionDistortionCPU::RunImpl.
- `&shape[ndim]` calls operator[] with an out-of-bounds index before taking
   the address, which is undefined behavior and triggers assertion failures in
   debug builds. Using `shape.data() + ndim` computes the past-the-end pointer
   via pointer arithmetic, which is explicitly permitted by the C++ standard
   ([expr.add]) and is the correct way to form a one-past-the-end pointer
   without dereferencing it.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Copy link
Contributor Author

JanuszL commented Mar 10, 2026

!build

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR fixes undefined behavior in JpegCompressionDistortionCPU::RunImpl where &shape[ndim] was used to form a past-the-end pointer by calling operator[] with an out-of-bounds index (ndim == shape.size()). The fix replaces the raw address-of-subscript idiom with iterator arithmetic (shape.begin() + offset), which is well-defined by the C++ standard and idiomatic for span/container types.

  • volume(&shape[0], &shape[f_dim + 1])volume(shape.begin(), shape.begin() + f_dim + 1)
  • volume(&shape[f_dim + 1], &shape[ndim])volume(shape.begin() + f_dim + 1, shape.begin() + ndim)

The comment explaining the empty-range behavior when f_dim == -1 is preserved (slightly reformatted). The fix is minimal, correct, and addresses a real debug-build assertion failure.

Confidence Score: 5/5

  • This PR is safe to merge — it is a minimal, well-understood UB fix with no logic or behavioral changes.
  • The change is a one-file, two-line fix replacing undefined behavior (&shape[ndim] via out-of-bounds operator[]) with valid iterator arithmetic (shape.begin() + ndim). The semantics are identical, the fix is standard C++ practice, and the edge case (f_dim == -1) is handled correctly as before.
  • No files require special attention.

Important Files Changed

Filename Overview
dali/operators/image/distortion/jpeg_compression_distortion_op_cpu.cc Replaces out-of-bounds operator[] usage with iterator arithmetic (shape.begin() + offset) to eliminate undefined behavior when forming a past-the-end pointer.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[RunImpl called] --> B[Get tensor shape span]
    B --> C{f_dim == -1?}
    C -- Yes --> D["shape.begin() + f_dim+1 == shape.begin()\n→ empty range → nframes = 1"]
    C -- No --> E["shape.begin() + f_dim+1\n→ valid iterator → nframes = F dimension volume"]
    D --> F["frame_size = volume(shape.begin() + f_dim+1, shape.begin() + ndim)"]
    E --> F
    F --> G[Dispatch work per frame to thread pool]
    G --> H[RunJpegDistortionCPU per frame]
Loading

Last reviewed commit: 2c168f5

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45811839]: BUILD STARTED

@JanuszL JanuszL added the important-fix Fixes an important issue in the software or development environment. label Mar 10, 2026
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45811839]: BUILD PASSED

Comment on lines +99 to +101
volume(shape.data(), shape.data() + f_dim + 1); // note that if f_dim is -1, this
// evaluates to an empty range, volume of 1
int64_t frame_size = volume(shape.data() + f_dim + 1, shape.data() + ndim);
Copy link
Contributor

@mzient mzient Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that formally it's better to do:

Suggested change
volume(shape.data(), shape.data() + f_dim + 1); // note that if f_dim is -1, this
// evaluates to an empty range, volume of 1
int64_t frame_size = volume(shape.data() + f_dim + 1, shape.data() + ndim);
volume(shape.begin(), shape.begin() + f_dim + 1); // note that if f_dim is -1, this
// evaluates to an empty range, volume of 1
int64_t frame_size = volume(shape.begin() + f_dim + 1, shape.begin() + ndim);

In case of some super-debug builds, advancing an iterator may be checked, whereas a raw pointer (as returned by data()) won't be.

Also, note that the out of bound will never occur in practice, as the only accepted layouts are HWC and FHWC (see line 36), so this is not an "important fix" in that it doesn't create an exploitable attack surface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. You are right regarding severity.

@mzient mzient self-assigned this Mar 11, 2026
JanuszL added 2 commits March 11, 2026 11:16
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Copy link
Contributor Author

JanuszL commented Mar 11, 2026

!build

@JanuszL JanuszL removed the important-fix Fixes an important issue in the software or development environment. label Mar 11, 2026
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45872745]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45872745]: BUILD FAILED

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants