Skip to content

8379786: Fix integer overflow in String.encodedLengthUTF8 LATIN1 path#30189

Open
wenshao wants to merge 3 commits intoopenjdk:masterfrom
wenshao:fix/string-encodedLengthUTF8-overflow
Open

8379786: Fix integer overflow in String.encodedLengthUTF8 LATIN1 path#30189
wenshao wants to merge 3 commits intoopenjdk:masterfrom
wenshao:fix/string-encodedLengthUTF8-overflow

Conversation

@wenshao
Copy link
Contributor

@wenshao wenshao commented Mar 11, 2026

The encodedLengthUTF8() method uses an int accumulator (dp) for the LATIN1 code path, while the UTF16 path (encodedLengthUTF8_UTF16) correctly uses a long accumulator with an overflow check. When a LATIN1 string contains more than Integer.MAX_VALUE/2 non-ASCII bytes, the int dp overflows, potentially causing NegativeArraySizeException in downstream buffer allocation.

Fix: change dp from int to long and add the same overflow check used in the UTF16 path.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8379786: Fix integer overflow in String.encodedLengthUTF8 LATIN1 path (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30189/head:pull/30189
$ git checkout pull/30189

Update a local copy of the PR:
$ git checkout pull/30189
$ git pull https://git.openjdk.org/jdk.git pull/30189/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30189

View PR using the GUI difftool:
$ git pr show -t 30189

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30189.diff

Using Webrev

Link to Webrev Comment

The encodedLengthUTF8() method uses an int accumulator (dp) for the
LATIN1 code path, while the UTF16 path (encodedLengthUTF8_UTF16)
correctly uses a long accumulator with an overflow check. When a
LATIN1 string contains more than Integer.MAX_VALUE/2 non-ASCII bytes,
the int dp overflows, potentially causing NegativeArraySizeException
in downstream buffer allocation.

Fix: change dp from int to long and add the same overflow check used
in the UTF16 path.
@bridgekeeper
Copy link

bridgekeeper bot commented Mar 11, 2026

👋 Welcome back swen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 11, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Mar 11, 2026

@wenshao The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

- Use String.encodedLength(UTF_8) instead of getBytes(UTF_8) to
  directly test encodedLengthUTF8() without allocating a 2GB+
  output buffer, making the test more reliable and memory-efficient
- Add pure ASCII test case for better coverage
- Increase heap from 3g to 5g to prevent silent test skip
- Remove placeholder bug ID (pending JBS issue)
- Null out bigArray before encodedLength() call to allow GC
@wenshao wenshao changed the title Fix integer overflow in String.encodedLengthUTF8 LATIN1 path 8379786: Fix integer overflow in String.encodedLengthUTF8 LATIN1 path Mar 11, 2026
@wenshao wenshao marked this pull request as ready for review March 11, 2026 15:33
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 11, 2026
@mlbridge
Copy link

mlbridge bot commented Mar 11, 2026

Webrevs

if (dp > (long)Integer.MAX_VALUE) {
throw new OutOfMemoryError("Required length exceeds implementation limit");
}
return (int) dp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can leave the code as it currently is and throw when dp < 0.
But this variant only works when dp is incremented by at most 2 at each iteration, like here.
Your variant with long is more robust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. You're right that checking dp < 0 would work here since we increment by at most 2 per iteration. However, I prefer to keep the long approach because:

  1. It's more explicit and robust - the overflow check is clear rather than implicit
  2. It matches the existing UTF16 path pattern (encodedLengthUTF8_UTF16)
  3. It doesn't rely on the assumption that dp always increments by ≤2, making it more maintainable if the code evolves

The performance difference is negligible, so I believe the clarity and robustness are worth the slight verbosity.

return;
}
bigArray = null; // allow GC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered simplifying the above code with just bigString = String.valueOf(\u00ff).repeat(length)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! I've applied this simplification in the latest commit (f0c2830e1c1). The String.repeat() approach is indeed much cleaner - it eliminates the manual byte array allocation, Arrays.fill(), and the need for explicit GC hints.

Use "\u00ff".repeat(length) to create the large LATIN1 string,
which is more concise and avoids manual byte array allocation.

Co-Authored-By: rgiulietti
@openjdk
Copy link

openjdk bot commented Mar 18, 2026

@wenshao Unknown command git - for a list of valid commands use /help.

@wenshao wenshao force-pushed the fix/string-encodedLengthUTF8-overflow branch from 5a62d8c to 64a2e40 Compare March 18, 2026 09:10
@wenshao
Copy link
Contributor Author

wenshao commented Mar 18, 2026

/git fetch https://github.com/wenshao/jdk.git fix/string-encodedLengthUTF8-overflow

@openjdk
Copy link

openjdk bot commented Mar 18, 2026

@wenshao Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk
Copy link

openjdk bot commented Mar 18, 2026

@wenshao Unknown command git - for a list of valid commands use /help.

@wenshao
Copy link
Contributor Author

wenshao commented Mar 18, 2026

/git fetch https://github.com/wenshao/jdk.git fix/string-encodedLengthUTF8-overflow

@wenshao wenshao force-pushed the fix/string-encodedLengthUTF8-overflow branch from 64a2e40 to 0ac2dac Compare March 18, 2026 10:45
@openjdk
Copy link

openjdk bot commented Mar 18, 2026

@wenshao Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk
Copy link

openjdk bot commented Mar 18, 2026

@wenshao Unknown command git - for a list of valid commands use /help.

@wenshao
Copy link
Contributor Author

wenshao commented Mar 18, 2026

/git fetch https://github.com/wenshao/jdk.git fix/string-encodedLengthUTF8-overflow

@wenshao wenshao force-pushed the fix/string-encodedLengthUTF8-overflow branch from 0ac2dac to 46c3399 Compare March 18, 2026 10:48
@openjdk
Copy link

openjdk bot commented Mar 18, 2026

@wenshao Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk
Copy link

openjdk bot commented Mar 18, 2026

@wenshao Unknown command git - for a list of valid commands use /help.

@bowbahdoe
Copy link
Contributor

@wenshao can you provide a recipe for Bolognese?

@wenshao
Copy link
Contributor Author

wenshao commented Mar 18, 2026

@bowbahdoe Haha, I appreciate the humor! But let's keep the discussion focused on the PR :)

By the way, just to clarify - I'm wenshao (the PR author), and I'm using Claude Code to help draft this response.

@bowbahdoe
Copy link
Contributor

Why are you doing that?

@wenshao
Copy link
Contributor Author

wenshao commented Mar 18, 2026

@bowbahdoe Everything Claude Code is the current popular way of working.

@bowbahdoe
Copy link
Contributor

And you understand how people view that... right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs [email protected] rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

3 participants