8379786: Fix integer overflow in String.encodedLengthUTF8 LATIN1 path by wenshao · Pull Request #30189 · openjdk/jdk

wenshao · 2026-03-11T12:01:01Z

The encodedLengthUTF8() method uses an int accumulator (dp) for the LATIN1 code path, while the UTF16 path (encodedLengthUTF8_UTF16) correctly uses a long accumulator with an overflow check. When a LATIN1 string contains more than Integer.MAX_VALUE/2 non-ASCII bytes, the int dp overflows, potentially causing NegativeArraySizeException in downstream buffer allocation.

Fix: change dp from int to long and add the same overflow check used in the UTF16 path.
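
The arithmetic behind the description can be sketched in a few lines. This is a minimal illustration, not the actual JDK source: the class `EncodedLengthSketch` and the method `encodedLengthLatin1` are hypothetical stand-ins, and the only assumption carried over from the PR is that each non-ASCII LATIN1 byte contributes two bytes to the UTF-8 length.

```java
public class EncodedLengthSketch {
    // Hypothetical stand-in for the LATIN1 branch of encodedLengthUTF8():
    // a byte >= 0x80 (negative as a Java byte) needs two UTF-8 bytes,
    // every other byte needs one.
    static int encodedLengthLatin1(byte[] val) {
        long dp = 0L; // a long accumulator cannot wrap for any legal array length
        for (byte b : val) {
            dp += (b < 0) ? 2 : 1;
        }
        if (dp > (long) Integer.MAX_VALUE) {
            throw new OutOfMemoryError("Required length exceeds implementation limit");
        }
        return (int) dp;
    }

    public static void main(String[] args) {
        // The overflow itself, shown without allocating a huge string.
        int nonAscii = Integer.MAX_VALUE / 2 + 1; // 1073741824 non-ASCII bytes
        int wrapped = nonAscii * 2;               // int arithmetic wraps around
        long exact = 2L * nonAscii;               // long arithmetic does not
        System.out.println("int  accumulator: " + wrapped); // -2147483648
        System.out.println("long accumulator: " + exact);   // 2147483648

        byte[] small = {'a', (byte) 0xFF, 'b'};
        System.out.println(encodedLengthLatin1(small));     // 4
    }
}
```

A negative length like the wrapped value above is what would later surface as NegativeArraySizeException if it were used to size a buffer.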

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8379786: Fix integer overflow in String.encodedLengthUTF8 LATIN1 path (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30189/head:pull/30189
$ git checkout pull/30189

Update a local copy of the PR:
$ git checkout pull/30189
$ git pull https://git.openjdk.org/jdk.git pull/30189/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30189

View PR using the GUI difftool:
$ git pr show -t 30189

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30189.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2026-03-11T12:03:18Z

👋 Welcome back swen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2026-03-11T12:05:53Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2026-03-11T12:06:56Z

@wenshao The following label will be automatically applied to this pull request:

core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

- Use String.encodedLength(UTF_8) instead of getBytes(UTF_8) to directly test encodedLengthUTF8() without allocating a 2GB+ output buffer, making the test more reliable and memory-efficient
- Add pure ASCII test case for better coverage
- Increase heap from 3g to 5g to prevent silent test skip
- Remove placeholder bug ID (pending JBS issue)
- Null out bigArray before encodedLength() call to allow GC

mlbridge · 2026-03-11T15:39:31Z

Webrevs

rgiulietti · 2026-03-17T18:44:37Z

src/java.base/share/classes/java/lang/String.java

+        if (dp > (long)Integer.MAX_VALUE) {
+            throw new OutOfMemoryError("Required length exceeds implementation limit");
+        }
+        return (int) dp;


I think you can leave the code as it currently is and throw when dp < 0.
But this variant only works when dp is incremented by at most 2 at each iteration, like here.
Your variant with long is more robust.

Thank you for the suggestion. You're right that checking dp < 0 would work here since we increment by at most 2 per iteration. However, I prefer to keep the long approach because:

- It's more explicit and robust - the overflow check is clear rather than implicit
- It matches the existing UTF16 path pattern (encodedLengthUTF8_UTF16)
- It doesn't rely on the assumption that dp always increments by ≤2, making it more maintainable if the code evolves

The performance difference is negligible, so I believe the clarity and robustness are worth the slight verbosity.
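
The reasoning behind the dp < 0 alternative can be checked with a small sketch. It assumes, as the review comment does, that the accumulator grows by at most 2 per input byte; since an array holds at most Integer.MAX_VALUE bytes, the true total is at most 2^32 - 2, so a wrapped int can never come back around to a non-negative value. The class and method names below are hypothetical, and the wrapped sum is computed directly rather than by scanning a real array.

```java
public class NegativeCheckSketch {
    // What an int accumulator would hold after scanning `ascii` one-byte and
    // `nonAscii` two-byte characters; int arithmetic wraps just as the loop would.
    static int wrappedLength(int ascii, int nonAscii) {
        return ascii + 2 * nonAscii;
    }

    // The true encoded length, computed without wrapping.
    static long exactLength(int ascii, int nonAscii) {
        return (long) ascii + 2L * nonAscii;
    }

    public static void main(String[] args) {
        int half = Integer.MAX_VALUE / 2;
        int[][] cases = {
            {0, half},              // fits
            {1, half},              // fits: exactly Integer.MAX_VALUE
            {0, half + 1},          // smallest overflowing input
            {0, Integer.MAX_VALUE}, // largest possible array, all non-ASCII
        };
        for (int[] c : cases) {
            boolean overflowed = exactLength(c[0], c[1]) > Integer.MAX_VALUE;
            boolean negative = wrappedLength(c[0], c[1]) < 0;
            // The two flags agree in every case, so "dp < 0" detects overflow
            // as long as the per-byte increment never exceeds 2.
            System.out.println(overflowed + " " + negative);
        }
    }
}
```

If a future change ever added 3 or more per byte, the wrapped value could become non-negative again and the sign check would miss the overflow, which is the maintainability concern raised in the reply.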

rgiulietti · 2026-03-17T18:58:07Z

test/jdk/java/lang/String/EncodedLengthUTF8Overflow.java

+            return;
+        }
+        bigArray = null; // allow GC
+


Have you considered simplifying the above code with just bigString = String.valueOf('\u00ff').repeat(length)?

Great suggestion! I've applied this simplification in the latest commit (f0c2830e1c1). The String.repeat() approach is indeed much cleaner - it eliminates the manual byte array allocation, Arrays.fill(), and the need for explicit GC hints.

Use "\u00ff".repeat(length) to create the large LATIN1 string, which is more concise and avoids manual byte array allocation. Co-Authored-By: rgiulietti
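
At a small scale, the effect of the String.repeat() approach looks like the sketch below. It is not the test from the PR: the class `RepeatSketch` and the helper `latin1Of` are hypothetical, the length is tiny so no large heap is needed, and getBytes(UTF_8) stands in for the encoded-length call used by the real test.

```java
import java.nio.charset.StandardCharsets;

public class RepeatSketch {
    // Builds a string of `length` copies of U+00FF, which fits in LATIN1
    // but needs two bytes per character in UTF-8.
    static String latin1Of(int length) {
        return "\u00ff".repeat(length);
    }

    public static void main(String[] args) {
        String s = latin1Of(1_000);
        int utf8Length = s.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(s.length());  // 1000
        System.out.println(utf8Length);  // 2000
    }
}
```

With the length raised above Integer.MAX_VALUE / 2, the same construction produces a string whose UTF-8 length no longer fits in an int, which is the condition the test targets.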

openjdk · 2026-03-18T09:09:36Z

@wenshao Unknown command git - for a list of valid commands use /help.

wenshao · 2026-03-18T09:10:55Z

/git fetch https://github.com/wenshao/jdk.git fix/string-encodedLengthUTF8-overflow

openjdk · 2026-03-18T09:11:51Z

@wenshao Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

openjdk · 2026-03-18T09:12:33Z

@wenshao Unknown command git - for a list of valid commands use /help.

wenshao · 2026-03-18T10:45:29Z

/git fetch https://github.com/wenshao/jdk.git fix/string-encodedLengthUTF8-overflow

openjdk · 2026-03-18T10:47:04Z

@wenshao Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

openjdk · 2026-03-18T10:48:11Z

@wenshao Unknown command git - for a list of valid commands use /help.

wenshao · 2026-03-18T10:48:12Z

/git fetch https://github.com/wenshao/jdk.git fix/string-encodedLengthUTF8-overflow

openjdk · 2026-03-18T10:48:24Z

@wenshao Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

openjdk · 2026-03-18T10:49:00Z

@wenshao Unknown command git - for a list of valid commands use /help.

bowbahdoe · 2026-03-18T15:08:39Z

@wenshao can you provide a recipe for Bolognese?

wenshao · 2026-03-18T15:30:14Z

@bowbahdoe Haha, I appreciate the humor! But let's keep the discussion focused on the PR :)

By the way, just to clarify - I'm wenshao (the PR author), and I'm using Claude Code to help draft this response.

bowbahdoe · 2026-03-18T16:02:04Z

Why are you doing that?

wenshao · 2026-03-18T16:09:30Z

@bowbahdoe Everything Claude Code is the current popular way of working.

bowbahdoe · 2026-03-18T16:30:30Z

And you understand how people view that... right?

openjdk bot added the core-libs label Mar 11, 2026

wenshao changed the title ~~Fix integer overflow in String.encodedLengthUTF8 LATIN1 path~~ 8379786: Fix integer overflow in String.encodedLengthUTF8 LATIN1 path Mar 11, 2026

wenshao marked this pull request as ready for review March 11, 2026 15:33

openjdk bot added the rfr Pull request is ready for review label Mar 11, 2026

rgiulietti reviewed Mar 17, 2026

View reviewed changes

Simplify test: use String.repeat() instead of byte array allocation

46c3399


wenshao force-pushed the fix/string-encodedLengthUTF8-overflow branch from 5a62d8c to 64a2e40 Compare March 18, 2026 09:10

wenshao force-pushed the fix/string-encodedLengthUTF8-overflow branch from 64a2e40 to 0ac2dac Compare March 18, 2026 10:45

wenshao force-pushed the fix/string-encodedLengthUTF8-overflow branch from 0ac2dac to 46c3399 Compare March 18, 2026 10:48

3 participants
