Skip to content

Conversation

@minwoox
Copy link
Contributor

@minwoox minwoox commented Oct 28, 2025

Motivation:
UTF-8 is the de facto standard for encoding the filename parameter in multipart/form-data, but Armeria uses ISO-8859-1. Also, other clients might use percent-encoding.
https://github.com/helidon-io/helidon/blob/7dce029dcbe0cdda36b1b84eb24ba1eb9f9da2eb/http/http/src/main/java/io/helidon/http/ContentDisposition.java#L250-L256

Modifications:

  • Introduced the MultipartDecodingMode enum with three distinct strategies: UTF_8, ISO_8859_1, and URL_DECODING.
  • Added defaultMultipartDecodingMode Flags, which determines the default strategy by reading the com.linecorp.armeria.defaultMultipartDecodingMode JVM system property.
  • Additionally, the annotated service for multipart file uploads has been updated to use a UUID-based filename on the server side. This is a defensive measure to:
    • Prevent potential filename corruption.
    • Avoid issues where a long filename might exceed operating system path length limits.

Result:

  • Server administrators can now explicitly configure the decoding strategy.
  • [Breaking Change] The default decoding mode is now explicitly UTF-8 to align with the de facto standard of modern web clients. If you want to use the previous behaviour, you can restore it by setting the following JVM system property: -Dcom.linecorp.armeria.defaultMultipartDecodingMode=ISO_8859_1.

… requests

Motivation:
UTF-8 is the de facto standard for encoding the filename parameter in multipart/form-data, but Armeria uses ISO-8859-1.
Also, other clients might use percent-encoding.
https://github.com/helidon-io/helidon/blob/7dce029dcbe0cdda36b1b84eb24ba1eb9f9da2eb/http/http/src/main/java/io/helidon/http/ContentDisposition.java#L250-L256

Modifications:
- Introduced the `MultipartDecodingMode`` enum with three distinct strategies: UTF_8, ISO_8859_1, and URL_DECODING.
- Added `defaultMultipartDecodingMode` `Flags``, which determines the default strategy by reading the `com.linecorp.armeria.defaultMultipartDecodingMode`` JVM system property.
- Additionally, the annotated service for multipart file uploads has been updated to use a UUID-based filename on the server side. This is a defensive measure to:
  - Prevent potential filename corruption.
  - Avoid issues where a long filename might exceed operating system path length limits.
Result:
- Server administrators can now explicitly configure the decoding strategy.
- [Breaking Change] The default decoding mode is now explicitly UTF-8 to align with the de facto standard of modern web clients.
  If you want to use the previous behaviour, you can restore it by setting the following JVM system property: `-Dcom.linecorp.armeria.defaultMultipartDecodingMode=ISO_8859_1`.
@minwoox minwoox added this to the 1.34.0 milestone Oct 28, 2025
@triberraar
Copy link
Contributor

Thank you for fixing this

@codecov
Copy link

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 85.89744% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.09%. Comparing base (8150425) to head (77f068f).
⚠️ Report is 221 commits behind head on main.

Files with missing lines Patch % Lines
...meria/internal/server/FileAggregatedMultipart.java 25.00% 8 Missing and 1 partial ⚠️
...om/linecorp/armeria/common/ContentDisposition.java 94.21% 4 Missing and 3 partials ⚠️
...rp/armeria/common/SystemPropertyFlagsProvider.java 55.55% 3 Missing and 1 partial ⚠️
.../linecorp/armeria/common/multipart/MimeParser.java 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6465      +/-   ##
============================================
- Coverage     74.46%   74.09%   -0.37%     
- Complexity    22234    23036     +802     
============================================
  Files          1963     2064     +101     
  Lines         82437    86299    +3862     
  Branches      10764    11335     +571     
============================================
+ Hits          61385    63942    +2557     
- Misses        15918    16942    +1024     
- Partials       5134     5415     +281     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

}
}

private static String replaceAndDecodeFilename(String contentDisposition) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) What do you think of applying url decoding at ContentDisposition#parse instead after filename parsing is already complete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer respecting the Java default encoding and support URL_DECODING in ContentDisposition#parse.

- return new String(ByteBufUtil.getBytes(byteBuf), HEADER_ENCODING);
// The default charset will be utf-8 by default in modern Java versions.
+ return new String(ByteBufUtil.getBytes(byteBuf));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of applying url decoding at ContentDisposition#parse instead after filename parsing is already complete?

I had combined the URL decoding option with the other charset options, so I wanted to handle it in the same place. But I agree it's better to handle them separately. 😉

I prefer respecting the Java default encoding

Relying on the JVM's default charset is risky because it's platform-dependent, which can lead to the string being decoded differently across various server environments.

I've added this option for supporting this case which always use UTF-8:
https://github.com/helidon-io/helidon/blob/31bc817fe044308f7ef42e2ed55bd10c0eb0646c/http/http/src/main/java/io/helidon/http/ContentDisposition.java#L372

If we need to support other charsets in the future, we can extend this by adding another option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an arbitrary value but the JVM default, so I don't think it is too risky but the current implementation looks fine to me.

* JVM option to override the default value.
*/
@UnstableApi
public static MultipartDecodingMode defaultMultipartDecodingMode() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional) MultipartDecodingMode sounds like the multipart body is decoded.

Suggested change
public static MultipartDecodingMode defaultMultipartDecodingMode() {
public static MultipartFilenameDecodingMode multipartFilenameDecodingMode() {

try {
Files.createDirectories(directory);
return Files.createTempFile(directory, null, '-' + filename);
return Files.createFile(directory.resolve(UUID.randomUUID() + ".multipart"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional) Given that users may want to identify files by their name when debugging, it may make more sense to truncate the filename instead

Suggested change
return Files.createFile(directory.resolve(UUID.randomUUID() + ".multipart"));
int MAGIC_NUMBER = 10;
return Files.createTempFile(directory, null, '-' + filename.substring(0, min(sz, MAGIC_NUMBER)));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to keep the file extension if it exists although we truncate the file name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it and revert the previous logic. 😉

}
}

private static String replaceAndDecodeFilename(String contentDisposition) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer respecting the Java default encoding and support URL_DECODING in ContentDisposition#parse.

- return new String(ByteBufUtil.getBytes(byteBuf), HEADER_ENCODING);
// The default charset will be utf-8 by default in modern Java versions.
+ return new String(ByteBufUtil.getBytes(byteBuf));

/**
* URL-decodes the filename using the UTF-8 charset.
*/
URL_DECODING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, what do you think of update ContentDisposition with the upstream one?
They added functionality to support base64 encoding in filenames.
spring-projects/spring-framework#26463
spring-projects/spring-framework#28236

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the link. Updated.

try {
Files.createDirectories(directory);
return Files.createTempFile(directory, null, '-' + filename);
return Files.createFile(directory.resolve(UUID.randomUUID() + ".multipart"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to keep the file extension if it exists although we truncate the file name.

@minwoox minwoox force-pushed the multipart_decoding branch from bdc1f42 to 9d54ca4 Compare November 5, 2025 13:52
Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants