-
Notifications
You must be signed in to change notification settings - Fork 967
Add MultipartDecodingMode to configure filename decoding in multipart request #6465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… requests Motivation: UTF-8 is the de facto standard for encoding the filename parameter in multipart/form-data, but Armeria uses ISO-8859-1. Also, other clients might use percent-encoding. https://github.com/helidon-io/helidon/blob/7dce029dcbe0cdda36b1b84eb24ba1eb9f9da2eb/http/http/src/main/java/io/helidon/http/ContentDisposition.java#L250-L256 Modifications: - Introduced the `MultipartDecodingMode`` enum with three distinct strategies: UTF_8, ISO_8859_1, and URL_DECODING. - Added `defaultMultipartDecodingMode` `Flags``, which determines the default strategy by reading the `com.linecorp.armeria.defaultMultipartDecodingMode`` JVM system property. - Additionally, the annotated service for multipart file uploads has been updated to use a UUID-based filename on the server side. This is a defensive measure to: - Prevent potential filename corruption. - Avoid issues where a long filename might exceed operating system path length limits. Result: - Server administrators can now explicitly configure the decoding strategy. - [Breaking Change] The default decoding mode is now explicitly UTF-8 to align with the de facto standard of modern web clients. If you want to use the previous behaviour, you can restore it by setting the following JVM system property: `-Dcom.linecorp.armeria.defaultMultipartDecodingMode=ISO_8859_1`.
|
Thank you for fixing this |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6465 +/- ##
============================================
- Coverage 74.46% 74.09% -0.37%
- Complexity 22234 23036 +802
============================================
Files 1963 2064 +101
Lines 82437 86299 +3862
Branches 10764 11335 +571
============================================
+ Hits 61385 63942 +2557
- Misses 15918 16942 +1024
- Partials 5134 5415 +281 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| } | ||
| } | ||
|
|
||
| private static String replaceAndDecodeFilename(String contentDisposition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question) What do you think of applying url decoding at ContentDisposition#parse instead after filename parsing is already complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer respecting the Java default encoding and support URL_DECODING in ContentDisposition#parse.
- return new String(ByteBufUtil.getBytes(byteBuf), HEADER_ENCODING);
// The default charset will be utf-8 by default in modern Java versions.
+ return new String(ByteBufUtil.getBytes(byteBuf));There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of applying url decoding at ContentDisposition#parse instead after filename parsing is already complete?
I had combined the URL decoding option with the other charset options, so I wanted to handle it in the same place. But I agree it's better to handle them separately. 😉
I prefer respecting the Java default encoding
Relying on the JVM's default charset is risky because it's platform-dependent, which can lead to the string being decoded differently across various server environments.
I've added this option for supporting this case which always use UTF-8:
https://github.com/helidon-io/helidon/blob/31bc817fe044308f7ef42e2ed55bd10c0eb0646c/http/http/src/main/java/io/helidon/http/ContentDisposition.java#L372
If we need to support other charsets in the future, we can extend this by adding another option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not an arbitrary value but the JVM default, so I don't think it is too risky but the current implementation looks fine to me.
| * JVM option to override the default value. | ||
| */ | ||
| @UnstableApi | ||
| public static MultipartDecodingMode defaultMultipartDecodingMode() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional) MultipartDecodingMode sounds like the multipart body is decoded.
| public static MultipartDecodingMode defaultMultipartDecodingMode() { | |
| public static MultipartFilenameDecodingMode multipartFilenameDecodingMode() { |
| try { | ||
| Files.createDirectories(directory); | ||
| return Files.createTempFile(directory, null, '-' + filename); | ||
| return Files.createFile(directory.resolve(UUID.randomUUID() + ".multipart")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional) Given that users may want to identify files by their name when debugging, it may make more sense to truncate the filename instead
| return Files.createFile(directory.resolve(UUID.randomUUID() + ".multipart")); | |
| int MAGIC_NUMBER = 10; | |
| return Files.createTempFile(directory, null, '-' + filename.substring(0, min(sz, MAGIC_NUMBER))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to keep the file extension if it exists although we truncate the file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it and revert the previous logic. 😉
| } | ||
| } | ||
|
|
||
| private static String replaceAndDecodeFilename(String contentDisposition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer respecting the Java default encoding and support URL_DECODING in ContentDisposition#parse.
- return new String(ByteBufUtil.getBytes(byteBuf), HEADER_ENCODING);
// The default charset will be utf-8 by default in modern Java versions.
+ return new String(ByteBufUtil.getBytes(byteBuf));| /** | ||
| * URL-decodes the filename using the UTF-8 charset. | ||
| */ | ||
| URL_DECODING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, what do you think of update ContentDisposition with the upstream one?
They added functionality to support base64 encoding in filenames.
spring-projects/spring-framework#26463
spring-projects/spring-framework#28236
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the link. Updated.
| try { | ||
| Files.createDirectories(directory); | ||
| return Files.createTempFile(directory, null, '-' + filename); | ||
| return Files.createFile(directory.resolve(UUID.randomUUID() + ".multipart")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to keep the file extension if it exists although we truncate the file name.
bdc1f42 to
9d54ca4
Compare
ikhoon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍👍
Motivation:
UTF-8 is the de facto standard for encoding the filename parameter in multipart/form-data, but Armeria uses ISO-8859-1. Also, other clients might use percent-encoding.
https://github.com/helidon-io/helidon/blob/7dce029dcbe0cdda36b1b84eb24ba1eb9f9da2eb/http/http/src/main/java/io/helidon/http/ContentDisposition.java#L250-L256
Modifications:
MultipartDecodingModeenum with three distinct strategies: UTF_8, ISO_8859_1, and URL_DECODING.defaultMultipartDecodingModeFlags, which determines the default strategy by reading thecom.linecorp.armeria.defaultMultipartDecodingModeJVM system property.Result:
-Dcom.linecorp.armeria.defaultMultipartDecodingMode=ISO_8859_1.