16612 download large files #16627
Conversation
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR.
Compliance status legend: 🟢 Fully Compliant · 🟡 Partially Compliant · 🔴 Not Compliant · ⚪ Requires Further Human Verification · 🏷️ Compliance label
PR Code Suggestions ✨
Explore these optional code suggestions:
Force-pushed from f42d85e to 2c190cc
Please ping when this is ready for review.
Sure. I think the PR is already finished, but some tests are failing.
Force-pushed from 93b61f3 to 1a93cbc
I added a new Grid endpoint `/se/files/:name` which allows downloading the file directly, without encoding it to Base64 and adding it to JSON. That transformation kills performance and causes OutOfMemory errors for large files (e.g. 256+ MB). NB! Make sure that the `toString()` method of objects (HttpRequest, HttpResponse, Contents.Supplier) never returns an overly long string: it spams debug logs and can cause OOM during debugging.
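The cost of the old path can be illustrated with plain JDK code (this is an illustrative sketch, not the PR's actual code; `Base64Overhead` and `encodedLength` are hypothetical names): Base64 maps every 3 input bytes to 4 output characters, and the entire encoded string must sit in memory before it can be embedded in a JSON document.

```java
import java.util.Base64;

public class Base64Overhead {
  // Base64 maps every 3 input bytes to 4 output characters (with padding),
  // so the JSON payload is roughly 33% larger than the file itself, and the
  // whole encoded string must be held in memory at once.
  public static int encodedLength(int rawBytes) {
    return 4 * ((rawBytes + 2) / 3);
  }

  public static void main(String[] args) {
    byte[] payload = {1, 2, 3, 4, 5};
    String encoded = Base64.getEncoder().encodeToString(payload);
    System.out.println(encoded.length());              // 8
    System.out.println(encodedLength(payload.length)); // 8
    // A 256 MB file needs ~341 MB just for the Base64 string,
    // on top of the original bytes: an easy OutOfMemoryError.
    System.out.println(encodedLength(256 * 1024 * 1024)); // 357913944
  }
}
```

Streaming the raw bytes over a dedicated endpoint sidesteps both the size inflation and the need to materialize the whole payload.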
…ier` to separate classes. It makes debugging easier: you can easily see what instances they are and where they come from.
Instead of reading the whole file into a byte array, just save the given InputStream directly to the file. Now it can download large files (I tried 4GB) while consuming very little memory.
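The streaming approach can be sketched with JDK APIs alone (an illustrative sketch, not the PR's implementation; `SaveStream` and `saveBytesToTemp` are hypothetical names): `Files.copy` drains the stream to disk through a fixed-size internal buffer, so memory use stays flat regardless of file size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveStream {
  // Copies the stream to disk; Files.copy uses a fixed-size internal
  // buffer, so memory stays constant no matter how large the download is.
  public static long save(InputStream in, Path target) throws IOException {
    return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
  }

  // Demonstration-only wrapper: round-trips a byte[] through a temp file.
  public static long saveBytesToTemp(byte[] data) {
    try {
      Path tmp = Files.createTempFile("download", ".bin");
      long written;
      try (InputStream in = new ByteArrayInputStream(data)) {
        written = save(in, tmp);
      }
      Files.delete(tmp);
      return written;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(saveBytesToTemp(new byte[1024])); // 1024
  }
}
```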
… downloading files. For JSON responses, still return `Contents.bytes`, which allows re-reading the content multiple times. Just in case.
… deleted. After stopping a Grid node, the folder is deleted asynchronously (by a cache removal listener), so we need to wait for it in the test.
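Waiting for the asynchronous deletion can be done with a small polling helper along these lines (a sketch under assumptions: `WaitFor` is a hypothetical name, not a Selenium utility, and the poll interval of 25 ms is arbitrary):

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class WaitFor {
  // Returns true as soon as the condition holds, false if the timeout
  // expires (or the waiting thread is interrupted) first.
  public static boolean until(BooleanSupplier condition, Duration timeout) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(25); // poll interval
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // e.g. in a test, wait for the node's download directory to disappear:
    // WaitFor.until(() -> !Files.exists(downloadDir), Duration.ofSeconds(5));
    System.out.println(until(() -> true, Duration.ofMillis(100))); // true
  }
}
```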
…opped. At least on my machine, stopping the node takes some time, and any checks right after `node.stop(sessionId)` can often fail.
Gr... This test is extremely hard to debug. After hours of debugging, I came to the conclusion that we just need to increase the timeout. On my machine, `latch` gets decremented after ~1.2 seconds, so 1 second was not enough.
I don't know why, but sometimes we receive `HttpTimeoutException` instead of `InterruptedException`. It seems reasonable to consider execution as interrupted in both cases. (?)
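The combined check might look like this (a sketch; `InterruptDetection` and `isInterruption` are hypothetical names, and whether a timeout should really count as an interruption is exactly the open question the comment raises):

```java
import java.net.http.HttpTimeoutException;

public class InterruptDetection {
  // Treats both an explicit thread interrupt and an HTTP client timeout
  // as "execution was interrupted".
  public static boolean isInterruption(Throwable t) {
    return t instanceof InterruptedException || t instanceof HttpTimeoutException;
  }

  public static void main(String[] args) {
    System.out.println(isInterruption(new InterruptedException()));       // true
    System.out.println(isInterruption(new HttpTimeoutException("slow"))); // true
    System.out.println(isInterruption(new RuntimeException("other")));    // false
  }
}
```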
None of the `is.readNBytes` implementations returns -1; it always returns 0 or a positive number.
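This matters for loop termination: a read loop over `readNBytes` must stop on a zero return, not on -1 as with plain `read`. A small JDK-only demonstration (`ReadLoop` and its helpers are hypothetical names for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadLoop {
  // readNBytes(buf, off, len) blocks until len bytes are read or end of
  // stream; at EOF it returns 0 (or fewer than len bytes), never -1.
  public static long drain(InputStream in, int bufSize) throws IOException {
    byte[] buf = new byte[bufSize];
    long total = 0;
    int n;
    // Terminate on a zero return, not on -1.
    while ((n = in.readNBytes(buf, 0, buf.length)) > 0) {
      total += n;
    }
    return total;
  }

  // Demonstration-only wrapper to avoid checked exceptions at call sites.
  public static long drainBytes(byte[] data, int bufSize) {
    try {
      return drain(new ByteArrayInputStream(data), bufSize);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[10]);
    System.out.println(drain(in, 4));                     // 10 (reads 4 + 4 + 2)
    System.out.println(in.readNBytes(new byte[4], 0, 4)); // 0 at EOF, never -1
  }
}
```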
Force-pushed from 1a93cbc to 9faf649
@diemol Ping. This PR is ready for review, as well as a few others.
User description
🔗 Related Issues
Fixes #16612
💥 What does this PR do?
This PR allows downloading very large files (I tried up to 4GB) from Selenium Grid, while consuming very low memory.
🔄 Types of changes
PR Type
Enhancement, Performance
Description
- Add new Grid endpoint `/se/files/:name` for direct file downloads without Base64 encoding
- Optimize `RemoteWebDriver.downloadFile()` to stream files directly instead of loading them into memory
- Extract anonymous `Contents.Supplier` implementations into separate named classes for better debugging
- Improve HTTP response handling to support binary streams and large file transfers
- Fix flaky tests by adding proper wait conditions and increased timeouts
Diagram Walkthrough
File Walkthrough
20 files
- Add new GET endpoint for direct file downloads
- Implement direct file streaming and URL parsing logic
- Optimize file download to stream directly to disk
- Add file and stream content suppliers, deprecate memory-intensive methods
- New supplier for streaming file content without loading to memory
- New supplier for binary stream content with length tracking
- Extract bytes supplier to separate class for clarity
- New supplier for file-backed output stream content
- Switch to InputStream response handler for large file support
- Handle binary streams separately from JSON responses
- Add binary stream detection and avoid loading large files to memory
- Add new GET_DOWNLOADED_FILE command mapping
- Add GET_DOWNLOADED_FILE command constant
- Add toString method to prevent debug log spam
- Add toString method to prevent debug log spam
- Add URL decoding utility method
- Use atomic long for thread-safe length tracking
- Add toString and contentAsString methods
- Improve toString to include content information
- Improve toString to avoid loading large content
5 files
- Add wait conditions for async directory deletion
- Increase timeout to fix flaky test
- Add wait conditions and test for filename extraction
- Handle HttpTimeoutException in interrupt detection
- Remove debug logging statement
3 files
- Add jspecify dependency
- Add guava dependency for media type handling
- Add test logs upload on failure
2 files