
feat: s3 multipart import file upload #1521


Open · wants to merge 1 commit into main from multipart-import

Conversation

@strophy (Contributor) commented Jul 17, 2025

This PR introduces support for large file (multipart) imports in AppFlowy-Cloud in order to work around the 5GB limitation in AWS S3 on single PUT operations.

It unifies the logic for small and large file uploads, updating both API and internal handling. The changes ensure that files larger than 5GB are uploaded using a multipart protocol compatible with S3 and the AppFlowy worker, while maintaining backward compatibility for small files.
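
For orientation, a minimal sketch of the size-based dispatch the PR describes is shown below. The constant name follows the summary further down; the function signature, loading the small file fully into memory, and the error placeholder for the large path are simplifications of mine, not the PR's actual code.

```rust
use anyhow::Result;
use tokio::fs;

// S3 rejects a single PUT larger than 5 GB, so anything above this limit has to
// take the multipart path.
const S3_SINGLE_PUT_LIMIT: u64 = 5 * 1024 * 1024 * 1024;

async fn upload_import_file(file_path: &str, presigned_url: &str) -> Result<()> {
  let file_size = fs::metadata(file_path).await?.len();
  if file_size <= S3_SINGLE_PUT_LIMIT {
    // Small files keep the existing behaviour: a single PUT to the presigned URL.
    // (Loaded into memory here for brevity; a real client would stream the body.)
    let bytes = fs::read(file_path).await?;
    reqwest::Client::new()
      .put(presigned_url)
      .body(bytes)
      .send()
      .await?
      .error_for_status()?;
    Ok(())
  } else {
    // Large files go through the multipart endpoints; that path is sketched
    // after the sequence diagram in the review below.
    anyhow::bail!("placeholder: multipart path is sketched after the sequence diagram")
  }
}
```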

I have successfully used this to import a 12.2GB Notion export on AWS EC2 with S3 storage and an external proxy. I have a corresponding PR modifying AppFlowy-Web here. Some further changes to proxy template defaults may be needed? I also haven't tested or modified the desktop AppFlowy client because I don't use it, but changes are probably needed there too to support this feature. There shouldn't be any breaking changes though, so the old desktop client should still work.

I'm looking forward to edits and CI results; hopefully my approach wasn't too naive, as my understanding of AppFlowy is still quite limited.

Summary by Sourcery

Implement S3-compatible multipart import file upload for files over 5GB and unify the import upload workflow between small and large files

New Features:

  • Support multipart upload for large import files in client-api with upload_large_import_file
  • Extend create_import endpoint to choose between presigned URL and multipart upload and return upload_type and workspace_id in the response

Enhancements:

  • Unify upload_import_file method to delegate to small or large upload logic based on file size

Build:

  • Add tempfile as a dev-dependency for creating temporary test files

Tests:

  • Add tests for small single-part and large multipart import file uploads
  • Update import test helper to supply workspace_id for multipart uploads
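
On the large-file test: a fixture above the 5GB threshold can be created without actually writing 5GB of data. Whether the PR's test does this is not stated; the sketch below is one way to do it with tempfile, and the 6GB size and helper name are my own choices.

```rust
use std::io::Write;
use tempfile::NamedTempFile;

// Build a fixture whose reported size is above the 5 GB threshold without
// writing 5 GB: set_len produces a sparse file on most filesystems, which is
// enough to exercise the size-based dispatch.
fn large_import_fixture() -> std::io::Result<NamedTempFile> {
  let mut file = NamedTempFile::new()?;
  file.write_all(b"notion export placeholder")?;   // a little real content at the start
  file.as_file().set_len(6 * 1024 * 1024 * 1024)?; // extend to a 6 GB logical size
  Ok(file)
}
```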

sourcery-ai bot commented Jul 17, 2025

Reviewer's Guide

Adds multipart upload support for files over 5GB by unifying small and large file upload logic in the client API and server create_import endpoint, extends the import task response DTO, and updates tests and dependencies accordingly.

Sequence diagram for unified import file upload (small vs large files)

```mermaid
sequenceDiagram
    actor User
    participant Client as Client API
    participant Server as AppFlowy Server
    participant S3 as AWS S3

    User->>Client: upload_import_file(file_path, url, workspace_id)
    Client->>Client: Check file size
    alt file_size <= 5GB
        Client->>Server: create_import (returns presigned_url)
        Client->>S3: PUT file to presigned_url
        S3-->>Client: 200 OK
    else file_size > 5GB
        Client->>Server: create_import (returns upload_type: multipart, workspace_id)
        Client->>Server: create_upload (multipart session)
        loop For each chunk
            Client->>Server: upload_part(chunk)
        end
        Client->>Server: complete_upload
    end
```
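
The large-file branch of the diagram boils down to a chunking loop over three calls. The sketch below abstracts those calls behind a trait rather than guessing the real client-api signatures, which are assumptions here; only the loop structure mirrors the flow above.

```rust
use anyhow::Result;
use tokio::fs::File;
use tokio::io::AsyncReadExt;

// The three multipart operations named in the diagram. The method shapes are
// assumptions; the real client-api methods may differ.
trait ImportUploader {
  async fn create_upload(&self, workspace_id: &str, key: &str) -> Result<String>; // -> upload_id
  async fn upload_part(&self, upload_id: &str, part_number: i32, data: Vec<u8>) -> Result<String>; // -> ETag
  async fn complete_upload(&self, upload_id: &str, parts: Vec<(i32, String)>) -> Result<()>;
}

// 100 MB parts sit comfortably above S3's 5 MiB minimum part size and keep even
// multi-terabyte files well under the 10,000-part limit.
const CHUNK_SIZE: usize = 100 * 1024 * 1024;

async fn upload_large_import_file(
  uploader: &impl ImportUploader,
  file_path: &str,
  workspace_id: &str,
  key: &str,
) -> Result<()> {
  let upload_id = uploader.create_upload(workspace_id, key).await?;
  let mut file = File::open(file_path).await?;
  let mut part_number = 1;
  let mut parts = Vec::new();
  loop {
    let mut chunk = vec![0u8; CHUNK_SIZE];
    // Note: `read` may return fewer bytes than requested before EOF; a production
    // version should fill the buffer so that only the final part is undersized.
    let bytes_read = file.read(&mut chunk).await?;
    if bytes_read == 0 {
      break; // end of file
    }
    chunk.truncate(bytes_read);
    let etag = uploader.upload_part(&upload_id, part_number, chunk).await?;
    parts.push((part_number, etag));
    part_number += 1;
  }
  uploader.complete_upload(&upload_id, parts).await?;
  Ok(())
}
```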

Class diagram for updated CreateImportTaskResponse DTO

```mermaid
classDiagram
    class CreateImportTaskResponse {
        +String task_id
        +Option~String~ presigned_url
        +String upload_type
        +Option~String~ workspace_id
    }
```
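
In Rust terms the DTO might look roughly like this; the field names come from the class diagram, while the serde attributes are assumptions of mine rather than the PR's actual annotations.

```rust
use serde::{Deserialize, Serialize};

// Small imports get a presigned_url and no workspace_id; large imports get
// upload_type = "multipart" plus the workspace_id needed for the multipart calls.
#[derive(Debug, Serialize, Deserialize)]
pub struct CreateImportTaskResponse {
  pub task_id: String,
  #[serde(default, skip_serializing_if = "Option::is_none")]
  pub presigned_url: Option<String>,
  pub upload_type: String,
  #[serde(default, skip_serializing_if = "Option::is_none")]
  pub workspace_id: Option<String>,
}
```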

File-Level Changes

Unify file upload entrypoint to route small vs large uploads (libs/client-api/src/http_file.rs)
  • Add S3_SINGLE_PUT_LIMIT constant and file size check
  • Modify upload_import_file signature to accept workspace_id
  • Delegate to upload_small_import_file or upload_large_import_file based on size

Implement multipart upload for large files (libs/client-api/src/http_file.rs)
  • Create an upload session via create_upload
  • Read the file in 100MB chunks and upload each with upload_part
  • Collect ETags and finish with complete_upload

Enhance create_import_handler to support multipart logic (src/api/data_import.rs)
  • Branch on content_length >= 5GB to choose upload_type
  • Generate the S3 key with the workspace import prefix
  • Return upload_type and workspace_id for large uploads
  • Continue returning presigned_url for small uploads

Extend the import task response DTO for multipart (libs/database-entity/src/dto.rs)
  • Make presigned_url optional
  • Add upload_type field
  • Add optional workspace_id field

Add tests for small and large file upload workflows (tests/workspace/import_test.rs)
  • Introduce test_large_file_multipart_upload using NamedTempFile
  • Introduce test_small_file_single_upload with presigned URL
  • Update the upload_file helper to supply workspace_id

Include tempfile crate for file-based tests (Cargo.toml, libs/client-api/Cargo.toml)
  • Add tempfile to dev-dependencies in the root Cargo.toml
  • Add tempfile to dev-dependencies in libs/client-api/Cargo.toml
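
The handler-side branching described above could look roughly like the following. The key layout, the "presigned_url"/"multipart" strings, and the presign_put helper are illustrative assumptions, not the handler's actual code; the struct repeats the DTO fields sketched earlier with the serde derives omitted.

```rust
use uuid::Uuid;

// Same fields as the CreateImportTaskResponse sketch above (serde attributes omitted).
struct CreateImportTaskResponse {
  task_id: String,
  presigned_url: Option<String>,
  upload_type: String,
  workspace_id: Option<String>,
}

const S3_SINGLE_PUT_LIMIT: u64 = 5 * 1024 * 1024 * 1024; // 5 GB

// Sketch of the create_import_handler branching: small imports keep the
// presigned-URL flow, large imports are told to use the multipart endpoints.
fn build_import_response(
  content_length: u64,
  workspace_id: Uuid,
  presign_put: impl Fn(&str) -> String, // stands in for generating an S3 presigned PUT URL
) -> CreateImportTaskResponse {
  let task_id = Uuid::new_v4();
  // Illustrative key layout carrying the workspace import prefix; the multipart
  // path ultimately targets the same key via the upload session.
  let s3_key = format!("import/{workspace_id}/{task_id}.zip");
  if content_length >= S3_SINGLE_PUT_LIMIT {
    CreateImportTaskResponse {
      task_id: task_id.to_string(),
      presigned_url: None,
      upload_type: "multipart".to_string(),
      workspace_id: Some(workspace_id.to_string()),
    }
  } else {
    CreateImportTaskResponse {
      task_id: task_id.to_string(),
      presigned_url: Some(presign_put(&s3_key)),
      upload_type: "presigned_url".to_string(),
      workspace_id: None,
    }
  }
}
```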

Possibly linked issues

  • License issue #1: The PR adds multipart S3 upload for large files, directly addressing the S3 error in Notion import.


@sourcery-ai bot left a comment
Hey @strophy - I've reviewed your changes - here's some feedback:

  • Rather than generating a random legacy_task_id in upload_import_file for large uploads, propagate the actual task_id (and workspace_id) returned from create_import so that the client uses the same s3_key path as the server expects.
  • The multipart upload tests print errors but don’t assert on behavior—add explicit assertions for expected outcomes (e.g., matching upload_type or error variants) so CI can reliably catch regressions.
  • Consider replacing the string-based upload_type field in CreateImportTaskResponse with a typed enum for better compile-time safety and to avoid magic string inconsistencies.
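
Picking up the last point, a typed upload_type could look like the sketch below; the enum name, variant names, and serde renames are assumptions, not part of the PR.

```rust
use serde::{Deserialize, Serialize};

// Replaces the free-form String in CreateImportTaskResponse so the client can
// match on a variant instead of comparing magic strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ImportUploadType {
  PresignedUrl, // serializes as "presigned_url"
  Multipart,    // serializes as "multipart"
}
```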
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Rather than generating a random legacy_task_id in upload_import_file for large uploads, propagate the actual task_id (and workspace_id) returned from create_import so that the client uses the same s3_key path as the server expects.
- The multipart upload tests print errors but don’t assert on behavior—add explicit assertions for expected outcomes (e.g., matching upload_type or error variants) so CI can reliably catch regressions.
- Consider replacing the string-based upload_type field in CreateImportTaskResponse with a typed enum for better compile-time safety and to avoid magic string inconsistencies.

## Individual Comments

### Comment 1
<location> `libs/client-api/src/http_file.rs:280` </location>
<code_context>
+    trace!("created multipart upload session: {}", upload_response.upload_id);
+
+    // Step 2: Upload file in chunks
+    const CHUNK_SIZE: usize = 100 * 1024 * 1024; // 100MB chunks
+    let mut file = File::open(file_path).await?;
+    let mut part_number = 1;
+    let mut parts = Vec::new();
</code_context>

<issue_to_address>
Chunk size for multipart upload is hardcoded and may not be optimal for all environments.

Consider making the chunk size configurable or documenting why 100MB was chosen, as different S3 providers have varying part size and number limits.

Suggested implementation:

```rust
    // Step 2: Upload file in chunks
    // Default chunk size is 100MB, which is a common value for S3 multipart uploads.
    // S3 requires parts to be at least 5MB (except the last), and has a maximum of 10,000 parts.
    // Make this configurable to support different environments and S3 providers.
    let chunk_size = chunk_size.unwrap_or(100 * 1024 * 1024); // 100MB default
    let mut file = File::open(file_path).await?;
    let mut part_number = 1;
    let mut parts = Vec::new();

```

```rust
    loop {
      let mut chunk = vec![0u8; chunk_size];
      let bytes_read = file.read(&mut chunk).await?;

```

- You will need to add a `chunk_size: Option<usize>` parameter to the containing function's signature.
- When calling this function, pass `None` to use the default, or `Some(desired_size)` to override.
- If this function is part of a struct, consider making `chunk_size` a field of the struct instead.
</issue_to_address>
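
Beyond making the size configurable, the chosen value could be validated against S3's documented limits (5 MiB minimum part size except the last part, 10,000 parts maximum). A sketch of such a check; the helper is hypothetical and not part of this PR:

```rust
const MIN_PART_SIZE: u64 = 5 * 1024 * 1024; // S3 minimum part size (except the last part)
const MAX_PARTS: u64 = 10_000;              // S3 maximum number of parts per upload

// Pick a part size that satisfies both limits for the given file size,
// starting from the requested (or default 100 MB) value.
fn effective_chunk_size(requested: Option<u64>, file_size: u64) -> u64 {
  let requested = requested.unwrap_or(100 * 1024 * 1024);
  // Smallest part size that keeps the part count at or below 10,000.
  let required = file_size.div_ceil(MAX_PARTS);
  requested.max(required).max(MIN_PART_SIZE)
}
```

At the default 100MB, the 12.2GB export mentioned in the PR description comes to roughly 125 parts, far below the 10,000-part ceiling.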


@strophy force-pushed the multipart-import branch from 92e8a01 to 017e84e on July 17, 2025 11:11
@khorshuheng (Collaborator) commented Jul 18, 2025

If I understand correctly, with this approach the server will need sufficient disk space / memory to handle the file upload, since the file is uploaded to S3 indirectly via the server instead of going directly to S3 with a presigned URL.

This is fine (and a good way to get around the large file limitation imposed by S3) for self-hosted use cases, as the server will typically have sufficient disk / memory for a single person.

But when there is a large number of users, the server will require quite a lot of resources, and it may crash if multiple users try to upload files at the same time.

Hence, we will likely need to handle this on the client's end, i.e. the client sending files directly to S3.
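
One possible shape for that client-direct approach (not part of this PR): the server creates the multipart upload and hands the client presigned UploadPart URLs, the client PUTs each part straight to S3, and only the (part number, ETag) list goes back to the server to complete the upload. A sketch of the client half, assuming a hypothetical server endpoint that returns one presigned URL per part:

```rust
use anyhow::Result;
use reqwest::Client;

// Hypothetical response from the server: one presigned PUT URL per part of a
// multipart upload it created on the client's behalf.
struct PresignedPart {
  part_number: i32,
  url: String,
}

// Upload the already-chunked parts directly to S3 and return (part_number, ETag)
// pairs for the server to pass to CompleteMultipartUpload.
async fn put_parts_directly(
  client: &Client,
  parts: Vec<(PresignedPart, Vec<u8>)>,
) -> Result<Vec<(i32, String)>> {
  let mut completed = Vec::new();
  for (part, data) in parts {
    let resp = client.put(&part.url).body(data).send().await?.error_for_status()?;
    // S3 returns the part's ETag in a response header; it is required to complete the upload.
    let etag = resp
      .headers()
      .get("ETag")
      .and_then(|v| v.to_str().ok())
      .unwrap_or_default()
      .to_string();
    completed.push((part.part_number, etag));
  }
  Ok(completed)
}
```

The server would still own CreateMultipartUpload and CompleteMultipartUpload, so the 5GB single-PUT limit is avoided without the file data ever passing through AppFlowy-Cloud.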
