[POC, Do not merge] input_chunk: split incoming buffer when it's too big #9385
base: master
Conversation
Add a configuration value for the storage chunk max size.
Signed-off-by: braydonk <[email protected]>
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
@braydonk stale? |
Yeah, I haven't had time to work on this problem for a while. I'd like to come back to it but can't say when I'll be able to. |
This PR is superseded by #9995, a better (imo) version of the same solution. |
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
This PR is a proof of concept for mitigating the issue where a chunk can be too large when received from an input plugin.
Bug Explanation
When a large set of data is read at one time, all of the records are appended to whichever chunk is currently active, and they are all written at once. The check for the chunk size only happens after writing data to the chunk, so despite the chunk size being "limited" to 2M, nothing guarantees it won't exceed that number. A chunk can be sitting right up against the 2M limit and then have a load of data written to it, producing an excessively large chunk that, once encoded, can exceed the write limits of output plugin APIs.
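To make the ordering concrete, here is a minimal C sketch of the failure mode described above. Everything here (struct chunk, chunk_append(), chunk_size(), chunk_rotate(), CHUNK_MAX_SIZE) is a hypothetical stand-in for illustration, not Fluent Bit's actual internals.

```c
#include <stddef.h>

#define CHUNK_MAX_SIZE (2 * 1024 * 1024)  /* the "2M" limit */

/* Hypothetical stand-ins for the real chunk API. */
struct chunk;
void   chunk_append(struct chunk *c, const void *buf, size_t len);
size_t chunk_size(struct chunk *c);
void   chunk_rotate(struct chunk *c);

/* The whole incoming buffer is appended first, and the size limit is
 * only consulted afterwards; a chunk sitting just under 2M can be
 * pushed far past the limit by a single large read. */
static void append_then_check(struct chunk *c, const void *buf, size_t len)
{
    chunk_append(c, buf, len);

    if (chunk_size(c) >= CHUNK_MAX_SIZE) {
        chunk_rotate(c);  /* too late: the oversized chunk already exists */
    }
}
```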
Proposed Solution
This solution is an attempt to mitigate the problem without immense restructuring. The strategy is to examine the size of the incoming buffer and, if it exceeds FLB_CHUNK_FS_MAX_SIZE (2M), split it into separate buffers that are each under the max size and append them to chunks separately. This is paired with a check when retrieving an input chunk: if appending the current buffer would exceed the chunk size limit, a new chunk is created. This solution is not perfect, but it was the best way I could find within my power (i.e. I don't consider major restructures to this code or chunkio to be "within my power").
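As a rough sketch of the splitting strategy (not the PR's actual code): get_chunk_with_room() below is a hypothetical helper standing in for the "create a new chunk if this buffer won't fit" check, and a real implementation would have to cut on record boundaries rather than the raw byte offsets used here.

```c
#include <stddef.h>

#define FLB_CHUNK_FS_MAX_SIZE (2 * 1024 * 1024)

/* Hypothetical helpers for illustration only. */
struct chunk;
struct chunk *get_chunk_with_room(size_t needed);  /* returns a chunk that can
                                                      still take `needed` bytes,
                                                      creating one if necessary */
void chunk_append(struct chunk *c, const void *buf, size_t len);

/* Split an oversized incoming buffer into slices that each stay under
 * the max chunk size, appending every slice to its own chunk. */
static void append_split(const char *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        size_t slice = len - off;
        if (slice > FLB_CHUNK_FS_MAX_SIZE) {
            slice = FLB_CHUNK_FS_MAX_SIZE;
        }
        chunk_append(get_chunk_with_room(slice), buf + off, slice);
        off += slice;
    }
}
```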
Issues: #9374, #1938
Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:
If this is a change to packaging of containers or native binaries, then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).

Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.