[Bug] Batch-mode sink with CSV format and gz compression gets stuck because the cache byte counter is not fully released #614

@qq461613840

Description

Search before asking

  • I had searched in the issues and found no similar issues.

Version

25.1.0

What's Wrong?

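With CSV format and gz compression enabled, the batch sink freezes after running for a while. From that point on, the log only repeats the lines below: currentBytes stays at 314572855, just above maxBlockedBytes (314572800, i.e. 300 MiB), even though bufferMap is empty and there is nothing left to flush.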
2025-09-09 15:54:51,372 INFO org.apache.doris.flink.sink.batch.DorisBatchStreamLoad [] - Cache full, waiting for flush, currentBytes: 314572855, maxBlockedBytes: 314572800
2025-09-09 15:54:52,335 INFO org.apache.doris.flink.sink.batch.DorisBatchStreamLoad [] - bufferMap is empty, no need to flush null
2025-09-09 15:54:52,372 INFO org.apache.doris.flink.sink.batch.DorisBatchStreamLoad [] - Cache full, waiting for flush, currentBytes: 314572855, maxBlockedBytes: 314572800
2025-09-09 15:54:53,372 INFO org.apache.doris.flink.sink.batch.DorisBatchStreamLoad [] - Cache full, waiting for flush, currentBytes: 314572855, maxBlockedBytes: 314572800
2025-09-09 15:54:54,335 INFO org.apache.doris.flink.sink.batch.DorisBatchStreamLoad [] - bufferMap is empty, no need to flush null
2025-09-09 15:54:54,373 INFO org.apache.doris.flink.sink.batch.DorisBatchStreamLoad [] - Cache full, waiting for flush, currentBytes: 314572855, maxBlockedBytes: 314572800

My configuration is

Properties props = new Properties();
props.setProperty("column_separator", ",");
props.setProperty("line_delimiter", "\n");
props.setProperty("format", "csv");
props.setProperty("compress_type", "gz");
return DorisExecutionOptions.builder()
        .setLabelPrefix(tableName + "-" + System.currentTimeMillis())
        .setDeletable(false)
        .setBatchMode(true)
        .setBufferFlushMaxRows(20000)
        .setBufferFlushIntervalMs(2000)
        .setStreamLoadProp(props)
        .build();

After reviewing the source code, I suspect that cacheBeforeFlushBytes records the size before compression, while the amount released after a successful load, currentCacheBytes.getAndAdd(-respContent.getLoadBytes()), is the compressed size. Since the released amount is smaller than the amount that was added, the counter is never fully released and keeps growing with every batch. Once it exceeds maxBlockedBytes, the sink logs "Cache full, waiting for flush" indefinitely and stays blocked, even though bufferMap is already empty.
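The suspected mismatch can be illustrated with a small stand-alone model. This is only a sketch and not the connector's actual code: the class CacheCounterSketch and all of its method names are hypothetical, and it assumes, as suspected above, that the counter is increased by the uncompressed size and decreased by the compressed size reported in the load response.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the sink's cache byte accounting.
public class CacheCounterSketch {
    // Stand-in for the counter that is compared against maxBlockedBytes.
    private final AtomicLong currentCacheBytes = new AtomicLong();

    // Buffering a batch adds its uncompressed size to the counter.
    public void onBuffered(long uncompressedBytes) {
        currentCacheBytes.addAndGet(uncompressedBytes);
    }

    // Suspected behaviour: release by the (compressed) size from the load response.
    public void releaseByReportedLoadBytes(long reportedLoadBytes) {
        currentCacheBytes.addAndGet(-reportedLoadBytes);
    }

    // One possible fix: release exactly the amount that was added at buffer time.
    public void releaseByBufferedBytes(long uncompressedBytes) {
        currentCacheBytes.addAndGet(-uncompressedBytes);
    }

    public long cachedBytes() {
        return currentCacheBytes.get();
    }

    // Runs `batches` buffer/flush cycles and returns what is left in the counter.
    public static long residueAfter(int batches, long uncompressed, long compressed, boolean useFix) {
        CacheCounterSketch counter = new CacheCounterSketch();
        for (int i = 0; i < batches; i++) {
            counter.onBuffered(uncompressed);
            if (useFix) {
                counter.releaseByBufferedBytes(uncompressed);
            } else {
                counter.releaseByReportedLoadBytes(compressed);
            }
        }
        return counter.cachedBytes();
    }

    public static void main(String[] args) {
        // Each batch: 1000 bytes buffered, 250 bytes after compression (made-up numbers).
        System.out.println(residueAfter(3, 1000, 250, false)); // 2250: residue grows every batch
        System.out.println(residueAfter(3, 1000, 250, true));  // 0: counter fully released
    }
}
```

In this model the residue grows by (uncompressed - compressed) bytes per batch, so with a fixed limit the counter eventually stays above it even when nothing is buffered, which matches the log pattern above.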

What You Expected?

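With gz compression enabled, the sink should keep flushing without getting stuck, and the cache counter should be fully released after each load. I hope this bug can be fixed.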

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
