
Is it possible it's linked to concurrency, target file sizes, or other related parameters? Could those cause similar behaviour, and are there any specific configuration parameters you think I should look into?

What I think is happening is that the writes are populating the page cache, which is why memory_cache and working_set_bytes are increasing. Lowering parameters like parquet_target_file_size or parquet_target_row_group_size will make Daft write more frequently, but when those pages get flushed to disk is up to the kernel, not Daft.
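To illustrate why page cache can inflate working_set_bytes, here is a minimal sketch that parses a cgroup memory.stat snapshot and computes a cAdvisor-style working set (total usage minus inactive file-backed pages). It assumes cgroup v2 field names (anon, file, active_file, inactive_file). The snapshot numbers are made up, and the usage figure is simplified to anon + file. On a real host the inputs would come from files such as /sys/fs/cgroup/memory.stat and /sys/fs/cgroup/memory.current.

```python
# Assumes cgroup v2 memory.stat field names; the numbers below are hypothetical.


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse memory.stat content ("key value" per line) into a dict of byte counts."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, stats: dict[str, int]) -> int:
    """cAdvisor-style working set: total usage minus inactive file-backed pages."""
    return max(usage_bytes - stats.get("inactive_file", 0), 0)


# Hypothetical snapshot: 1 GiB of anonymous memory plus 3 GiB of page cache.
SAMPLE = """anon 1073741824
file 3221225472
active_file 2147483648
inactive_file 1073741824
"""

stats = parse_memory_stat(SAMPLE)
usage = stats["anon"] + stats["file"]  # simplification; real usage includes other kernel memory
print("page cache (file):", stats["file"])
print("working set:", working_set_bytes(usage, stats))
```

In this made-up snapshot the process only holds 1 GiB of anonymous memory, yet the working set comes out at 3 GiB because the 2 GiB of active file pages are still counted. Only the inactive file pages are subtracted.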

Daft does spawn child processes for some operations (and as far as I understand, memory usage in these processes would not count towards RSS), but I c…
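As a rough illustration of how a child process's memory is reported separately from the parent's, the sketch below uses only the standard-library resource module (Unix only). It does not involve Daft: the short-lived child is a hypothetical stand-in for a worker process. The parent's RUSAGE_SELF peak does not include the child's allocation, while RUSAGE_CHILDREN reports the largest waited-for child. By contrast, cgroup-level metrics cover every process in the cgroup.

```python
import resource
import subprocess
import sys

# Hypothetical stand-in for a worker: the child allocates ~100 MiB
# (b"x" * n writes every byte, so the pages become resident) and exits.
CHILD_CODE = "buf = b'x' * (100 * 1024 * 1024)"


def peak_rss(who: int) -> int:
    """Peak resident set size from getrusage (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(who).ru_maxrss


parent_before = peak_rss(resource.RUSAGE_SELF)
# subprocess.run waits for the child, so its usage is included in RUSAGE_CHILDREN.
subprocess.run([sys.executable, "-c", CHILD_CODE], check=True)
parent_after = peak_rss(resource.RUSAGE_SELF)
largest_child = peak_rss(resource.RUSAGE_CHILDREN)

print("parent peak RSS before/after:", parent_before, parent_after)
print("largest child peak RSS:", largest_child)
```

The parent's peak stays at roughly its own interpreter footprint, while the child's ~100 MiB only appears under RUSAGE_CHILDREN.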

Answer selected by colin-ho
Category: Q&A
Labels: bug (Something isn't working), needs triage
2 participants

This discussion was converted from issue #5498 on November 12, 2025 01:40.