Skip to content

WIP lgalloc limiter #32602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

antiguru
Copy link
Member

@antiguru antiguru commented May 28, 2025

Signed-off-by: Moritz Hoffmann [email protected]

Part of MaterializeInc/database-issues/issues/9306

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Signed-off-by: Moritz Hoffmann <[email protected]>
Copy link
Contributor

@bkirwi bkirwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry, started commenting on this before I realized it was still in draft! Submitting since it's already typed out, but feel free to ignore.)


self.tx
.send(Update::DiskLimit(Some(disk_limit)))
.expect("Sender exists");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.expect("Sender exists");
.expect("Receiver exists");

(Which is what you have on the first send call, and seems more accurate.)

Interval(Duration),
DiskLimit(Option<usize>),
BurstBudget(usize),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending these as two/three separate messages every time the config changes leads to some odd transient states -- for example, we can briefly have the disk-limit re-enabled but be using a stale value for the burst budget. May not matter, but seems easier to reason about if this were a struct and all configs were updated atomically?

@@ -354,6 +355,7 @@ async fn run(args: Args) -> Result<(), anyhow::Error> {
ComputeInstanceContext {
scratch_directory: args.scratch_directory,
worker_core_affinity: args.worker_core_affinity,
announce_memory_limit: args.announce_memory_limit,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this being called announce_memory_limit instead of just memory_limit. Is there a good reason? If not, it's probably too much of a hassle to change the command-line option, but we can change the internal variable names.

"Multiplicative bias to lgalloc_limiter_usage_factor.",
);

/// Bias to the lgalloc limiter usage factor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Bias to the lgalloc limiter usage factor.
/// Burst factor to disk limit.


/// Bias to the lgalloc limiter usage factor.
pub const LGALLOC_LIMITER_BURST_FACTOR: Config<f64> = Config::new(
"lgalloc_limiter_BURST_FACTOR",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"lgalloc_limiter_BURST_FACTOR",
"lgalloc_limiter_burst_factor",

/// but it might delete previous metrics. If we ever want to change this, we should
/// remove the shared static mutex and make this function return a handle to the metrics.
///
/// This function is async, because it needs to be called from a tokio runtime context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how much value this pattern adds. You add some noise but needing the async keyword, the allow, and this comment. You get protection against calling this function outside an async (not necessarily tokio!) context, but I think you also have this protection through CI, as spawning a task would immediately panic if no tokio runtime was available.

If we want to use this pattern here, shouldn't Limiter::create_task also be async?

///
/// This function is async, because it needs to be called from a tokio runtime context.
#[allow(clippy::unused_async)]
pub async fn register_metrics_into(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be called run_limiter or somesuch. A call to "register metrics" spawning a non-metrics task in the background is surprising. A call to "run limiter" running a limiter that also registers metrics into the provided registry is not.

Comment on lines +79 to +83
// Get lgalloc stats and obtain the disk utilization from file stats, summed across all
// files and size classes. Compare the disk utilization against the configured disk limit,
// and if it exceeds the limit, reduce the burst budget by the amount of disk utilization
// that exceeds the limit. If the burst budget is exhausted, we will not allow any more disk
// access and terminate the process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This fits better in the module docstring imo.

/// Bias to the lgalloc limiter usage factor.
pub const LGALLOC_LIMITER_BURST_FACTOR: Config<f64> = Config::new(
"lgalloc_limiter_BURST_FACTOR",
0.5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.5 seems like a too-small value, assuming it means you get "0.5 x limit byte-seconds" of burst. If a hydration takes 10 minute that's less than 0.1% of additional disk you can use.

In any case, we should probably disable bursting by default, and make switching it on a conscious decision. Then we can also decide how much bursting we need based on the use case we want to enable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it should be disabled by default.

disk_utilization,
disk_limit
);
std::process::exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check what implications existing with code 1 has. It might trigger alerts in our monitoring or show the replica as crashed due to an error in the console. Ideally we want this case the look the same as an OOD scenario.

Also, we should log a warning, not an error. Errors will produce noise in Sentry we don't want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants