compute,storage: reduce the size of cluster commands #32261

Merged
merged 1 commit into from
Apr 21, 2025

Conversation

teskje
Contributor

@teskje teskje commented Apr 18, 2025

ComputeCommand and StorageCommand are enums, so their size is dependent on the size of the largest variant. Both contain some huge variants, in particular UpdateConfiguration and the ones describing dataflows to install, bringing the size of the enums to 3-4 KB. In contrast, the variant we send most frequently, AllowCompaction, is only 40 bytes in size, so when handling an AllowCompaction command, we waste 3-4 KB of space. Commands are stored in histories and channels, so this amount of waste can have an impact on our memory usage.

This PR removes the waste and brings the size of the command enums down to 40 bytes by boxing large fields. This makes code that creates commands a bit noisier, but I don't think there is a way to avoid that.
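
To illustrate the effect of boxing, here is a minimal, self-contained sketch. The types `BigParams`, `Unboxed`, and `Boxed` are hypothetical stand-ins, not the actual `ComputeCommand` or `StorageCommand` definitions, and the exact sizes printed depend on the compiler and target:

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a large payload such as a parameters struct.
#[allow(dead_code)]
struct BigParams {
    values: [u64; 512], // 4096 bytes stored inline
}

/// Every value of this enum is at least as large as `BigParams`,
/// even when it holds the small `AllowCompaction` variant.
#[allow(dead_code)]
enum Unboxed {
    UpdateConfiguration(BigParams),
    AllowCompaction { id: u64, frontier: Vec<u64> },
}

/// Boxing the large payload leaves only a pointer inline,
/// so the small variant now determines the enum's size.
#[allow(dead_code)]
enum Boxed {
    UpdateConfiguration(Box<BigParams>),
    AllowCompaction { id: u64, frontier: Vec<u64> },
}

fn main() {
    println!("unboxed: {} bytes", size_of::<Unboxed>());
    println!("boxed:   {} bytes", size_of::<Boxed>());

    // The extra noise at construction sites: an explicit `Box::new`.
    let _cmd = Boxed::UpdateConfiguration(Box::new(BigParams { values: [0; 512] }));
}
```

The trade-off is one heap allocation and one pointer indirection for the rare large variants, in exchange for a much smaller footprint for every command that is queued.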

(First two commits are from #32258 and can be ignored here.)

Motivation

  • This PR refactors existing code.

Tips for reviewer

I didn't try to reduce the size of the responses here, for two reasons:

  • They are only 120 bytes in size, so the waste is much smaller than for commands.
  • At least for compute, the 120-byte response is also the one we send most of the time (Frontiers), so moving its data to the heap wouldn't save much memory overall.

That's not to say we shouldn't try to reduce the memory usage of responses too, just that it's less urgent.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje
Contributor Author

teskje commented Apr 18, 2025

For reference, here are the memory layouts of the two enums before this change:

struct mz_compute_client::protocol::command::ComputeCommand<mz_repr::timestamp::Timestamp>
        size: 3816
        members:
                0[8]    <anon>: u64
                0[3816] <variant part>
                        CreateTimely: <2>
                                0[8]    <padding>
                                8[64]   config: struct mz_cluster_client::client::TimelyConfig
                                72[16]  epoch: struct mz_cluster_client::client::ClusterStartupEpoch
                                88[3728]        <padding>
                        CreateInstance: <3>
                                0[8]    <padding>
                                8[64]   __0: struct mz_compute_client::protocol::command::InstanceConfig
                                72[3744]        <padding>
                        InitializationComplete: <4>
                        AllowWrites: <5>
                        UpdateConfiguration: <0>
                                0[3816] __0: struct mz_compute_client::protocol::command::ComputeParameters
                        CreateDataflow: <7>
                                0[8]    <padding>
                                8[352]  __0: struct mz_compute_types::dataflows::DataflowDescription<mz_compute_types::plan::render_plan::RenderPlan<mz_repr::timestamp::Timestamp>, mz_storage_types::controller::CollectionMetadata, mz_repr::timestamp::Timestamp>
                                360[3456]       <padding>
                        Schedule: <8>
                                0[8]    <padding>
                                8[16]   __0: struct mz_repr::global_id::GlobalId
                                24[3792]        <padding>
                        AllowCompaction: <9>
                                0[8]    <padding>
                                8[16]   id: struct mz_repr::global_id::GlobalId
                                24[24]  frontier: struct timely::progress::frontier::Antichain<mz_repr::timestamp::Timestamp>
                                48[3768]        <padding>
                        Peek: <10>
                                0[8]    <padding>
                                8[544]  __0: struct mz_compute_client::protocol::command::Peek<mz_repr::timestamp::Timestamp>
                                552[3264]       <padding>
                        CancelPeek: <11>
                                0[8]    <padding>
                                8[16]   uuid: struct uuid::Uuid
                                24[3792]        <padding>

struct mz_storage_client::client::StorageCommand<mz_repr::timestamp::Timestamp>
        size: 4352
        members:
                0[8]    <anon>: u64
                0[4352] <variant part>
                        CreateTimely: <2>
                                0[8]    <padding>
                                8[64]   config: struct mz_cluster_client::client::TimelyConfig
                                72[16]  epoch: struct mz_cluster_client::client::ClusterStartupEpoch
                                88[4264]        <padding>
                        InitializationComplete: <3>
                        AllowWrites: <4>
                        UpdateConfiguration: <0>
                                0[4352] __0: struct mz_storage_types::parameters::StorageParameters
                        RunIngestion: <6>
                                0[8]    <padding>
                                8[1800] __0: struct mz_storage_client::client::RunIngestionCommand
                                1808[2544]      <padding>
                        AllowCompaction: <7>
                                0[8]    <padding>
                                8[16]   __0: struct mz_repr::global_id::GlobalId
                                24[24]  __1: struct timely::progress::frontier::Antichain<mz_repr::timestamp::Timestamp>
                                48[4304]        <padding>
                        RunSink: <8>
                                0[8]    <padding>
                                8[2240] __0: struct mz_storage_client::client::RunSinkCommand<mz_repr::timestamp::Timestamp>
                                2248[2104]      <padding>
                        RunOneshotIngestion: <9>
                                0[8]    <padding>
                                8[704]  __0: struct mz_storage_client::client::RunOneshotIngestion
                                712[3640]       <padding>
                        CancelOneshotIngestion: <10>
                                0[8]    <padding>
                                8[16]   __0: struct uuid::Uuid
                                24[4328]        <padding>
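
The per-command waste can be read directly off these dumps: an `AllowCompaction` value occupies 8 + 16 + 24 = 48 bytes including the discriminant slot, and the rest is trailing padding (3768 bytes for compute, 4304 bytes for storage). As a rough sketch of how this adds up, the following computes the padding for a queue of commands. The queue length of 10,000 is a made-up figure for illustration, not a measurement:

```rust
/// Enum sizes in bytes, copied from the layout dumps above.
const COMPUTE_COMMAND_SIZE: usize = 3816;
const STORAGE_COMMAND_SIZE: usize = 4352;

/// Bytes actually used by `AllowCompaction` in the dumps:
/// discriminant slot (8) + GlobalId (16) + Antichain (24).
const ALLOW_COMPACTION_USED: usize = 8 + 16 + 24;

/// Padding bytes held by `queued` commands of a small variant.
fn wasted_bytes(enum_size: usize, used: usize, queued: usize) -> usize {
    (enum_size - used) * queued
}

fn main() {
    // Hypothetical number of `AllowCompaction` commands sitting in
    // histories and channels at one time.
    let queued = 10_000;

    let compute = wasted_bytes(COMPUTE_COMMAND_SIZE, ALLOW_COMPACTION_USED, queued);
    let storage = wasted_bytes(STORAGE_COMMAND_SIZE, ALLOW_COMPACTION_USED, queued);

    println!("compute padding: {} bytes", compute);
    println!("storage padding: {} bytes", storage);
}
```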

@teskje teskje marked this pull request as ready for review April 21, 2025 08:11
@teskje teskje requested a review from a team as a code owner April 21, 2025 08:11
@teskje teskje requested a review from antiguru April 21, 2025 08:11
Member

@antiguru antiguru left a comment

Great!

@teskje
Contributor Author

teskje commented Apr 21, 2025

TFTR!

@teskje teskje merged commit 8ca896c into MaterializeInc:main Apr 21, 2025
82 checks passed
@teskje teskje deleted the small-commands branch April 21, 2025 10:58