@messmerd
Member

This is another PR coming out of the work done in #7459, and probably the last one needed before that PR can be merged.

It reworks the auto-quit feature by introducing a new AudioBus class which keeps track of which track channels are currently silent as audio flows through the effects chain.

When track channels going into an effect's input are not marked as quiet, it is assumed a signal is present and the plugin needs to wake up if it is asleep due to auto-quit. After a plugin processes a buffer, the silence status is updated.

When the auto-quit setting is disabled (that is, when effects are always kept running), effects are always assumed to have input noise (a non-quiet signal present at the plugin inputs), which should result in the same behavior as before.
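
A minimal sketch of the wake-up decision described above, for illustration only. `ChannelMask`, `EffectState`, and `shouldProcess` are hypothetical names, not the PR's actual API, and the real routing comes from the pin connector rather than a plain bitmask.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical names throughout; this is not the PR's actual API.
constexpr std::size_t kMaxChannels = 256;
using ChannelMask = std::bitset<kMaxChannels>;

struct EffectState
{
	ChannelMask inputs;  // track channels routed to this effect's inputs
	bool asleep = false; // true if auto-quit has put the effect to sleep
};

// Returns true if the effect must process this period.
// `quiet`: bit i set means track channel i is known to be silent.
inline bool shouldProcess(EffectState& fx, const ChannelMask& quiet, bool autoQuitEnabled)
{
	// With auto-quit disabled, the input is always treated as non-quiet.
	const bool hasInput = !autoQuitEnabled || (fx.inputs & ~quiet).any();
	if (hasInput) { fx.asleep = false; } // wake up on any non-quiet input
	return !fx.asleep;
}
```

With this shape, a sleeping effect whose inputs are routed to channels 3/4 stays asleep while a signal is only present on channels 1/2.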

Benefits:

  • The auto-quit system now closely follows how it is supposed to function by only waking plugins which have non-zero input rather than waking all plugins at once whenever an instrument plays a note or a sample track plays. This granularity better fits multi-channel plugins and pin connector routing where not all plugin inputs are connected to the same track channels. This means a sleeping plugin whose inputs are connected to channels 3/4 would not need to wake up if a signal is only present on channels 1/2.
  • Silencing channels that are already known to be silent is a no-op
  • The silence flags also could be useful for other purposes, such as adding visual indicators to represent how audio signals flow in and out of each plugin

This new system works so long as the silence flags for each channel remain valid at each point along the effect chain. Modifying the buffers without an accompanying update of the silence flags could violate assumptions. Through unit tests, the correct functioning of AudioBus itself can be validated, but its usage in AudioBusHandle, Mixer, and a few other places where track channels are handled will need to be done with care.
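
One way to keep the flags valid is to route every buffer write through a helper that refreshes the flag. The sketch below assumes a hypothetical single-channel `TrackedChannel` class and treats "silent" as "every sample is exactly zero"; neither is taken from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical single-channel buffer that keeps its silence flag in sync.
class TrackedChannel
{
public:
	explicit TrackedChannel(std::size_t frames) : m_samples(frames, 0.f) {}

	bool quiet() const { return m_quiet; }
	const std::vector<float>& samples() const { return m_samples; }

	// Zeroing a channel that is already known to be silent is a no-op.
	// Returns true if any samples were actually written.
	bool silence()
	{
		if (m_quiet) { return false; }
		std::fill(m_samples.begin(), m_samples.end(), 0.f);
		m_quiet = true;
		return true;
	}

	// All writes go through here, so the flag cannot go stale.
	template<class Fn>
	void write(Fn&& fn)
	{
		fn(m_samples);
		m_quiet = std::all_of(m_samples.begin(), m_samples.end(),
			[](float s) { return s == 0.f; });
	}

private:
	std::vector<float> m_samples;
	bool m_quiet = true; // a freshly zeroed buffer is silent
};
```

Code that modifies the samples without going through `write()` is exactly the kind of usage that would violate the assumption.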

@sakertooth
Contributor

To clarify, by track channels do you mean MixerChannel, or is it something different?

@messmerd
Member Author

@sakertooth
"Track channels" is a term I borrowed from REAPER. It's the name for the L/R channels that currently exist internally within Instrument Tracks and Mixer Channels, running from the instrument output (or mixer channel input) through each effect in the effects chain to the output.

The multi-channel plugins PR lays the groundwork for users to be able to increase the number of track channels within an instrument / mixer channel from 2 (a single stereo pair) to a maximum of 256 (arbitrarily chosen), similar to REAPER.

This PR also lays some of that groundwork, while still keeping the number of track channels fixed at 2.

@messmerd
Member Author

I should also note that I'm unsure of the proper terminology to use, so I may change some things.

According to ChatGPT, a "bus" is not the right term for a single group of track channels or the overall collection of channels, since a "bus" usually refers to a "routing destination or summing path".

I'm thinking "(track) channel group" might be a good channel-count-agnostic alternative to "stereo pair" or "track channel bus".

And as for the class AudioBus, maybe TrackAudioBuffer or TrackChannelBuffer would be a better name that avoids the term "bus".

@sakertooth
Contributor

sakertooth commented Oct 1, 2025

The multi-channel plugins PR lays the groundwork for users to be able to increase the number of track channels within an instrument / mixer channel from 2 (a single stereo pair) to a maximum of 256 (arbitrarily chosen), similar to REAPER.

So if I understand correctly, a "track channel" is just an audio channel, like a left and right channel?

This PR also lays some of that groundwork, while still keeping the number of track channels fixed at 2.

I don't think we should keep the number of track channels fixed at 2 in the code:

  1. It will most likely make it harder to transition to an arbitrary channel count, as needed for the multi-channel plugins PR. If I understand things correctly, there is no need to bake in the decision to use 2 channels within the AudioBus code. Just have the clients of the audio bus (i.e., whoever is using AudioBus in the codebase) configure it as necessary to work with a channel count of 2. That way, when we do make that transition to allow arbitrary channel counts, AudioBus doesn't need to change, only the clients. Otherwise, both AudioBus (its interface and its implementation) and the clients would need to change.
  2. Baking in a channel count of 2 for AudioBus leads to designs that may confuse developers. For example, you mentioned that this PR keeps the number of track channels fixed at 2, but then we also have a constexpr variable MaxTrackChannels set to 256. That implies that the number of track channels is not fixed at 2, which is a contradiction and a source of confusion. Another example: working in "track channel pairs" rather than "track channels" because 2 channels are still being used (if I am understanding the code correctly). I believe this brings another layer of complexity: you can't simply have clients work with the individual track channels as expected.

This is another reason why I suggest also using a planar format. As a matter of fact, I would probably recommend that AudioBus does use planar format even if clients are only working in an interleaved format for now. It seems like a natural requirement for AudioBus IMO: AudioBus consists of track channels, so working with the individual channels is quite necessary (at least without odd workarounds, which can potentially hurt both ergonomics and performance). The clients may have to do a bit of work to get the track channels into a format they can process, however:

  1. We already plan to eventually transition to using a planar format for our audio processing (surely, right?)
  2. If that plan is in motion, we have to change less code to make that happen.
  3. If the plan is in motion, it's only a temporary problem.

Those are my current thoughts. Feel free to correct me on anything you believe I am misunderstanding. I really do want to understand your perspective and work.

@messmerd
Member Author

messmerd commented Oct 1, 2025

So if I understand correctly, a "track channel" is just an audio channel, like a left and right channel?

Yes, though it refers to a single audio channel not two channels (left and right) together. I'm using the term "track channel pair" to refer to two track channels that form a single stereo channel.

I don't think we should keep the number of track channels fixed at 2 in the code

I think you're misunderstanding me. When I said it's "fixed at 2" channels, I mean we're still using 1 pair of channels and there's no way to change that at runtime:

SampleFrame* trackChannelPair = getTrackChannelPair();
auto bus = AudioBus{&trackChannelPair, 1, frames};

In the future, it might look something more like this:

SampleFrame** trackChannelPairs = getTrackChannelPairs();
auto bus = AudioBus{trackChannelPairs, numTrackChannelPairs, frames};

As you can see, AudioBus lays the groundwork for increasing the number of track channels in the future, though it's still fixed at 2 in this PR. AudioBus allows for a dynamic number of track channels (up to the arbitrary number of 256), but that functionality isn't available yet outside of the pin connector and AudioBus itself, so it's kept at 2 for now.

Btw, I'm considering making AudioBus own the buffers it contains rather than simply containing non-owning pointers to them. This would simplify how it's constructed in the examples above.

I would probably recommend that AudioBus does use planar format even if clients are only working in an interleaved format for now.

That would be a lot more work and beyond the scope of this PR. It would probably also cause performance issues until we fully convert over to planar.

@sakertooth
Contributor

sakertooth commented Oct 1, 2025

Yes, though it refers to a single audio channel not two channels (left and right) together. I'm using the term "track channel pair" to refer to two track channels that form a single stereo channel.

Glad we are on the same page. I understood that part.

I think you're misunderstanding me. When I said it's "fixed at 2" channels, I mean we're still using 1 pair of channels and there's no way to change that at runtime

That would be a lot more work and beyond the scope of this PR. It would probably also cause performance issues until we fully convert over to planar.

Maybe I was not clear on my end.

So in an ideal scenario (disregard reality for a minute), would you agree that AudioBus should be working with float* instead of SampleFrame*? I am confident you and I both agree on this, as we know that AudioBus really works with track channels as its "unit of computation"/"unit of processing" (I don't know the right terminology), not "track channel pairs". So if ideally we truly want float*, then the interface (mainly the method names) of AudioBus should reflect what we want. The fact that we are fundamentally bound to using SampleFrame* is, I would argue, an implementation detail. If we are treating SampleFrame* as the "track channel", then so be it. We are going to be using proper individual track channels in the future. It won't matter if we later plan to allow for, say, an odd number of track channels (e.g., 3 channels, which is not possible now because we work in multiples of 2), because our interfaces will already reflect what we actually wanted and what the clients actually needed too.

I don't see a strong reason for having this "track channel" and "track channel pair" distinction.

IMO, design your class interfaces for what you actually want, and not for what you have to implement because of certain requirements of the problem at hand. We don't want to have to change each call site when we later want to make AudioBus have the interface it was supposed to have in the beginning.

That would be a lot more work and beyond the scope of this PR. It would probably also cause performance issues until we fully convert over to planar.

I understand your concern. It may indeed cause performance issues (though we do not actually know without profiling, and it's more nuanced in general), but my overall take on that idea was that we can slowly move in the right direction, at least to minimize the amount of work we have to do when we actually decide to move to planar. Moving to planar isn't a hard requirement by any means, but it fundamentally makes more sense for AudioBus to be at least thought of in a "planar" fashion. Its interfaces should reflect that.

PS: Is AudioBus really just an AudioBuffer? It can have an arbitrary number of channels, so to me it's honestly just a buffer of audio. We could probably just change SampleBuffer to fit the idea, and then remove any file-related stuff in it (that should belong to the clients to be honest, not the actual SampleBuffer class; it's supposed to be just a buffer of audio).

Edit: I also actually think it's more of a performance hazard not using planar here though. It's not like keeping things interleaved is going to be optimal for performance. My guess is that you are striding inside your buffers to work with individual channels anyways (prevents SIMD optimizations). That's still a performance cost/downgrade, and a planar format is quite frankly the easiest, most intuitive way to think of AudioBus. I don't see how AudioBus can be thought of as being "planar but not really". It's a requirement for that class.

The only potential performance cost I can imagine using planar for AudioBus is copying interleaved contiguous data into a planar format, which is just a memcpy and will probably be optimized better than doing your actual "planar processing" in an interleaved format.

Edit 2:

which is just a memcpy

I take that back. Converting interleaved to planar will always result in some striding somewhere, but at least it is just copies and not any more complicated processing you eventually want to do for individual channels.

Then again, we would need to profile to see which is better.
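
For reference, a minimal sketch of the interleaved-to-planar conversion under discussion. `deinterleave` is a hypothetical helper, not existing LMMS code, and it allocates on every call, which real-time code would avoid by reusing buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits interleaved audio (L R L R ...) into one contiguous buffer per channel.
inline std::vector<std::vector<float>> deinterleave(const float* in, std::size_t frames, std::size_t channels)
{
	assert(in != nullptr || frames == 0);
	std::vector<std::vector<float>> planar(channels, std::vector<float>(frames));
	for (std::size_t f = 0; f < frames; ++f)
	{
		for (std::size_t c = 0; c < channels; ++c)
		{
			// Reads are contiguous; writes are scattered across the per-channel buffers.
			planar[c][f] = in[f * channels + c];
		}
	}
	return planar;
}
```

Whichever loop order is chosen, one side of the copy is strided, which is why this is not a plain memcpy.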

@sakertooth
Contributor

I'll try and review this at some point and maybe formalize any concerns I have a bit more pragmatically.

Contributor

@sakertooth sakertooth left a comment

Since we aren't using planar yet and do not want to convert to/from it on the fly, I don't think we are in the position to have something like AudioBus as of right now, especially since there are disagreements that will require more discussion/work before we do something like that. Is AudioBus truly a requirement for your auto quit rework?

@messmerd
Member Author

@sakertooth Sorry for the late response.

would you agree that AudioBus should be working with float* instead of SampleFrame*?

Yes, eventually SampleFrame should be removed entirely.

IMO, design your class interfaces for what you actually want, and not for what you have to implement because of certain requirements of the problem at hand. We don't want to have to change each call site when we later want to make AudioBus have the interface it was supposed to have in the beginning.

Yeah, it's starting to look like we should design this with first class support for dynamically-sized groups of channels, and squashing all the track channels into a single array of SampleFrame* or single std::bitset for all the track channels is not very conducive to allowing channel groups that don't all have the same fixed size (in our case, 2).

What do you think of something like this?

inline constexpr track_ch_t   MaxTrackChannelsPerGroup = 32;
inline constexpr unsigned int MaxTrackChannelGroups = 256;

class AudioBus
{
public:

	// ...

	class ChannelGroup
	{
	public:
		// getters/setters here

	private:
		std::unique_ptr<float[]>              m_buffer;   // interleaved (at least for now)
		track_ch_t                            m_channels; // # of channels in `m_buffer` (`MaxTrackChannelsPerGroup` maximum) - currently only 2 is used
		std::bitset<MaxTrackChannelsPerGroup> m_quietChannels;
	};

	auto channelGroup(unsigned int index) const -> const ChannelGroup& { return m_groups.at(index); }

private:
	ArrayVector<ChannelGroup, MaxTrackChannelGroups> m_groups;
	f_cnt_t m_frames;

	bool m_silenceTrackingEnabled = false;
};

As you can see, this uses groups of a dynamic number of channels rather than groups of exactly 2 channels each.

PS: Is AudioBus really just an AudioBuffer?

Yes, it's a collection of audio buffers which keeps track of silent channels. I'm definitely open to changing the name.

I also actually think it's more of a performance hazard not using planar here though. It's not like keeping things interleaved is going to be optimal for performance.

When operating on individual channels like this class requires, it's true that interleaved buffers do not seem ideal. Though if both channels are being operated on at once (which is by far the most common situation, since L is usually mapped to L, R to R, etc.), there is no striding where we skip over every N samples, so it shouldn't be less performant.

I'd imagine the cost of frequent conversions between interleaved and planar to be far more than some striding when iterating over an interleaved buffer, though I haven't measured this.
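
To illustrate the point about operating on both channels at once, here is a hedged sketch of a silence scan over an interleaved stereo buffer. `quietChannels` is a hypothetical helper, and treating only exact zeros as silent is an assumption of the sketch, not something taken from the PR.

```cpp
#include <bitset>
#include <cstddef>

// Scans an interleaved stereo buffer once and reports which channels are silent.
// Bit 0 = left, bit 1 = right; a set bit means every sample in that channel is exactly zero.
inline std::bitset<2> quietChannels(const float* interleaved, std::size_t frames)
{
	std::bitset<2> quiet;
	quiet.set(); // assume silent until a non-zero sample is seen
	for (std::size_t f = 0; f < frames && quiet.any(); ++f)
	{
		// Both channels of a frame are adjacent in memory, so the scan is linear.
		if (interleaved[2 * f] != 0.f) { quiet.reset(0); }
		if (interleaved[2 * f + 1] != 0.f) { quiet.reset(1); }
	}
	return quiet;
}
```

Because both channels of each frame are visited together, the buffer is read front to back with no skipped samples.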

I'd be willing to attempt that conversion from interleaved to planar in this PR, though it will probably be a lot of work, and we should probably figure out the channel grouping stuff first.

Since we aren't using planar yet and do not want to convert to/from it on the fly, I don't think we are in the position to have something like AudioBus as of right now

I can attempt to convert between planar and interleaved on the fly, though even if it doesn't work out, I don't think using interleaved should block this PR from being merged since it's what we're using already and doesn't prevent us from switching to planar in the future any more than adding a new instrument/effect plugin with interleaved processing does.

Is AudioBus truly a requirement for your auto quit rework?

Yes, AudioBus encapsulates the silence tracking functionality that the revamped auto-quit feature relies on.

The current auto-quit system is probably usable even if multi-channel plugins and audio routing are added, though it's not suited for it and would probably be less efficient than it could be.

For example, when auto-quit is enabled and plugins are put to sleep, under the current auto-quit system, all sleeping plugins would need to zero their output buffers each period rather than do nothing because some of the assumptions auto-quit relies on do not hold up in a world with pin connector routing. Under this PR, silence is tracked so they could avoid a lot of writes to buffers that are already known to be silent. Auto-quit works precisely when and where it needs to rather than applying globally across the entire effects chain.
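
A rough sketch of how a sleeping plugin could skip those writes, assuming one silence bit per track channel. `ChannelMask` and `channelsToZero` are hypothetical names, not the PR's API.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical names; one bit per track channel.
constexpr std::size_t kMaxChannels = 256;
using ChannelMask = std::bitset<kMaxChannels>;

// For a sleeping effect: returns the output channels that still hold a signal
// and therefore must be zeroed, then marks all of the effect's outputs as quiet.
inline ChannelMask channelsToZero(const ChannelMask& outputs, ChannelMask& quiet)
{
	const ChannelMask dirty = outputs & ~quiet; // outputs not yet known to be silent
	quiet |= outputs;
	return dirty;
}
```

The caller would zero only the buffers in the returned mask; on later periods the mask is empty and the sleeping plugin writes nothing.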

@messmerd
Member Author

Interestingly, it looks like VST3 has an AudioBusBuffers class which offers silence tracking similar to what I'm doing in this PR:
https://steinbergmedia.github.io/vst3_doc/vstinterfaces/structSteinberg_1_1Vst_1_1AudioBusBuffers.html

AudioBusBuffers corresponds to the AudioBus::ChannelGroup class I described above.
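
As I understand the linked documentation, AudioBusBuffers carries a channel count, a 64-bit silenceFlags bitset (one bit per channel, set when that channel is silent), and per-channel (planar) buffer pointers. The sketch below uses a stand-in struct rather than the SDK type, so the names `BusBuffersLike` and `isChannelSilent` are hypothetical and the field layout is an approximation.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the relevant fields of Steinberg::Vst::AudioBusBuffers (32-bit samples only).
// This is NOT the SDK type; it only approximates the documented layout.
struct BusBuffersLike
{
	std::int32_t numChannels;
	std::uint64_t silenceFlags; // bit i set: channel i is silent
	float** channelBuffers32;   // one pointer per channel (planar)
};

// Hypothetical helper: checks the silence bit of one channel.
inline bool isChannelSilent(const BusBuffersLike& bus, std::int32_t channel)
{
	assert(channel >= 0 && channel < bus.numChannels && channel < 64);
	return ((bus.silenceFlags >> channel) & 1u) != 0;
}
```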

@sakertooth
Contributor

sakertooth commented Nov 16, 2025

Yeah, it's starting to look like we should design this with first class support for dynamically-sized groups of channels, and squashing all the track channels into a single array of SampleFrame* or single std::bitset for all the track channels is not very conducive to allowing channel groups that don't all have the same fixed size (in our case, 2).

What do you think of something like this?

inline constexpr track_ch_t   MaxTrackChannelsPerGroup = 32;
inline constexpr unsigned int MaxTrackChannelGroups = 256;

class AudioBus
{
public:

	// ...

	class ChannelGroup
	{
	public:
		// getters/setters here

	private:
		std::unique_ptr<float[]>              m_buffer;   // interleaved (at least for now)
		track_ch_t                            m_channels; // # of channels in `m_buffer` (`MaxTrackChannelsPerGroup` maximum) - currently only 2 is used
		std::bitset<MaxTrackChannelsPerGroup> m_quietChannels;
	};

	auto channelGroup(unsigned int index) const -> const ChannelGroup& { return m_groups.at(index); }

private:
	ArrayVector<ChannelGroup, MaxTrackChannelGroups> m_groups;
	f_cnt_t m_frames;

	bool m_silenceTrackingEnabled = false;
};

I think I am slowly understanding this a bit better. Your AudioBus is a collection of audio buffers, so for example maybe we could have a stereo input buffer and a mono sidechain buffer inside one AudioBus. I was confused about why we were trying to "imitate" planar here, but it's just that the audio bus contains multiple audio buffers, with each buffer having its own channel count and the like.

That being said, I think my stance on this being named AudioBuffer was somewhat inaccurate. Honestly, you could name it AudioBufferBus if you wanted to be more descriptive, but I think AudioBus by itself is an okay name. I'm just hoping it won't get confused with something else. ChannelGroup could just be named Group, but I think what you named it currently is okay too.

For the ChannelGroups themselves, I would say using interleaved here is fine. I was mostly confused by the "planar-like" structure of this class and thought it was trying to use planar buffers but couldn't because of some limitations with the code being all interleaved currently.

Though, I am confused about how we would handle situations where, for example, one of the ChannelGroups is interleaved and the other is planar, while our plugins always take in and output interleaved stereo audio buffers. I guess this is where the question of "do we convert on the fly or something else?" comes into play.
