Conversation

@Tipuch Tipuch commented Dec 21, 2025

solves #1254
Based on https://www.ietf.org/archive/id/draft-ietf-ccwg-bbr-04.txt

Tested with `sudo tc qdisc add dev lo root netem delay 50ms 10ms distribution normal` and a big file over a single connection; this is BBR3 vs Cubic (the default) for quinn.

1206 KiB/s for BBR3 vs 1049 KiB/s for Cubic, so approximately 15% faster transfer speed ((1206 − 1049) / 1049 ≈ 0.15); since the file is big, this holds over a long period of time.

[Screenshots: Screenshot_20251221_093603, Screenshot_20251221_093639]

Let me know if you need any additional information or if you want me to change the implementation in any way.

@Ralith Ralith (Collaborator) commented Dec 23, 2025

Thanks for working on this! I'm excited to have this feature, and those performance numbers look promising. It's hard to read the vertical scales, but I guess the second chart is the new path? The pacing rate units don't look quite right.

What conditions was this tested in? It'd be interesting to contrast e.g. loopback, LANs, wifi, cellular, long fat networks, satellite networks, etc. We don't need all of that to merge this, but it would help inform what our guidance to users on its use should be, and might motivate future changes in default configuration.

Is there a strong reason for us to keep the old BBR code around? It hasn't received any attention in years, so unless it serves a very strong purpose I think it would be better for this change to replace it outright, perhaps by just deleting it in a final commit.

As this is a large body of algorithmically nuanced code, review will be difficult. To start with, can you reorganize the PR as a series of incremental, linear commits? For example, please remove merge commits and squash together WIP, incomplete, or later-revised changes, and separate changes to existing Quinn interfaces from the introduction of new logic.

@Tipuch Tipuch (Author) commented Dec 25, 2025

This was tested locally with simulated network latency (between a client and a server from the examples provided in quinn): 100 ms ping with 20 ms jitter.
The top one is Cubic and the bottom one is BBR3. As for the units, I used the same units as the previous implementation of BBR, but I'll change that to bytes/sec instead. I'll reorder and polish the commit history as well.
I'll try testing it in different conditions and post the graphs here as well.
As for the previous version of BBR, I think we can replace it, as it doesn't seem to me like it was ever in working condition (some of the output parameters weren't being applied to the connection).

I'll edit the history here soon, change the units for the pacing rate, and post more graphs under different conditions :)

@Tipuch Tipuch (Author) commented Dec 25, 2025

Alright, I have cleaned up the commit history and adjusted the units for pacing_rate.

I'll get to doing some more tests under different circumstances now.

@Tipuch Tipuch (Author) commented Dec 28, 2025

I'm not sure what the best way to share the graphs with you is; I tried to pick an average sample, with no outliers (positive or negative), for Cubic vs BBR3:
bbr3.html
bbr3_lo.html
bbr3_lo_tc.html
bbr3_wifi.html
cubic.html
cubic_lo.html
cubic_lo_tc.html
cubic_wifi.html

@Tipuch Tipuch (Author) commented Dec 28, 2025

Using the perf server and client with default settings, on lo without limits and with `tc qdisc add dev lo root netem delay 50ms 10ms distribution normal`:

bbr3:
bbr3_perf_screenshot
bbr3_perf_tc_screenshot

cubic:
cubic_perf_screenshot
cubic_perf_tc_screenshot

@Ralith Ralith (Collaborator) left a comment

Thanks for putting this together! The commit organization is great.

Because this is marked as experimental and replaces obsolete and equally experimental code, I'm not inclined to block this on an in-depth review of the actual BBR logic, which would otherwise take us much longer. That said, some documentation around it would dramatically improve our ability to notice implementation errors, and make it much easier for yourself or others to update and debug it in the future. Appropriate documentation might heavily cite or even excerpt the IETF draft.
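
For instance, the state machine could carry doc comments that quote or cite the draft directly. A hypothetical sketch of the idea (the enum and its comments below are illustrative, not the PR's actual code):

```rust
/// BBRv3 state machine, following draft-ietf-ccwg-bbr.
///
/// Each variant's doc comment could cite the relevant draft section and
/// excerpt the entry/exit conditions and gains verbatim, so a reviewer can
/// diff the code against the spec without leaving the file.
#[allow(dead_code)]
enum State {
    /// Ramp up quickly to discover the available bandwidth.
    Startup,
    /// Drain the queue built up while in Startup.
    Drain,
    /// Cycle pacing gains to probe for more bandwidth while keeping the
    /// queue small.
    ProbeBw,
    /// Periodically reduce inflight to refresh the min_rtt estimate.
    ProbeRtt,
}
```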

}
self.path
    .congestion
    .on_packet_acked(now, info.time_sent, info.size, pn, &self.path.rtt);
Ralith (Collaborator):

It looks like we're calling this at almost exactly the same times as on_ack. Does it need to be a separate function?

Tipuch (Author):

So, for this: I originally used on_ack, but it was definitely missing some of the packets that had been registered on send within BBR; I believe that's because we check for ack-eliciting packets before reporting on_ack to the congestion controller. I didn't want to break or modify the implementations in the other congestion algorithms, so I decided to add a separate event here.

Ralith (Collaborator):

Are you certain that BBR should be tracking non-ACK-eliciting packets at all? I suspect it shouldn't. There's a reason the existing controllers ignore them: the peer is, as the name suggests, not obliged to respond to such packets on any particular schedule, so we can't reliably infer anything from the timing of their acknowledgements.

The real issue here might be that we're incorrectly calling on_sent on non-ACK-eliciting packets. If that makes sense, then I'd rather correct that than change or substantially duplicate the on_ack behavior.

If(!) there is a clear need for non-ACK-eliciting packets to be tracked, then I'd prefer to add an ack_eliciting: bool argument to both on_sent and on_ack rather than adding more methods, but I suspect simply ignoring such packets for congestion control purposes is most correct, and simpler for controllers besides.
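
A rough sketch of what that last option might look like; the trait below is illustrative and deliberately simplified, not quinn's actual `Controller` trait:

```rust
use std::time::Instant;

/// Simplified congestion-controller interface, for illustration only.
trait CongestionController {
    /// Called for every packet sent. A controller that only models
    /// ACK-eliciting traffic can skip samples where `ack_eliciting` is false.
    fn on_sent(&mut self, now: Instant, bytes: u64, ack_eliciting: bool);

    /// Called when a packet is acknowledged, with the same flag so the
    /// controller's send-side and ack-side bookkeeping stay consistent.
    fn on_ack(&mut self, now: Instant, sent: Instant, bytes: u64, ack_eliciting: bool);
}
```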

@Tipuch Tipuch (Author) commented Jan 11, 2026

I just noticed some inconsistencies with the value of the pacing gain in the Drain state in the IETF draft; I'll update its value soon with additional justification.

@djc djc (Member) commented Jan 11, 2026

Please rebase and squash all your commits on your next push.

@Ralith Ralith (Collaborator) commented Jan 11, 2026

(don't literally squash them all, just assemble into a logical progression like before)

@Tipuch Tipuch (Author) commented Jan 12, 2026

I introduced a bug within the latest changes, so I will first find it and fix it and then clean up the commits.

@Tipuch Tipuch (Author) commented Jan 13, 2026

Fixed the bug. There were a couple of errors in the calculations, but the main issue was with how inflight was set: ack_eliciting is a number of packets, not an amount of bytes, which resulted in underestimating inflight by a lot.
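
For the record, here is the distinction that caused it, as a minimal sketch (hypothetical names, not the PR code): inflight has to be accumulated from each packet's size in bytes, not from a count of ACK-eliciting packets.

```rust
/// Illustration of the bug class: bytes in flight vs. a packet count.
#[derive(Default)]
struct Inflight {
    bytes: u64,   // what the BBR model needs
    packets: u64, // a count; not interchangeable with bytes
}

impl Inflight {
    fn on_sent(&mut self, packet_bytes: u64) {
        // Correct: grow inflight by the packet's size.
        self.bytes += packet_bytes;
        self.packets += 1;
    }

    fn on_acked(&mut self, packet_bytes: u64) {
        // Subtracting a packet count here instead of `packet_bytes`
        // would underestimate inflight by roughly the packet size.
        self.bytes = self.bytes.saturating_sub(packet_bytes);
        self.packets = self.packets.saturating_sub(1);
    }
}
```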

@Tipuch Tipuch (Author) commented Jan 13, 2026

Alright cleaned up the commits :)

@Tipuch Tipuch force-pushed the bbrv3 branch 2 times, most recently from c716f82 to ae39099 Compare January 13, 2026 14:42
@Tipuch Tipuch requested a review from Ralith January 16, 2026 11:05

/// by definition, and it samples the most recent one. So we restart fresh on
/// every new max and overwrites 2nd & 3rd choices. The same property
/// holds for 2nd & 3rd best.
///
Ralith (Collaborator):

nit: stray empty line
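
For context, the property described in that doc comment is the windowed max-filter trick from Kathleen Nichols' min/max estimator used by BBR implementations. A standalone sketch under simplified assumptions (round counts as the time axis, fixed window), not the PR's actual code:

```rust
#[derive(Clone, Copy, Default)]
struct Sample {
    time: u64,  // e.g. round-trip count when the sample was taken
    value: u64, // e.g. measured delivery rate
}

/// Sliding-window maximum tracked via the best, 2nd-best, and 3rd-best samples.
struct MaxFilter {
    window: u64,
    estimates: [Sample; 3],
}

impl MaxFilter {
    fn update(&mut self, time: u64, value: u64) {
        let s = Sample { time, value };
        // A new overall max is by definition also the most recent sample,
        // so we restart fresh: it overwrites the 2nd and 3rd choices too.
        // The same reset applies when even the 3rd-best sample has aged out.
        if value >= self.estimates[0].value
            || time.saturating_sub(self.estimates[2].time) > self.window
        {
            self.estimates = [s; 3];
            return;
        }
        // Otherwise the new sample can only displace the lower-ranked slots.
        if value >= self.estimates[1].value {
            self.estimates[1] = s;
            self.estimates[2] = s;
        } else if value >= self.estimates[2].value {
            self.estimates[2] = s;
        }
    }

    fn max(&self) -> u64 {
        self.estimates[0].value
    }
}
```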

@Tipuch Tipuch requested a review from Ralith January 24, 2026 08:48