Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] sACK and basic congestion control, to be broken I into separate PRs #450

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

loriopatrick
Copy link

No description provided.

@loriopatrick
Copy link
Author

This has become sort of a catch all for various improvements I need to use this library in playit.gg. After more real world testing and adhoc patches I'll be looking to make a proper PR.

@jhwgh1968
Copy link
Contributor

Ping me if you would like me to test something, @loriopatrick. I have a pretty good software setup that found issues in my original phase 1 PR.

Before I saw your comment, I was about to say: doing all of sACK support at once seems like a big PR! 😄 The "finish then split" approach makes sense. (I would probably change the title and change it back, but since this is Draft I don't think it matters that much.)

I would also note, I broke my implementation plan into phases not only for testing, but so it would be easier to review. Several times in the past, people have opened large PRs which completely implement a feature or overhaul a system like this. @whitequark always wanted them broken up, either into stand-alone parts of the feature or improvements it required that came before.

@loriopatrick loriopatrick changed the title [WIP] Complete sACK support [WIP] sACK and basic congestion control, to be broken I into separate PRs Mar 28, 2021
@loriopatrick
Copy link
Author

loriopatrick commented Mar 28, 2021

Thanks @jhwgh1968! I still have a good number of unit tests to write but doing some real world testing we've seen great performance improvements. If the setup isn't too difficult to run I could use some early feedback, tho I understand if you'd prefer to wait for a bit more testing before starting that engine.

Your phase breakdowns were really helpful to get me started. Once I the aggregate patches feel solid I'll start breaking them up into smaller PRs for review.

@jhwgh1968
Copy link
Contributor

Great. I'll see how it goes over the next week or so, and plan to give it a test run on the weekend.

@pothos
Copy link
Contributor

pothos commented Apr 1, 2021

Didn't have a look, but are you aware of https://github.com/cbranch/smoltcp/branches/stale?
It contains some changes that are not upstreamed, (see the fixes[2] branch)
or an attempt at New Reno Congestion Control: https://github.com/cbranch/smoltcp/commits/cbranch/cc

@jhwgh1968
Copy link
Contributor

So, I don't know if it is your fix @loriopatrick or an existing issue (because I remember patching before), but the server example on the sinkhole port seems to have a bit of trouble with some buffer handling:

[1617561094.218s] -> EthernetII src=02-00-00-00-00-01 dst=aa-64-47-35-a4-bd type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=56088 seq=2125311720 ack=2169671772 win=65535 len=0
[1617561094.218s] <- EthernetII src=aa-64-47-35-a4-bd dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=56088 dst=6971 seq=2169705352 ack=2125311720 win=502 len=1460
[1617561094.219s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: rx buffer: receiving 1460 octets at offset 33580
[1617561094.219s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: assembler: [ (1460) 5840 (1460) 18980 (1460) 5840 (30495) ]
[1617561094.219s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: ACKing incoming segment
[1617561094.219s] (socket::tcp): sending sACK option with current assembler ranges
[src/iface/ethernet.rs:1489] (&tx_buffer.as_ref().len(), tx_len) = (
    82,
    82,
)
[1617561094.219s] -> EthernetII src=02-00-00-00-00-01 dst=aa-64-47-35-a4-bd type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=56088 seq=2125311720 ack=2169671772 win=65535 len=0
[1617561094.220s] <- EthernetII src=aa-64-47-35-a4-bd dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=56088 dst=6971 seq=2169671772 ack=2125311720 win=502 len=1460
[1617561094.221s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: rx buffer: receiving 1460 octets at offset 0
[1617561094.221s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: rx buffer: enqueueing 7300 octets (now 7300)
[1617561094.221s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: assembler: [ (1460) 18980 (1460) 5840 (37795) ]
[1617561094.221s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: starting delayed ack timer
[1617561094.221s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:56088: ACKing incoming segment
[1617561094.221s] (socket::tcp): sending sACK option with current assembler ranges
[1617561094.221s] (phy::fault_injector): tx: randomly dropping a packet
[src/iface/ethernet.rs:1489] (&tx_buffer.as_ref().len(), tx_len) = (
    1536,
    82,
)
thread 'main' panicked at 'assertion failed: tx_buffer.as_ref().len() == tx_len', /tmp/smoltcp/src/iface/ethernet.rs:1490:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@Dirbaio
Copy link
Member

Dirbaio commented Apr 7, 2021

It's a bug in FaultInjector, fixed in #463

@jhwgh1968
Copy link
Contributor

jhwgh1968 commented Apr 9, 2021

Thanks, @Dirbaio.

I have now tested it, and the verdict is: there might be a timer interaction.

I ran the server example with a 10% drop rate, and sent a large file (~350 MB) to the sinkhole port. The ACKs were behaving well. But after a while, the packets were filling up receiver space because the CPU could not keep up.

This is actually an intentional test, because the rx buffer drops things, and the TCP protocol is expected to handle that. While the sACKs seem correct, it never the less timed out when the buffer got too full:

[1617933661.925s] <- EthernetII src=aa-64-47-35-a4-bd dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=35692 dst=6971 psh seq=3677930455 ack=993492291 win=502 len=1460
[1617933661.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: receiving 1460 octets at offset 10220
[1617933661.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: assembler: [ (1460) 10220 (53855) ]
[1617933661.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: ACKing incoming segment
[1617933661.925s] (socket::tcp): sending sACK option with current assembler ranges
[1617933661.925s] -> EthernetII src=02-00-00-00-00-01 dst=aa-64-47-35-a4-bd type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=35692 seq=993492291 ack=3677920235 win=65535 len=0
[1617933661.925s] <- EthernetII src=aa-64-47-35-a4-bd dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=35692 dst=6971 seq=3677931915 ack=993492291 win=502 len=1460
[1617933661.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: receiving 1460 octets at offset 11680
[1617933661.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: assembler: [ (1460) 11680 (52395) ]
[1617933661.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: ACKing incoming segment
[1617933661.925s] (socket::tcp): sending sACK option with current assembler ranges
[1617933661.925s] -> EthernetII src=02-00-00-00-00-01 dst=aa-64-47-35-a4-bd type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=35692 seq=993492291 ack=3677920235 win=65535 len=0

[... several more identical messages within the same second ...]

[1617933661.926s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: receiving 1460 octets at offset 0
[1617933661.926s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: enqueueing 1460 octets (now 29200)
[1617933661.926s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: delayed ack timer already started, forcing expiry
[1617933661.926s] <- EthernetII src=aa-64-47-35-a4-bd dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=35692 dst=6971 seq=3677949435 ack=993492291 win=502 len=1460
[1617933661.926s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: receiving 1460 octets at offset 0
[1617933661.926s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: enqueueing 1460 octets (now 30660)
[1617933661.927s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: starting delayed ack timer
[1617933661.927s] <- EthernetII src=aa-64-47-35-a4-bd dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=35692 dst=6971 seq=3677950895 ack=993492291 win=502 len=1460
[1617933661.927s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: receiving 1460 octets at offset 0
[1617933661.927s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: enqueueing 1460 octets (now 32120)
[1617933661.927s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: delayed ack timer already started, forcing expiry
[1617933661.927s] <- EthernetII src=aa-64-47-35-a4-bd dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=35692 dst=6971 psh seq=3677952355 ack=993492291 win=502 len=501
[1617933661.927s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: receiving 501 octets at offset 0
[1617933661.927s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: rx buffer: enqueueing 501 octets (now 32621)
[1617933661.927s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: starting delayed ack timer
[1617933661.927s] (server): tcp:6971 recv 32621 octets
[1617933661.937s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: outgoing segment will acknowledge
[1617933661.937s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: sending ACK
[1617933661.937s] -> EthernetII src=02-00-00-00-00-01 dst=aa-64-47-35-a4-bd type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=35692 seq=993492291 ack=3677952856 win=65535 len=0
[1617933661.937s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: stop delayed ack timer
[1617933662.938s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: keep-alive timer expired
[1617933662.938s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: sending a keep-alive
[1617933662.938s] -> EthernetII src=02-00-00-00-00-01 dst=aa-64-47-35-a4-bd type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=35692 seq=993492290 ack=3677952856 win=65535 len=1
[1617933662.938s] (phy::fault_injector): rx: randomly dropping a packet
[1617933662.938s] (server): poll error: buffer space exhausted
[1617933663.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: timeout exceeded
[1617933663.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: state=ESTABLISHED=>CLOSED
[1617933663.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: outgoing segment will abort connection
[1617933663.925s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:35692: sending RST|ACK
[1617933663.925s] -> EthernetII src=02-00-00-00-00-01 dst=aa-64-47-35-a4-bd type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=35692 rst seq=993492291 ack=3677952856 win=65535 len=0
[1617933663.925s] (socket::tcp): #3:*:6971: state=CLOSED=>LISTEN

I think sACK should reset the keep-alive timer, shouldn't it?

@pothos
Copy link
Contributor

pothos commented Nov 18, 2021

Didn't have a look, but are you aware of https://github.com/cbranch/smoltcp/branches/stale?
It contains some changes that are not upstreamed, (see the fixes[2] branch)
or an attempt at New Reno Congestion Control: https://github.com/cbranch/smoltcp/commits/cbranch/cc

One more hint for Congestion Control: There is Cubic and BBR in https://github.com/quinn-rs/quinn/tree/0.8.0 which could be ported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants