Multi-Node Firewall - LibreQoS and ECMP #458
LibreQoS (and most other shaping solutions) can't share state between two servers, so to use two servers you need routing rules (note: LibreQoS is not a router, it's just a transparent bridge) to ensure that any given client will always prefer the same route (and fail over to the other one if it is down). The reason is that traffic is so dynamic: by the time one server finished telling the other "Joe is using 100 Mbps right now", that might no longer be true. It literally changes microsecond to microsecond, and there's just no practical way to sync data that fast.

The out-of-the-box setup doesn't do bonded interfaces. (You can't bind XDP to a bonded device, it only binds to a physical device - that's a kernel limitation.) What you can do is to put
I'm not sure how that'll perform; it seems like it'd need some pretty hefty CPU.
I am not sure if we are talking past each other or not. Here are some scenarios where LibreQoS is useful in a DC environment.

Say you want really low TCP latency with conventional congestion controls. LibreQoS adds about 100-200us on the path, but you can enable ECN and set LibreQoS to a cake rtt target of about 5ms to get a typical in-stream latency of under 500us (at least 10x better than what you will get through a typical switch) while still retaining full throughput without packet loss or retries. I have tried values as low as 2ms, actually - it is REALLY hard to measure below a millisecond, but I was pretty sure I was getting close to full throughput with no packet loss at about 200us of delay. This is for servers talking to each other within a datacenter; I know of someone fooling with AI workloads in this way. Google has long configured ECN support into their fq_codel instances, for both RFC3168- and L4S (DCTCP)-style ECN.

Another place where ECMP (how do you calculate the cost?) through LibreQoS boxes is indeed valuable is to load balance flows when you are pushing that amount of data through them, and also to be able to analyze the results. But unless you are trying to step individual flows down to another rate along the way (e.g. with ECN), just spattering packets through the switch saves that 100us.

It can also be configured as a hot spare, so if you lose one box it fails over to the other, but there would be a blip of, oh, 3 seconds before it starts to recover, as, per Herbert above, copying queue state over live is measured in msec, as is (best case) a BFD failover. We mostly just recommend routing around one box, but if you have two and want a hot spare, goferit.
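For anyone wanting to try the tuning described above, it looks roughly like the following with tc; this is a hedged sketch, not anyone's actual config - the interface name `eth0` and the 25gbit rate are placeholders.

```sh
# Hypothetical example of the cake settings discussed above.
# cake marks ECN-capable flows by default; "rtt 5ms" tightens the AQM target
# from the internet-scale default (100ms) toward datacenter-scale RTTs.
tc qdisc replace dev eth0 root cake bandwidth 25gbit rtt 5ms

# The endpoints also need to negotiate ECN to benefit from marking:
sysctl -w net.ipv4.tcp_ecn=1
```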
@thebracket - we could get closer to a fast failover than we do by just running off a mirrored port, sinkholing the output, and switching over immediately if the other port goes down... but we would still lose a significant number of packets during the switchover. /me puts his feet up on his old Tandem box...
That would require that we have some knowledge of the status of the other box?
Yes - and you've gotta lie about your MAC address too. https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol
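For readers unfamiliar with VRRP: that shared-address-plus-virtual-MAC failover is typically done with something like keepalived. A made-up sketch follows - the interface name, router ID, priority and virtual IP are all placeholders, and this is not part of LibreQoS itself.

```sh
# Hypothetical keepalived VRRP sketch for a hot-spare pair.
# With use_vmac, the active box answers for the VIP with the VRRP virtual MAC
# (00:00:5e:00:01:33 for VRID 51) - the "lying about your MAC" part.
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance shaper_pair {
    state BACKUP
    interface ens5f0np0
    use_vmac
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.1/24
    }
}
EOF
systemctl restart keepalived
```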
Thank you for sharing your topology. LibreQoS is presently optimized as a standalone eBPF bridge running on bare metal, where we easily achieve 25Gbit/s for 10,000 subscribers in an ISP environment at 50% CPU across 16 Xeon cores. Many of our users also run under VMs. While we have produced non-eBPF versions internally, the performance hit prior to Linux 6.1 was pretty extreme on most platforms, as was leveraging veth.

We get our performance by routing a subscriber to a core, which further emulates the underlying topology past the QoS point. Both of these techniques are a bit incompatible with conventional firewalling in Linux today. How are you doing firewalling?

We do not route, either. It's just a pass-through bridge, so in our preferred configuration there would be a separate box entirely behind each of your firewalls, and you would lose the ability to precisely control per-subscriber bandwidth.

To try and answer your questions more fully, perhaps a videoconference would help. We also take consulting dollars. There is plenty of demand for fully integrated solutions like this, but the eBPF dependency gets in the way.

Could you inform us on the following questions?

To attempt to answer your questions:
We can push 25Gbit easily on a separate $1500 box. Depending on your shaping needs, you might merely be able to apply cake standalone on the queues of your existing boxes without the need for anything else - not even shaped, just responding to BQL backpressure. Have you tried that? Cake can push about 10Gbit per core per TX ring in that case (see the sketch after these answers).
Plenty!
Do you need per-subscriber shaping? If not, cake by itself is enough.
It only takes an hour to get an instance up and running. Stick one in behind all that stuff. The 64-core Epyc boxes we have been playing with can crack 60Gbit/s of shaping.
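Here is a minimal sketch of the "just cake by itself, responding to BQL backpressure" idea mentioned above - no LibreQoS, no shaper. The interface name is borrowed from the thread's naming and is only a placeholder.

```sh
# Unshaped cake on an egress interface: no bandwidth parameter, so it only
# acts on whatever queue builds up under BQL backpressure.
tc qdisc replace dev ens5f0np0 root cake unlimited

# Or make cake the default root qdisc for interfaces brought up afterwards:
sysctl -w net.core.default_qdisc=cake
```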
Some questions:
If I'm reading the diagram correctly, your current setup passing through a box is roughly:
Now, LibreQoS is a pure, transparent bridge and doesn't work if you have IP addresses set on either of its interfaces. It also cannot bind to the bond itself (XDP won't attach to a bonded device). So to make this work, your IP address, BGP session and nftables rules must not live on an interface that runs LibreQoS, and LibreQoS must not sit on the bond.

If you're not using VMs, you wind up needing something like:

`bond0.254 -> br_internal -> veth_internal (LibreQoS) -> veth_external -> br_external`

You create your veth pair and have LibreQoS mount that. Then bridge the "client facing/southbound" network between the bond and veth_internal, and put your IPs, BGP, etc. on br_external so there's an endpoint to do firewall and routing. Finally, you'll want some BGP weights to encourage the same client to go through the same firewall each time (which you need anyway, or your firewall state tables will never be right).
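A rough sketch of that plumbing with iproute2, using the names from the chain above. This only reflects the chain as described in the comment: the VLAN ID is assumed to exist already, the address is a documentation placeholder, and pointing LibreQoS at the veth pair is left to its own configuration.

```sh
# Bridges and the veth pair, named as in the chain above.
ip link add br_internal type bridge
ip link add br_external type bridge
ip link add veth_internal type veth peer name veth_external

# Southbound: bridge the bond VLAN (assumed to already exist) with one end
# of the veth pair.
ip link set bond0.254 master br_internal
ip link set veth_internal master br_internal

# Northbound: the other end joins the bridge that carries your IPs, BGP
# session and nftables rules.
ip link set veth_external master br_external

ip link set br_internal up
ip link set br_external up
ip link set veth_internal up
ip link set veth_external up

# Addresses and routing live on br_external only - never on the interfaces
# LibreQoS mounts. (203.0.113.1/24 is a placeholder.)
ip addr add 203.0.113.1/24 dev br_external

# LibreQoS would then be configured to use veth_internal/veth_external as
# its shaping interfaces, per the comment above.
```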
Some questions:
Fairly simple datacenter with some heavyweight customers hungry for bandwidth in an all-you-can-eat buffet.
Ubuntu Jammy on Lenovo SR650 V3 chassis with 2x Intel Xeon Gold 6448Y, 1TB RAM, 2x dual-port 25Gbps ConnectX-6 Lx NICs, a couple of SSDs for boot and a few NVMe drives for local storage.
Currently on bare-metal, still evaluating VMs with SR-IOV. No containers.
Currently 40Gbps in a couple of small datacenters, but we plan to run multiple 100Gbps links on the horizon.
/// My team and I have a good few years of experience with Linux, so it's probably fairly safe to say you can talk to us freely. We'll try to catch up.
In our ECMP lab setup we have the following routes:

```
172.31.255.0/24 nhid 59 proto bgp metric 20
172.31.252.0/23 nhid 56 proto bgp metric 20
172.31.254.0/24 nhid 62 proto bgp metric 20
```

Traffic arrives from customer (1) on ens3f1np1.850 or ens3f0np0.851, or from customer (2) on ens3f1np1.854 or ens3f0np0.855, and then goes outside (3) via ens5f1np1 or ens5f0np0.

Since we have a pair of devices configured exactly the same way (same ASN, different neighbors), with ECMP we have traffic arriving at one box and returning from the other - the traffic is asymmetric.

We're using nftables with a very simple stateless configuration. We're not using conntrack or doing NAT.
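Not your actual ruleset, of course, but for other readers a "simple stateless" nftables setup of the kind described (no conntrack, no NAT) often looks something like the following sketch. The table, chain and file names are placeholders.

```sh
# Illustrative only. Disabling connection tracking in a raw-priority
# prerouting chain means asymmetric ECMP return traffic never depends on
# per-flow state on either box.
cat > /tmp/stateless.nft <<'EOF'
table inet filter {
    chain prerouting {
        type filter hook prerouting priority raw;
        notrack
    }
    chain forward {
        type filter hook forward priority filter; policy accept;
        # stateless rules (no "ct state" matches) go here
    }
}
EOF
nft -f /tmp/stateless.nft
```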
Greetings LibreQoS team!
In our lab we have a couple of servers with 4x 25Gbps ports each (dual-port NICs), with ECMP configured to load balance traffic between the servers. I wonder if it's possible to use two interfaces for egress and the other two for ingress. I'm also scratching my head over how to handle policy setup in this scenario, where each server is independent and actively routing traffic.
Please advise.