[!NOTE] This document is in the process of being restructured into general and per-service specifications.
- First Target Scenario: SKU for Networked Virtual Appliance (NVA)
- Scale per DPU (Card)
- Scenario Milestone and Scoping
- Virtual Port (aka Elastic Network Interface / ENI) and Packet Direction
- Routing (Routes and Route-Action)
- Packet Flow
- Packet transforms
- Packet Transform Examples
- Metering
- VNET Encryption
- Telemetry
- Counters
- BGP
- Watchdogs
- Servicing
- Debugging
- Flow Replication
- Unit Testing and development
- Internal Partner dependencies
Highly Optimized Path, Dedicated Appliance, Little Processing or Encap to SDN Appliance and Policies on an SDN Appliance.

Why do we need this scenario? There is a huge cost associated with establishing the first connection (and the CPS that can be established).
- A high Connections per Second (CPS) / Flow SKU for Networked Virtual Appliances (NVA)
*Note: Below are the expected numbers per Data Processing Unit (DPU); this applies to both IPv4 and IPv6 underlay and overlay.
*IPv6 numbers will be lower.
An SDN appliance is a multi-tenant network appliance (meaning 1 SDN appliance will have multiple cards; 1 card will have multiple machines or bare-metal servers) that supports Virtual Ports. These can map to policy buckets corresponding to customer workloads, for example Virtual Machines or Bare Metal servers.
The Elastic Network Interface (ENI) is an independent entity that has a collection of routing policies. Usually there is a 1:1 mapping between the VM NIC (Physical NIC) and the ENI (Virtual NIC). The ENI has specific match identification criteria, which are used to identify packet direction. The current version only supports MAC address as the ENI identification criterion.
Once a packet arrives Inbound to the target (DPU), it must be forwarded to the correct ENI policy processing pipeline. This ENI selection is done based on the inner destination MAC of the packet, which is matched against the MAC of the ENI.
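The inbound ENI selection described above can be sketched as a table keyed on MAC address. This is a minimal illustration, not the DPU implementation: `EniTable`, `add_eni`, `select_inbound`, and `normalize_mac` are hypothetical names, and the ENI identifiers are made up.

```python
from typing import Dict, Optional


def normalize_mac(mac: str) -> str:
    """Return the MAC as 12 lowercase hex digits, ignoring '-', ':' and '.' separators."""
    digits = "".join(ch for ch in mac.lower() if ch not in "-:.")
    if len(digits) != 12 or any(ch not in "0123456789abcdef" for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


class EniTable:
    """Hypothetical MAC-to-ENI table; names are illustrative only."""

    def __init__(self) -> None:
        self._eni_by_mac: Dict[str, str] = {}

    def add_eni(self, eni_id: str, mac: str) -> None:
        # Each ENI is registered under its (normalized) MAC address.
        self._eni_by_mac[normalize_mac(mac)] = eni_id

    def select_inbound(self, inner_dst_mac: str) -> Optional[str]:
        """Pick the ENI whose MAC equals the packet's inner destination MAC."""
        return self._eni_by_mac.get(normalize_mac(inner_dst_mac))


table = EniTable()
table.add_eni("eni-1", "E4-A7-A0-99-0E-17")
print(table.select_inbound("e4:a7:a0:99:0e:17"))  # the ENI registered above
print(table.select_inbound("E4-A7-A0-99-0E-18"))  # no ENI owns this MAC
```

A packet whose inner destination MAC matches no ENI returns `None` here; how the appliance treats such packets is outside the scope of this sketch.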
The SDN controller will create these virtual ports / ENIs on an SDN appliance and associate the corresponding SDN policies (such as Route, ACL, and NAT) with these virtual ports. In other words, our software will communicate with the cards, hold card inventory and SDN placement, and call the APIs exposed through the card to create policies and set up ENIs, routes, ACLs, NAT, and other rules.
The following applies:
- Each Virtual Port (ENI) will be created with an ENI identifier such as MAC address, VNI, or more.
- A Virtual Port also has attributes such as flow time-out, QoS, and other port properties.
- The Virtual Port is the container which holds all policies.
For more information, see SDN pipeline basic elements.
Routing must be based on the Longest Prefix Match (LPM) and must support all underlay and overlay combinations described below:
- inner IPv4 packet encapsulated in outer IPv4 packet
- inner IPv4 packet encapsulated in outer IPv6 packet
- inner IPv6 packet encapsulated in outer IPv4 packet
- inner IPv6 packet encapsulated in outer IPv6 packet
The routing pipeline must support the routing models shown below.
- Outbound
  - Transpositions
    - Direct traffic – pass thru with static SNAT/DNAT (IP, IP+Port)
    - Packet upcasting (IPv4 -> IPv6 packet transformation)
    - Packet downcasting (IPv6 -> IPv4 packet transformation)
  - Encap
    - VXLAN/GRE encap – static rule
    - VXLAN/GRE encap – based on mapping lookup
    - VXLAN/GRE encap – calculated based on part of SRC/DEST IP of inner packet
  - Up to 3 levels of routing transforms (example: transpose + encap + encap)
- Inbound
  - Decap
    - VXLAN/GRE decap – static rule
    - VXLAN/GRE decap – based on mapping lookup
    - VXLAN/GRE decap – inner packet SRC/DEST IP calculated based on part of outer packet SRC/DEST IP
  - Transpositions
    - Direct traffic – pass thru with static SNAT/DNAT (IP, IP+Port)
    - Packet upcasting (IPv4 -> IPv6 packet transformation)
    - Packet downcasting (IPv6 -> IPv4 packet transformation)
  - Up to 3 levels of routing transforms (example: decap + decap + transpose)
All routing rules must optionally allow for stamping the source MAC (to enforce source MAC correctness), that is, to correct, fix, or override the source MAC.
- Outbound matching is based on destination IP only, using the Longest Prefix Match (LPM) algorithm.
- Once the rule is matched, the correct set of transposition, encap steps must be applied depending on the rule.
- Only one rule will be matched.
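The LPM behavior in the list above can be sketched with Python's standard `ipaddress` module. This is a linear-scan illustration under assumptions: `OutboundRouteTable` and the action strings are hypothetical, and a real data plane would use a trie or hardware LPM table rather than a loop.

```python
import ipaddress
from typing import List, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class OutboundRouteTable:
    """Hypothetical per-ENI route table: (prefix, action) pairs matched by LPM."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Network, str]] = []

    def add_route(self, prefix: str, action: str) -> None:
        self._routes.append((ipaddress.ip_network(prefix), action))

    def lookup(self, dst_ip: str) -> Optional[str]:
        """Return the action of the single longest matching prefix, or None."""
        addr = ipaddress.ip_address(dst_ip)
        best: Optional[Tuple[Network, str]] = None
        for net, action in self._routes:
            # Only compare addresses and prefixes of the same IP version.
            if addr.version != net.version or addr not in net:
                continue
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, action)
        return best[1] if best is not None else None


routes = OutboundRouteTable()
routes.add_route("0.0.0.0/0", "snat")               # illustrative action names
routes.add_route("10.0.0.0/8", "encap_with_lookup")
routes.add_route("10.0.1.0/24", "null")
print(routes.lookup("10.0.1.5"))   # the /24 wins over the /8 and the default
```

Only the destination IP takes part in the match, and exactly one route (the most specific) is returned, mirroring the three rules listed above.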
All inbound rules are matched in priority order (the rule with the lower priority value is matched first). Matching is based on multiple fields; a field must match only if it is populated. The supported fields are:
- Most Outer Source IP Prefix
- Most Outer Destination IP Prefix
- VXLAN/GRE key
Once the rule is matched, the correct set of decap, transposition steps must be applied depending on the rule. Only one rule will be matched.
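A minimal sketch of the inbound matching described above, assuming a rule is a small record whose unpopulated fields act as wildcards. `InboundRule`, `match_inbound`, and the action names are hypothetical and only illustrate the priority ordering and the "match if populated" behavior.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


def _in_prefix(ip: str, prefix: str) -> bool:
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(prefix)
    return addr.version == net.version and addr in net


@dataclass
class InboundRule:
    priority: int                            # lower value is matched first
    action: str                              # illustrative action label
    outer_src_prefix: Optional[str] = None   # None means "not populated"
    outer_dst_prefix: Optional[str] = None
    key: Optional[int] = None                # VXLAN VNI or GRE key

    def matches(self, outer_src: str, outer_dst: str, key: int) -> bool:
        # A field constrains the match only when it is populated.
        if self.outer_src_prefix is not None and not _in_prefix(outer_src, self.outer_src_prefix):
            return False
        if self.outer_dst_prefix is not None and not _in_prefix(outer_dst, self.outer_dst_prefix):
            return False
        if self.key is not None and self.key != key:
            return False
        return True


def match_inbound(rules: Iterable[InboundRule], outer_src: str,
                  outer_dst: str, key: int) -> Optional[InboundRule]:
    """Return the first matching rule in ascending priority order, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(outer_src, outer_dst, key):
            return rule
    return None


rules = [
    InboundRule(priority=20, action="decap_default"),
    InboundRule(priority=10, action="decap_a", outer_src_prefix="100.0.0.0/24", key=100),
]
hit = match_inbound(rules, "100.0.0.1", "100.0.0.2", 100)
print(hit.action if hit else None)
```

The search stops at the first hit, so only one rule is ever applied to a packet.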
Also notice the following:
- Outbound routes are usually LPM-based.
- Each route entry will have a prefix and a separate action entry.
- The lookup table is per ENI, but could be Global, or there could be multiple Global lookup tables per ENI.
- Using IPv4 for the outer encap permits routing between servers within a Region; across Regions we use IPv6.
Why would we want to use these?
- Example: to block prefixes to internal DataCenter IP addresses, but Customer uses prefixes inside of their own VNET
- Example: Lookup between CA (Customer Address, inside the customer's own VNET) and PA (Provider Address) using a lookup table (overwrite destination IP and MAC before encap)
- Example: Customer sends IPv4, we encap with IPv6
- Example: ExpressRoute with 2 different PAs specified (load balancing across multiple PAs) using 5 tuples of packet to choose 1st PA or 2nd PA
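The last example (choosing between two PAs from the packet's 5-tuple) could look like the sketch below. The hash function is an assumption: the text only says the 5-tuple is used, so CRC32 stands in for whatever the data plane actually computes, and `choose_pa` is a hypothetical name.

```python
import zlib
from typing import Sequence, Tuple

# (src_ip, dst_ip, protocol, src_port, dst_port)
FiveTuple = Tuple[str, str, int, int, int]


def choose_pa(flow: FiveTuple, pas: Sequence[str]) -> str:
    """Pick one PA for a flow; the same 5-tuple always maps to the same PA."""
    if not pas:
        raise ValueError("at least one PA is required")
    key = "|".join(str(field) for field in flow).encode("ascii")
    # CRC32 is an illustrative stand-in for the real hash.
    return pas[zlib.crc32(key) % len(pas)]


flow = ("10.0.0.1", "10.0.0.2", 6, 49152, 443)
print(choose_pa(flow, ["100.0.0.1", "100.0.0.2"]))
```

Hashing the full 5-tuple keeps every packet of a flow on the same PA while spreading different flows across the provided PAs.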
Route Type | Example |
---|---|
Encap_with_lookup_V4_underlay | Encap action is executed based on lookup into the mapping table. V4 underlay is used. |
Encap_with_lookup_V6_underlay | Encap action is executed based on lookup into the mapping table. V6 underlay is used. |
Encap_with_Provided_data (PA) | Encap action is executed based on provided data. Multiple PAs can be provided. |
Outbound NAT (SNAT)_L3 | L3 NAT action is executed on source IP, based on provided data. |
Outbound NAT (SNAT)_L4 | L4 NAT action is executed on source IP, source port based on provided data. |
Null | Blocks the traffic |
Private Link | - |
Mapping Table for a v-port
Customer Address | Physical Address - V4 | Physical Address - V6 | Mac-Address for D-Mac Rewrite | VNI to Use |
---|---|---|---|---|
10.0.0.1 | 100.0.0.1 | 3ffe::1 | E4-A7-A0-99-0E-17 | 10001 |
10.0.0.2 | 100.0.0.2 | 3ffe::2 | E4-A7-A0-99-0E-18 | 10001 |
10.0.0.3 | 100.0.0.3 | 3ffe::3 | E4-A7-A0-99-0E-19 | 20001 |
10.0.0.4 | 100.0.0.4 | 3ffe::3 | E4-A7-A0-99-0E-20 | 10001 |
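As a sketch of how an encap-with-lookup route might consume this mapping table, the snippet below loads the rows above into a dictionary keyed on Customer Address. `Mapping`, `MAPPINGS`, and `encap_with_lookup` are hypothetical names, and the returned dictionary layout is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Mapping:
    pa_v4: str   # Physical Address - V4
    pa_v6: str   # Physical Address - V6
    dmac: str    # Mac-Address for D-Mac Rewrite
    vni: int     # VNI to Use


# Rows copied from the mapping table above.
MAPPINGS: Dict[str, Mapping] = {
    "10.0.0.1": Mapping("100.0.0.1", "3ffe::1", "E4-A7-A0-99-0E-17", 10001),
    "10.0.0.2": Mapping("100.0.0.2", "3ffe::2", "E4-A7-A0-99-0E-18", 10001),
    "10.0.0.3": Mapping("100.0.0.3", "3ffe::3", "E4-A7-A0-99-0E-19", 20001),
    "10.0.0.4": Mapping("100.0.0.4", "3ffe::3", "E4-A7-A0-99-0E-20", 10001),
}


def encap_with_lookup(customer_dst: str,
                      use_v6_underlay: bool = False) -> Optional[Dict[str, Union[str, int]]]:
    """Resolve the encap parameters for a customer destination address."""
    entry = MAPPINGS.get(customer_dst)
    if entry is None:
        return None  # no mapping: the caller decides what happens next
    return {
        "outer_dst": entry.pa_v6 if use_v6_underlay else entry.pa_v4,
        "inner_dmac": entry.dmac,
        "vni": entry.vni,
    }


print(encap_with_lookup("10.0.0.3"))
print(encap_with_lookup("10.0.0.1", use_v6_underlay=True))
```

The `use_v6_underlay` flag corresponds to the choice between the `Encap_with_lookup_V4_underlay` and `Encap_with_lookup_V6_underlay` route types.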
Route Table for a v-port
- LPM decides which route is matched.
- Once the route is matched, a corresponding action is executed.
Route example- Outbound packets
For the first packet of a TCP flow, we take the Slow Path, running the transposition engine and matching at each layer. For subsequent packets, we take the Fast Path, matching a unified flow via UFID and applying a transposition directly against rules.
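The slow path / fast path split can be illustrated with a small flow cache. This is a sketch under assumptions: the cache is keyed on the 5-tuple as a stand-in for the UFID, `FlowCache` is a hypothetical name, and `slow_path` represents the full per-layer rule evaluation.

```python
from typing import Callable, Dict, Tuple

FiveTuple = Tuple[str, str, int, int, int]


class FlowCache:
    """First packet of a flow runs the slow path; later packets reuse its result."""

    def __init__(self, slow_path: Callable[[FiveTuple], str]) -> None:
        self._slow_path = slow_path
        self._flows: Dict[FiveTuple, str] = {}
        self.slow_hits = 0
        self.fast_hits = 0

    def process(self, flow: FiveTuple) -> str:
        if flow in self._flows:
            # Fast path: apply the cached transposition directly.
            self.fast_hits += 1
            return self._flows[flow]
        # Slow path: evaluate the rules once and remember the outcome.
        transform = self._slow_path(flow)
        self._flows[flow] = transform
        self.slow_hits += 1
        return transform


def evaluate_rules(flow: FiveTuple) -> str:
    # Placeholder for matching at each layer; returns an illustrative label.
    return "vxlan_encap"


cache = FlowCache(evaluate_rules)
flow = ("10.0.0.1", "10.0.0.2", 6, 49152, 443)
for _ in range(3):
    cache.process(flow)
print(cache.slow_hits, cache.fast_hits)
```

Flow aging and removal (for example on time-out or TCP close) are omitted here to keep the sketch short.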
For more information, see SDN pipeline basic elements.
See packet transforms in SDN pipeline basic elements.
V-Port
- Physical address = 100.0.0.2
- V-Port Mac = V-PORT_MAC

VNET Definition:
- 10.0.0.0/24
- 20.0.0.0/24
VNET Mapping Table

| | V4 underlay| V6 underlay| Mac-Address| Mapping Action| VNI |
|:----------|:----------|:----------|:----------|:----------|:----------|
|10.0.0.1| 100.0.0.1| 3ffe::1| Mac1| VXLAN_ENCAP_WITH_DMAC_DE-WRITE| 100 |
|10.0.0.2| 100.0.0.2| 3ffe::2| Mac2| VXLAN_ENCAP_WITH_DMAC_DE-WRITE| 200 |
|10.0.0.3| 100.0.0.3| 3ffe::3| Mac3| VXLAN_ENCAP_WITH_DMAC_DE-WRITE| 300 |
Packet Transforms
- Metering will be based on per-flow stats; the metering engine will consume the per-flow stats of bytes-in and bytes-out.
Counters are objects for counting data per ENI. The following are their main characteristics:
- A counter is associated with only one ENI; that is, it is not shared among different ENIs.
- If you define a counter as a global object, it cannot reference different ENIs.
- The counters live as long as the related ENI exists.
- The counters persist after the flow is completed.
- You use API calls to handle these counters.
- When creating a route table, you will be able to reference the counters.
The control plane is the consumer of counters that are defined in the data plane. The control plane queries every 10 seconds.
Counters can be assigned on the route rule, or assigned onto a mapping. If the mapping does not exist, you revert to the route rule counter. A complete definition will follow when we have more information other than software defined devices.
In the flow table we list the packet counter called a 'metering' packet; once we have the final implementation that does the packet processing, we can do metering.
Essentially, whenever a route table is accessed and we identify the right VNET target (based on the mapping from the underlay IP), we will have the ID of the metering packet counter, which was preprogrammed earlier. We will reference this counter in the mappings. When the flow is created, it will list this counter ID. When a packet transits inbound or outbound through the specific flow, this counter is incremented and tracked separately for the inbound and outbound directions.
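The counter handling described above (a mapping counter with fallback to the route rule counter, then per-direction increments on the flow) can be sketched as follows. All names are hypothetical, and tracking both packets and bytes per direction is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def resolve_counter_id(route_counter_id: int,
                       mapping_counter_id: Optional[int] = None) -> int:
    """Use the mapping's counter when one exists, else the route rule's counter."""
    return mapping_counter_id if mapping_counter_id is not None else route_counter_id


class MeteringCounters:
    """Per-counter statistics, tracked separately for inbound and outbound."""

    def __init__(self) -> None:
        # counter_id -> direction -> [packets, bytes]
        self._stats = defaultdict(lambda: {"inbound": [0, 0], "outbound": [0, 0]})

    def record(self, counter_id: int, direction: str, nbytes: int) -> None:
        if direction not in ("inbound", "outbound"):
            raise ValueError(f"unknown direction: {direction!r}")
        entry = self._stats[counter_id][direction]
        entry[0] += 1
        entry[1] += nbytes

    def read(self, counter_id: int) -> Dict[str, Tuple[int, int]]:
        """Return (packets, bytes) per direction, as a control-plane poll might."""
        return {d: (v[0], v[1]) for d, v in self._stats[counter_id].items()}


counters = MeteringCounters()
# The flow lists this counter ID when it is created (no mapping counter here).
flow_counter_id = resolve_counter_id(route_counter_id=7)
counters.record(flow_counter_id, "outbound", 1500)
counters.record(flow_counter_id, "inbound", 64)
print(counters.read(flow_counter_id))
```

Because many flows may list the same counter ID, the totals aggregate across every flow that resolved to that route rule or mapping.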
Some specific counters (such as the memory use of a card) are global; however, most of the counters should be per ENI, because the processing of rules, drops, accepts, lists of flows, and so on is per ENI.
We need more information about Counters and Statistics, and we need to start thinking about how to add Metering and reconcile this in the P4 model.
Counter Name | Description | ENI or Global |
---|---|---|
TotalPacket | Total packets to/from a VM. Exposed to customer; 2 counters, 1 per direction | ENI |
TotalBytes | Total bytes to/from a VM. Exposed to customer; 2 counters, 1 per direction | ENI |
TotalUnicastPacketForwarded | | ENI |
TotalMulticastPacketsForwarded | | ENI |
TcpConnectionsResetHalfTTL | TCP connections that had a TCP reset and its TTL cut down to 5 seconds | ENI |
NonSynStateful | Non-SYN TCP packets that are natted and not dropped by setting (SLB scenario) | ENI |
NumberOfFlowResimulatedDuringPortTimer | Number of connections updated in an internal port-level update | ENI |
RedirectRuleResimulatedUf | Number of times a redirect packet impacted a connection | ENI |
DropPacket | Number of packets dropped on a port | ENI |
DropBroadcastPacket | Number of broadcast packets dropped by guard | ENI |
DropInvalidPacket | Number of packets dropped due to being unable to extract valid information from them | ENI |
DropIPv4SpoofingPacket | Number of packets not using the programmed source address | ENI |
DropIPv6SpoofingPacket | Number of packets not using the programmed source address | ENI |
DropBlockedPacket | Number of packets dropped due to the port in a blocked state | ENI |
TcpConnectionsResetByInjectedReset | Number of TCP connections reset with an injected reset | ENI |
DroppedRedirectPackets | Number of redirect packets seen and dropped (all redirect packets are dropped by design) | ENI |
DroppedPADiscoveryPackets | Number of "PA Discovery" packets dropped intentionally as part of VNET encryption | ENI |
DroppedResourcesMemory | Number of packets dropped due to being unable to allocate memory | ENI |
DroppedPARouteRule | Number of packets dropped due to PA route rule failure to determine the outer MAC address to use | ENI |
DroppedFragPacket | Number of fragments dropped due to fragmentation cache collision or being unable to apply transposition | ENI |
DroppedResourcesPacket | Number of packets dropped due to a lack of some object or memory | ENI |
DroppedAclPacket | Packets dropped due to matching a block rule | ENI |
DroppedMalformedPacket | Number of packets dropped due to determining them to be malformed | ENI |
DroppedForwardingPacket | Number of packets unable to be forwarded to their next destination | ENI |
DroppedNoRuleMatchPacket | Number of packets dropped because the networking device did not find the matching action | ENI |
DroppedMonitoringPingPacket | Number of pingmesh packets dropped by design | ENI |
DroppedResourcesUnifiedFlowMaxFlowsLimit | Number of packets dropped due to reaching the UF limit and being unable to create any more | ENI |
TcpSynPacket | Number of TCP Syn packets seen | ENI |
TcpSynAckPacket | Number of TCP SynAck packets seen | ENI |
FINPackets | Number of FIN packets seen | ENI |
RSTPackets | Number of RST packets seen | ENI |
TransientFlowTimeouts | Number of connections deleted after resimulation | ENI |
TcpConnectionsVerified | Number of TCP connections that completed their syn handshake | ENI |
TcpConnectionsTimedOut | Number of TCP connections that timed out their full TTL (SYN handshake finished, but FIN handshake didn't start) | ENI |
TcpConnectionsReset | Number of TCP connections that received a reset | ENI |
TcpConnectionsResetBySyn | Number of TCP connections that got destroyed and recreated by a SYN on the same tuples | ENI |
TcpConnectionsClosedByFin | | ENI |
TcpHalfOpenTimeouts | | ENI |
TcpConnectionsTimeWait | Number of TCP connections that timed out in the time wait state | ENI |
CurrentTotalFlowEntry | Current number of unified flows (aka connections) | ENI |
CurrentTotalFlow | Current number of main unified flows (the side of the connection that initiated the connection) | ENI |
CurrentHalfOpenFlow | Current number of Ufs in a half open state | ENI |
CurrentTcpFlow | Current number of Ufs that are for a TCP connection | ENI |
CurrentUdpFlow | Current number of Ufs for UDP | ENI |
CurrentOtherFlow | Current number of Ufs for something other than TCP or UDP | ENI |
MaxTotalFlowEntry | Maximum number of Ufs since the initialization of the port | ENI & Global |
MaxHalfOpenFlow | Maximum number of Ufs in a half-open state since the initialization of the port | ENI |
MaxTcpFlow | follow above | ENI |
MaxUdpFlow | follow above | ENI |
MaxOtherFlow | follow above | ENI |
CreatedTotalFlowEntry | Total number of Ufs created | ENI |
CreatedHalfOpenFlow | Total number of flows in a half open state | ENI |
CreatedTcpFlow | follow above | ENI |
CreatedUdpFlow | follow above | ENI |
CreatedOtherFlow | follow above | ENI |
MatchedTotalFlowEntry | Total number of times a UF was matched and used | ENI |
MatchedHalfOpenFlow | follow above | ENI |
MatchedTcpFlow | follow above | ENI |
MatchedUdpFlow | follow above | ENI |
MatchedOtherFlow | follow above | ENI |
CreationRateMaxTotalFlowEntry | Maximum creation rate for Unified flows in a second | ENI |
CreationRateMaxHalfOpenFlow | follow above | ENI |
CreationRateMaxTcpFlow | follow above | ENI |
CreationRateMaxUdpFlow | follow above | ENI |
CreationRateMaxOtherFlow | follow above | ENI |
No ENI Match | evident | ENI |
CPS Counters | | ENI & Global |
Questions
- How often will we read?
- What type of API to use?
- Will we push or pull from the Controller?
Border Gateway Protocol (BGP) is a standardized exterior gateway protocol designed to exchange routing and reachability information among autonomous systems on the Internet. BGP is classified as a path-vector routing protocol and it makes routing decisions based on paths, network policies, or rule-sets configured by a network administrator. For more information, see Border Gateway Protocol.
Counters per rule to trace an increment per layer, ACL hits, Packet Captures, Bandwidth Metering for Routing Rules to count bytes (each flow associated with a bandwidth counter when an LPM is hit - many flows may share the same counters).
For information about flow replication, see DASH High-Availability.
- Need ability to run rule processing behavior on dev box / as part of merge validation.
- SLB VXLAN support
- Reduced tuple support on host.