One-Arm VLAN-Backed Load Balancer in NSX-T

There are two main deployment methods for Load Balancers in NSX-T. Both offer the same features, so the differences are purely in connectivity:

  • Inline/Transparent – The load balancer sits in the middle of the network between the clients and servers, so one interface receives the client traffic and another forwards it to the servers. The source IP remains the client’s original IP, so the back-end servers must have a return route, which should go via the load balancer. See the implementation here.
  • One-Arm – Uses the same interface to receive the client traffic as it does to reach the back-end servers. This method requires the use of SNAT to modify the client IP to that of the load balancer, so the back-end servers only need to know how to get back to the load balancer, not the original clients.

Using a VLAN-backed Segment will allow the load balancer to direct traffic to both VM and bare metal workloads that may live on the same broadcast domain.

There are two supported connectivity methods for the ‘arm’ in the One-Arm Load Balancer in NSX-T. One uses a T1 Uplink Interface to connect to a T0 but, unlike Inline, uses the same interface to reach the back-end servers. The other, which is the focus of this post, uses a T1 Service Interface to provide connectivity:

Set Up the Network

First create a VLAN Segment, which will be used to connect the VMs to be load balanced. It should be connected to a Transport Zone that is applied to both the hosts that will run the VMs and the Edge Cluster that will host the T1 Gateway created next:
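
For anyone scripting this instead of clicking through the UI, a rough NSX Policy API equivalent is sketched below. The manager address, segment name, VLAN ID and transport zone ID are all placeholders used for this series of examples:

# Create a VLAN-backed Segment on the VLAN transport zone
curl -k -u admin -X PATCH \
  https://nsx.lab.local/policy/api/v1/infra/segments/LB-VLAN-SEG \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "LB-VLAN-SEG",
        "vlan_ids": ["100"],
        "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<vlan-tz-id>"
      }'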

Then create a new T1 Gateway. It should be assigned to an Edge Cluster in order to run the centralised load balancer service, but doesn’t need to be connected to a T0:

Then create a Service Interface on the T1 that connects to the VLAN Segment created at the start:

Then create a static route on the T1 to reach the external networks. The next hop could be another T1 (that connects to a T0) or the physical network, depending on your setup:
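
The last three steps (the T1 Gateway with its Edge Cluster, the Service Interface and the static route) look roughly like this via the Policy API; all names, IDs and addresses are again placeholders:

# Tier-1 Gateway, not connected to any T0
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/tier-1s/LB-T1 \
  -H 'Content-Type: application/json' -d '{"display_name": "LB-T1"}'

# Assign the Edge Cluster so the centralised LB service has somewhere to run
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/tier-1s/LB-T1/locale-services/default \
  -H 'Content-Type: application/json' \
  -d '{"edge_cluster_path": "/infra/sites/default/enforcement-points/default/edge-clusters/<edge-cluster-id>"}'

# Service Interface on the VLAN Segment created earlier
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/tier-1s/LB-T1/locale-services/default/interfaces/lb-svc-if \
  -H 'Content-Type: application/json' \
  -d '{"segment_path": "/infra/segments/LB-VLAN-SEG", "subnets": [{"ip_addresses": ["10.0.100.1"], "prefix_len": 24}]}'

# Static default route towards the external networks via a next hop on the VLAN
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/tier-1s/LB-T1/static-routes/default-out \
  -H 'Content-Type: application/json' \
  -d '{"network": "0.0.0.0/0", "next_hops": [{"ip_address": "10.0.100.254", "admin_distance": 1}]}'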

Configure the Load Balancer

Create a Load Balancer in Load Balancing > Load Balancers > Add Load Balancer and attach it to the T1 Gateway:
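
Via the Policy API this maps to a single LBService object attached to the T1; the size and names below are illustrative:

# Small load balancer instance running on the T1 created earlier
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/lb-services/one-arm-lb \
  -H 'Content-Type: application/json' \
  -d '{"display_name": "one-arm-lb", "size": "SMALL", "connectivity_path": "/infra/tier-1s/LB-T1"}'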

Create a Server Pool to specify which back-end servers are to be load balanced. Note that the SNAT Translation Mode is set to Automap, which means the Source IP will be that of the T1 rather than the original client IP:

The Members/Group contains the criteria used to match the back-end servers. This could be Tags or VM Names, or, for non-NSX-aware workloads such as bare metal, plain IP addresses:

This is also where you can set a back-end port if it differs from the front-end port. For example, if the app runs on port 8000 but clients should connect on port 80, set 8000 for the members and set 80 later in the Virtual Server:
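
Pulling the last few settings together, a rough Policy API sketch of the pool might look like the following, with Automap SNAT, static members (a VM plus the bare metal host by IP, both made-up addresses) and the back-end port set to 8000; an NSX Group could be referenced via "member_group" instead of static members:

# Server pool: round robin, SNAT Automap, members on back-end port 8000
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/lb-pools/web-pool \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "web-pool",
        "algorithm": "ROUND_ROBIN",
        "snat_translation": {"type": "LBSnatAutoMap"},
        "members": [
          {"display_name": "web-vm-01",    "ip_address": "10.0.100.11", "port": "8000"},
          {"display_name": "baremetal-01", "ip_address": "10.0.100.21", "port": "8000"}
        ]
      }'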

Then create a Virtual Server, which acts as the entry point to the Load Balancer, setting the virtual IP/port/protocol. It needs to reference the Pool that’s just been created and is attached to the Load Balancer:
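
And a sketch of the equivalent Virtual Server definition, listening on port 80 and tying the pool and load balancer together; the VIP and the default HTTP application profile path are assumptions to adjust for your environment:

# Virtual Server: VIP on port 80, forwarding to the pool on port 8000
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/lb-virtual-servers/web-vip \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "web-vip",
        "ip_address": "10.0.100.10",
        "ports": ["80"],
        "application_profile_path": "/infra/lb-app-profiles/default-http-lb-app-profile",
        "pool_path": "/infra/lb-pools/web-pool",
        "lb_service_path": "/infra/lb-services/one-arm-lb"
      }'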

At this point the topology from NSX looks like this:

Checking the Traffic

Now the Load Balancer should be switching between the two pool members, using the Round Robin method:
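
A quick way to generate some test traffic from any client that can reach the VIP (using the example VIP from the sketches above, and assuming each back end returns something identifiable such as its hostname):

# Hit the VIP a few times; responses should alternate between the pool members
for i in $(seq 1 10); do
  curl -s http://10.0.100.10/ | head -n 1
done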

Checking the statistics shows traffic going to both the VM workloads and the bare metal, which was specified by IP:

Checking the web server logs shows that the client source address is the T1 Service Interface address:

The return traffic then goes via the T1 before returning to the client.

Geneve Inside Geneve

Disclaimer: This is by no means an optimal solution and is only being deployed to see Geneve in action!

Just like VXLAN before it, Geneve is the current flavour of overlay network encapsulation, helping to improve networking in the virtual world without worrying about the physical routers and switches that now sit in the boring underlay. It is probably best known for its use in VMware’s NSX-T, but it isn’t some secret proprietary protocol: it was in fact created by VMware and Intel, among others, as a proposed RFC standard, so anyone can implement it.

Another product that makes use of Geneve to create overlay networks is Cilium, an eBPF-based Kubernetes CNI. It uses Geneve tunnels between K8s nodes to enable pod reachability, covering both inter-pod networking and connectivity from Services to pods.

So what better way to see Geneve in action than to run Cilium inside NSX-T: Geneve inside Geneve. In practice this means running K8s on VMs that are connected to NSX-T Segments, with Cilium as the CNI.

App Topology

Here’s an overview of the micro-services app that’s being used:

Clients connect to the UI pods to view the web front-end, which in the background talks to the API pods to retrieve data from the database. The traffic from the UI to the API will need to go through two Geneve tunnels, as the two sets of pods will be placed on different nodes, which in turn run on different L3-separated hosts.

Ensuring the use of Overlay

First of all, Geneve is only used to tunnel traffic between hosts/nodes. So to make sure it’s used, we need to ensure the VMs and pods are placed on different hosts and nodes respectively. This can be achieved by disabling DRS or creating anti-affinity rules in vCenter, and by using Node Selectors in K8s.
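
On the K8s side, one simple way to pin the two tiers to separate nodes is a nodeSelector per Deployment; the node, label and deployment names below are just illustrative (pod anti-affinity would achieve the same thing more dynamically):

# Label the two worker nodes, one per application tier
kubectl label node geneve-uk8n1 app-tier=ui
kubectl label node geneve-uk8n2 app-tier=api

# Pin each Deployment to its node via a nodeSelector
kubectl patch deployment ui --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"app-tier":"ui"}}}}}'
kubectl patch deployment api --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"app-tier":"api"}}}}}'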

Here’s the physical view of the app components in the infrastructure:

  • GENEVE-H01 & H02 are ESXi hosts
  • GENVE-UK8N1 and N2 are Ubuntu VM K8s nodes
  • GENEVE-K8S is an NSX-T Overlay Segment
  • Each K8s node has its own Cilium pod network
  • LoadBalancer type Services are provided by MetalLB in L2 mode

Kubectl confirms this diagram by showing which pod lives on which node:
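
The wide output is enough to confirm the placement; the NODE column shows where each pod landed:

kubectl get pods -o wide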

And the Cilium Geneve tunnels can be listed by issuing a simple command in each of the pods:
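
The command in question is cilium bpf tunnel list, run inside each cilium-agent pod; a small loop over the agent pods (selected here by the k8s-app=cilium label that a standard Cilium install applies) shows each node’s view of the Geneve tunnel map:

# Print the Geneve tunnel map as seen from every Cilium agent
for p in $(kubectl -n kube-system get pods -l k8s-app=cilium -o name); do
  echo "== $p =="
  kubectl -n kube-system exec "$p" -- cilium bpf tunnel list
done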

So traffic from the UI pods needs to go to another node, via another host, to reach the API pods, whereas all the underlay network needs to know is how to get from the TEP of Host1 to the TEP of Host2.

Seeing Geneve²

Traffic is generated by a client browsing the UI, which in turn talks to the API. This traffic can then be captured on Host1 in the outbound direction (after NSX-T Geneve encapsulation):

nsxcli -c start capture interface vmnic3 direction output file geneve.pcap
  • The outermost conversation is a Geneve flow from Host1 TEP to Host2 TEP (NSX-T)
  • The next Geneve flow is from Node1 TEP to Node2 TEP (Cilium)
  • And finally the actual application HTTP traffic from the UI Pod to the API Pod (see the tshark sketch below to unpack these layers offline)
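
Since both NSX-T and Cilium use the standard Geneve UDP port 6081, Wireshark/tshark will dissect the nested headers on its own; something like the one-liner below (reading the pcap produced by the capture command above) prints the doubly-encapsulated HTTP packets with both Geneve layers and the inner HTTP expanded:

# Expand both Geneve layers plus the inner HTTP of each matching packet
tshark -r geneve.pcap -Y 'http' -O geneve,http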

In the real world the workload placement would probably mean no traffic would even need to go on to the wire, but it’s always good to see what would happen when it does!

Configuring OSPF in NSX-T

The release of NSX-T 3.1.1 brings with it, among other things, OSPFv2 routing! This may be a welcome return for many enterprises who haven’t adopted BGP in the datacentre, and it allows for a smoother transition from NSX-V, which has always had OSPF support.

Unlike with NSX-V, where routing was used within the virtual domain (between ESGs and DLRs), NSX-T only uses dynamic routing protocols for external physical-to-virtual connectivity. So OSPF is enabled on T0 Gateways, connecting to the physical world.

Once you have a T0 Gateway deployed, enabling OSPF is now just a few clicks away…

Here’s the logical topology that will be configured in the rest of this post:

Configuring OSPF

1. Assign each interface for each VLAN to the relevant Edge Node:

2. Disable the BGP toggle and enable the OSPF toggle:

3. Create an Area definition (only a single Area can be created). The type can be either Normal or NSSA, and the authentication method can be one of the usual options for OSPF: Type 0 (None), Type 1 (Plaintext) or Type 2 (MD5):

4. Configure each interface to be used in the OSPF process. Each interface and Area can be handily selected from the dropdowns; then set the Enabled toggle. Interfaces can be of type Broadcast (DR/BDR) or P2P:

5. To advertise overlay Segments out to the physical world, create a route redistribution policy, selecting the types of routes to advertise (e.g. T1 Connected) and OSPF as the Destination Protocol:

6. Then don’t forget to enable the newly created OSPF route redistribution (a Policy API sketch covering steps 2, 3, 5 and 6 follows below):
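
For anyone scripting the T0 side, steps 2, 3, 5 and 6 map roughly onto the T0’s locale-services in the Policy API. The sketch below is written from memory of the 3.1 schema, so treat the exact field names and enum values as assumptions to verify against the API guide; the manager address, gateway name and IDs are placeholders, and the per-interface enablement from step 4 is done on each interface object and left to the UI here:

# Step 2/3: enable the OSPF process and define the single (Normal) Area
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/tier-0s/T0-GW/locale-services/default/ospf \
  -H 'Content-Type: application/json' \
  -d '{"enabled": true, "ecmp": true}'

curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/tier-0s/T0-GW/locale-services/default/ospf/areas/area-0 \
  -H 'Content-Type: application/json' \
  -d '{"area_id": "0.0.0.0", "area_type": "NORMAL", "auth_config": {"mode": "NONE"}}'

# Step 5/6: redistribute T1-connected routes into OSPF and enable it
curl -k -u admin -X PATCH https://nsx.lab.local/policy/api/v1/infra/tier-0s/T0-GW/locale-services/default \
  -H 'Content-Type: application/json' \
  -d '{
        "route_redistribution_config": {
          "ospf_enabled": true,
          "redistribution_rules": [
            {"name": "t1-connected-to-ospf", "route_redistribution_types": ["TIER1_CONNECTED"], "destination": "OSPF"}
          ]
        }
      }'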

As long as the physical network has also been set up, there should now be a few neighbours forming, which can be viewed directly in the NSX UI:

A neighbourship will form between the two T0 SR instances, but it will remain in the 2Way/DROther state since DR and BDR are already in use; all of the other adjacencies are to the physical network.
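
The same information can be pulled from the Edge node CLI by dropping into the T0 SR’s VRF first; command availability varies by version, so treat these as a rough guide rather than a definitive reference:

get logical-router        # note the VRF ID of the SERVICE_ROUTER_TIER0
vrf 1                     # enter that VRF (the ID here is just an example)
get ospf neighbor         # adjacency states, including the 2Way/DROther peer SR
get route ospf            # OSPF-learned routes on this SR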

And a similar view from the physical world:

Finally, here are some routes from Segments that are connected to a T1, “advertised” to the T0, then redistributed out to the physical network using ECMP OSPF from the T0 SRs: