Geneve Inside Geneve

Disclaimer: This is by no means an optimal solution and is only being deployed to see Geneve in action!

Just like VXLAN before it, Geneve is the current flavour of overlay network encapsulation, helping to improve networking in the virtual world without worrying about the physical routers and switches that now sit in the boring underlay. It is probably best known for its use in VMware’s NSX-T, but it isn’t some secret proprietary protocol. It was in fact created by VMware and Intel, among others, as a proposed RFC standard (since published as RFC 8926), so anyone can implement it.

Another product that makes use of Geneve to create overlay networks is Cilium, an eBPF-based Kubernetes CNI. It uses Geneve tunnels between K8s nodes to enable pod reachability, which includes both inter-pod networking and connectivity from Services to pods.

So what better way to see Geneve in action than to run Cilium inside NSX-T, running Geneve inside Geneve? In practice this means running K8s, with Cilium as the CNI, on VMs that are connected to NSX-T Segments.

App Topology

Here’s an overview of the micro-services app that’s being used:

Clients connect to the UI pods to view the web front-end, which in the background talks to the API pods to retrieve data from the database. The traffic from the UI to the API will need to go through two Geneve tunnels, as the UI and API pods will be hosted on different, L3-separated hosts.

Ensuring the use of Overlay

First of all, Geneve is only used to tunnel traffic between hosts/nodes. So to make sure it’s used we need to ensure the VMs and pods are placed on different hosts and nodes respectively. This can be achieved by disabling DRS or creating anti-affinity rules in vCenter and by using Node Selectors in K8s.

Here’s the physical view of the app components in the infrastructure:

  • GENEVE-H01 & H02 are ESXi hosts
  • GENEVE-UK8N1 and N2 are Ubuntu VM K8s nodes
  • GENEVE-K8S is an NSX-T Overlay Segment
  • Each K8s node has its own Cilium pod network
  • LoadBalancer type Services are provided by MetalLB in L2 mode

Kubectl confirms this diagram by showing which pod lives on which node:

And the Cilium Geneve tunnels can be listed by issuing a simple command in each of the pods:

So traffic from the UI pods needs to go to another node, via another host, to reach the API pods, whereas all the underlay network needs to know is how to get from the TEP of Host1 to the TEP of Host2.
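
One side effect of stacking the two tunnels is that every pod-to-pod packet carries two sets of encapsulation headers by the time it reaches the underlay. The sketch below is only back-of-the-envelope arithmetic, assuming an IPv4 underlay, no VLAN tag on the inner frames and no Geneve options (a real deployment may add options, so treat the result as a best case). The function names and the 1500-byte underlay MTU are illustrative and not taken from this lab.

```python
# Header sizes in bytes for one Geneve layer (IPv4, no Geneve options assumed).
OUTER_IPV4 = 20      # IPv4 header without IP options
OUTER_UDP = 8        # UDP header (Geneve uses destination port 6081)
GENEVE_BASE = 8      # fixed Geneve header from RFC 8926
INNER_ETHERNET = 14  # encapsulated Ethernet header, untagged


def geneve_overhead(option_bytes: int = 0) -> int:
    """Bytes one Geneve layer adds between the outer IP MTU and the inner IP packet."""
    if option_bytes % 4 != 0:
        raise ValueError("Geneve options are a multiple of 4 bytes")
    return OUTER_IPV4 + OUTER_UDP + GENEVE_BASE + option_bytes + INNER_ETHERNET


def inner_mtu(underlay_mtu: int, layers: int, option_bytes: int = 0) -> int:
    """Largest inner IP packet that fits after `layers` nested Geneve encapsulations."""
    return underlay_mtu - layers * geneve_overhead(option_bytes)


if __name__ == "__main__":
    # Hypothetical 1500-byte underlay MTU, purely for illustration.
    for layers, label in [(1, "VM traffic (NSX-T only)"),
                          (2, "pod traffic (Cilium inside NSX-T)")]:
        print(f"{label}: {inner_mtu(1500, layers)} bytes")
```

Under these assumptions each layer costs 50 bytes, so the nested case loses 100 bytes of every underlay packet to headers.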

Seeing Geneve²

Traffic is generated by a client browsing the UI, which talks to the API. That traffic can then be captured on Host1 in the outbound direction (after NSX-T Geneve encapsulation):

nsxcli -c start capture interface vmnic3 direction output file geneve.pcap
  • The outermost conversation is a Geneve flow from Host1 TEP to Host2 TEP (NSX-T)
  • The next Geneve flow is from Node1 TEP to Node2 TEP (Cilium)
  • And finally the actual application HTTP traffic from the UI Pod to the API Pod
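
The three layers listed above can also be peeled apart programmatically. The following is a minimal sketch using only the Python standard library: it builds a synthetic nested frame and walks it using the fixed 8-byte Geneve header layout from RFC 8926. It does not read the capture file from this lab, and every IP address, VNI and helper name in it is made up for illustration.

```python
import socket
import struct

GENEVE_PORT = 6081   # IANA-assigned UDP port for Geneve
ETH_P_IPV4 = 0x0800
ETH_P_TEB = 0x6558   # Transparent Ethernet Bridging: the payload is an Ethernet frame


def peel(frame: bytes) -> list[dict]:
    """Walk an Ethernet frame and return one dict per IPv4 layer, outermost first."""
    layers = []
    while len(frame) >= 14 + 20:
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if ethertype != ETH_P_IPV4:
            break
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4                  # IPv4 header length in bytes
        layer = {"src": socket.inet_ntoa(ip[12:16]),
                 "dst": socket.inet_ntoa(ip[16:20]),
                 "vni": None}
        layers.append(layer)
        udp = ip[ihl:]
        if ip[9] != 17 or len(udp) < 16:          # not UDP, or too short for Geneve
            break
        (dport,) = struct.unpack_from("!H", udp, 2)
        if dport != GENEVE_PORT:
            break
        gnv = udp[8:]
        opt_len = (gnv[0] & 0x3F) * 4             # option length is in 4-byte words
        (proto_type,) = struct.unpack_from("!H", gnv, 2)
        layer["vni"] = int.from_bytes(gnv[4:7], "big")
        if proto_type != ETH_P_TEB:
            break
        frame = gnv[8 + opt_len:]                 # next encapsulated Ethernet frame
    return layers


# Helpers that build a synthetic packet (zeroed MACs and checksums, no validation).
def eth(payload: bytes) -> bytes:
    return bytes(12) + struct.pack("!H", ETH_P_IPV4) + payload


def ipv4(src: str, dst: str, proto: int, payload: bytes) -> bytes:
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0, 64,
                         proto, 0, socket.inet_aton(src), socket.inet_aton(dst))
    return header + payload


def geneve(vni: int, inner_frame: bytes) -> bytes:
    udp = struct.pack("!HHHH", 49152, GENEVE_PORT, 16 + len(inner_frame), 0)
    header = struct.pack("!BBH", 0, 0, ETH_P_TEB) + vni.to_bytes(3, "big") + b"\x00"
    return udp + header + inner_frame


if __name__ == "__main__":
    # All addresses and VNIs below are hypothetical.
    pod = eth(ipv4("10.0.1.10", "10.0.2.20", 6, bytes(20)))   # placeholder TCP header
    node = eth(ipv4("192.168.10.11", "192.168.10.12", 17, geneve(2, pod)))
    host = eth(ipv4("172.16.0.1", "172.16.0.2", 17, geneve(70000, node)))
    for depth, layer in enumerate(peel(host)):
        print(depth, layer)
```

The first two entries it returns correspond to the host and node TEP conversations, each with a VNI, and the last is the pod-to-pod packet, which has none.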

In the real world the workload placement would probably mean no traffic would even need to go on to the wire, but it’s always good to see what would happen when it does!