There are several benefits to overlay networking, such as a far larger number of segments to consume (over 16 million for VXLAN and Geneve, which both provide a 24-bit VNI) and no reliance on physical network configuration to spin up a new subnet.
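As a quick check of the "over 16 million" figure, the segment count follows directly from the width of the VNI field. The function name below is purely illustrative.

```python
VNI_BITS = 24  # width of the VNI field in both VXLAN and Geneve headers


def segment_count(id_bits: int) -> int:
    """Number of distinct identifiers an id_bits-wide field can encode."""
    return 2 ** id_bits


print(f"{segment_count(VNI_BITS):,}")  # 16,777,216
```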
To make use of these benefits and more, the overlay doesn’t ask for much of the underlay, just a few basic things:
- IP connectivity between tunnel endpoints
- Any firewalls in the path allow UDP port 6081 for Geneve or UDP port 4789 for VXLAN
- Jumbo frames with an MTU size of at least 1600 bytes
- Optionally, multicast if you wish to optimise flooding
This article is about what happens when you don’t meet the third requirement above (the MTU) with NSX-T…
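To see where the MTU requirement comes from, here is a minimal sketch of the encapsulation arithmetic. It assumes an IPv4 underlay without IP options and an untagged inner Ethernet frame; the function and constant names are illustrative only. Geneve options are variable in length, so the amount NSX-T actually adds is left as a parameter rather than stated as fact.

```python
# Header sizes in bytes (assumptions: outer IPv4 without options, untagged inner frame).
INNER_ETHERNET = 14  # the inner frame's MAC header travels inside the tunnel
TUNNEL_BASE = 8      # fixed tunnel header: 8 bytes for both Geneve and VXLAN
UDP = 8
OUTER_IPV4 = 20


def required_underlay_mtu(inner_mtu: int = 1500, geneve_options: int = 0) -> int:
    """Smallest underlay IP MTU that carries a full-size inner packet unfragmented."""
    return inner_mtu + INNER_ETHERNET + TUNNEL_BASE + geneve_options + UDP + OUTER_IPV4


print(required_underlay_mtu())                  # 1550
print(required_underlay_mtu(geneve_options=8))  # 1558
```

Either way, a 1600-byte underlay MTU leaves headroom for a standard 1500-byte workload MTU plus the tunnel headers.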
Topology
A basic 3-tier app has been deployed as above, with each tier on its own segment and connected to a T1 gateway. Routing between segments is fully distributed in the kernel of each hypervisor hosting the workloads.
Problem
When connected to one of the WEB servers and attempting to access the APP, the connection was getting reset:
Because the WEB and APP workloads were on separate hosts, the traffic was being encapsulated into a Geneve packet (which cannot be fragmented) and sent over the transport network from TEP to TEP:
A ping test confirmed that connectivity was OK with payloads of up to 1414 bytes; anything larger was being dropped:
root@NWATCH-WEB01 [ ~ ]# ping 10.250.70.1 -s 1414
PING 10.250.70.1 (10.250.70.1) 1414(1442) bytes of data.
1422 bytes from 10.250.70.1: icmp_seq=1 ttl=61 time=1.33 ms
^C
--- 10.250.70.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.326/1.326/1.326/0.000 ms
root@NWATCH-WEB01 [ ~ ]# ping 10.250.70.1 -s 1415
PING 10.250.70.1 (10.250.70.1) 1415(1443) bytes of data.
^C
--- 10.250.70.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
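The sizes in the ping output can be checked by hand: the value passed to -s is the ICMP payload, to which ping adds an 8-byte ICMP header and the kernel adds a 20-byte IPv4 header, giving the 1442 and 1443 shown in brackets. The sketch below reproduces that arithmetic. The 1500-byte underlay MTU is an assumption (the value before the fix is not stated in this article); if it holds, the pass/fail boundary would imply roughly 58 bytes of encapsulation overhead. Function names are illustrative.

```python
ICMP_HEADER = 8
IPV4_HEADER = 20


def inner_packet_size(ping_payload: int) -> int:
    """Size of the IP packet the workload emits for a given `ping -s` value."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def implied_overhead(underlay_mtu: int, largest_ok_payload: int) -> int:
    """Encapsulation overhead implied by the largest ping that still got through."""
    return underlay_mtu - inner_packet_size(largest_ok_payload)


print(inner_packet_size(1414))       # 1442, last size that worked
print(inner_packet_size(1415))       # 1443, first size that was dropped
# ASSUMPTION: the underlay MTU was 1500 before the fix.
print(implied_overhead(1500, 1414))  # 58
```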
To prove the app layer was OK, a small page returning just one word was tested and worked fine, since its response fits in a packet well under the size limit:
Solved
As soon as the MTU on the underlay transport network was increased to 1600 bytes, the full webpage loaded fine:
And a ping for good measure:
root@NWATCH-WEB01 [ ~ ]# ping 10.250.70.1 -s 1415
PING 10.250.70.1 (10.250.70.1) 1415(1443) bytes of data.
1423 bytes from 10.250.70.1: icmp_seq=1 ttl=61 time=1.76 ms
1423 bytes from 10.250.70.1: icmp_seq=2 ttl=61 time=1.69 ms
^C
--- 10.250.70.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 1.686/1.723/1.760/0.037 ms
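The manual ping sweep used in this article could be automated with a binary search over the payload size. This is only a sketch: it assumes a Linux iputils `ping` (the `-M do` flag sets don't-fragment), assumes that every size below the largest working one also works, and treats a single lost reply as a failure, so a flaky path could skew the result. `ping_fits` and `largest_payload` are hypothetical helper names.

```python
import subprocess
from typing import Callable


def ping_fits(host: str, payload: int) -> bool:
    """Send one ping with don't-fragment set; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(probe: Callable[[int], bool], low: int = 0, high: int = 9000) -> int:
    """Binary search for the largest size for which probe(size) succeeds.

    Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid works, so the answer is mid or larger
        else:
            high = mid - 1  # mid fails, so the answer is smaller
    return low


# Example (needs network access, so it is left commented out):
# print(largest_payload(lambda size: ping_fits("10.250.70.1", size)))
```

Passing the probe in as a function keeps the search logic testable without a network, for example with a stand-in such as `lambda size: size <= 1414`.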