vSphere with Kubernetes: Working with Embedded Harbor

When you’ve enabled Workload Management on your Supervisor Cluster you’ll want to start spinning up some containers! You can of course use a public container registry like Docker Hub, but vSphere w/ K8s provides a convenient one-click private registry, Embedded Harbor, for hosting your private images.

Harbor was included in VMware’s first container platform, VIC, so it’s good to see its development continue here. As this is an embedded version it doesn’t have all the features of a standalone deployment, but it’s enough to cover the basics.

Enabling Harbor

Enabling the registry is one of the smoothest parts of vSphere w/ K8s. Simply go to the Supervisor Cluster > Configure > Image Registry and click Enable Harbor. That’s it!

After a few minutes you’ll see your first Pods being deployed to a new ‘vmware-system-registry’ Namespace:

And the Health of the Image Registry should change to Running. There will also be a link to the Harbor UI, which will be an IP from the Ingress range you used to set up K8s. Another important link here is the Root certificate, which you should download now.

Using The Harbor Registry

As the Harbor registry is nicely integrated into vCenter, every time you create a new Namespace a new project is created in Harbor. Also, logging into Harbor is controlled with vSphere SSO. Here I’ve created a Namespace in the vSphere Client called ‘netwatch’, logged into Harbor from the link in the Image Registry and the project has been automatically created:

To get images into the registry you can use Docker. As Harbor is using a self-signed cert you’ll get an error if you try to log in straight away. There are two options here:

  1. The secure method is to install the Harbor root certificate on the local machine you’ll be running Docker from. The install location depends on your OS, but on Ubuntu it’s under /etc/docker/certs.d/. The cert can be obtained from the Image Registry page in vCenter or from within a Harbor Project.
  2. Alternatively, and purely for testing purposes, you can modify your Docker daemon.json file to allow an insecure registry, then restart Docker. Both options are sketched below.
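A rough sketch of both options (the registry address and certificate filename are placeholders for your own environment):

# Option 1: trust the Harbor root certificate (Docker looks in /etc/docker/certs.d/<registry>/)
sudo mkdir -p /etc/docker/certs.d/YOUR.HARBOR.IP.ADDRESS
sudo cp harbor-root.crt /etc/docker/certs.d/YOUR.HARBOR.IP.ADDRESS/ca.crt

# Option 2 (testing only): mark the registry as insecure in /etc/docker/daemon.json, then restart Docker
# /etc/docker/daemon.json: {"insecure-registries": ["YOUR.HARBOR.IP.ADDRESS"]}
sudo systemctl restart docker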

Pushing Your Images

Now to get your images into the registry! Log in to Harbor with Docker:

docker login YOUR.HARBOR.IP.ADDRESS
Username: administrator@vsphere.local
Password:
Login Succeeded

Then tag and push your images with the following format:

docker tag PROJECT/IMAGE:TAG YOUR.HARBOR.IP.ADDRESS/PROJECT/IMAGE:TAG
docker push YOUR.HARBOR.IP.ADDRESS/PROJECT/IMAGE:TAG
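
For example, assuming the image used later in this post was built locally as netwatch-api:1.0 (the names here just mirror the article):

docker tag netwatch-api:1.0 YOUR.HARBOR.IP.ADDRESS/netwatch/netwatch-api:1.0
docker push YOUR.HARBOR.IP.ADDRESS/netwatch/netwatch-api:1.0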
The image will then be in your Harbor repo:

Using Your Images

To consume your new private images in Pods you’ll need to provide the full path to the image in your YAML, or use a quick and dirty imperative deployment, for example:

kubectl create deployment quickdeploy --image=YOUR.HARBOR.IP.ADDRESS/netwatch/netwatch-api:1.0
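
The declarative equivalent is a minimal sketch like the below (assuming the same netwatch project and image); the key point is the full registry path in the image field:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: quickdeploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: quickdeploy
  template:
    metadata:
      labels:
        app: quickdeploy
    spec:
      containers:
      - name: netwatch-api
        image: YOUR.HARBOR.IP.ADDRESS/netwatch/netwatch-api:1.0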

And here’s the quickdeploy Pod along with a few others up and running:

NSX-T Policy API Single JSON PATCH

The NSX-T Policy API is a powerful concept that was introduced in 2.4 and powers the new Simplified UI. It provides a declarative method of describing, implementing and deleting your entire virtual network and security estate.

In a single API call you can deploy a complete logical topology utilising all the features NSX-T provides, including T1 Gateways (with NAT/Load Balancer services) for distributed routing, Segments for stretched broadcast domains and DFW rules to enforce microsegmentation.

This example performs the following:

  • Creates a T1 Gateway and connects it to the existing T0
  • Creates three Segments and attaches them to the new T1
  • Creates intra-app distributed firewall rules to only allow the necessary communication between tiers
  • Creates Gateway Firewall rules to allow external access directly to the web tier
  • Creates a Load Balancer for the web tier with TLS-offloading using a valid certificate

And once deployed the topology will look like this:

Currently, on the networking side there is only a T0 Gateway and a single Segment (a VLAN-backed transit Segment to the physical network), with no T1s or Load Balancers:

And on the security side there are no DFW policies:

Once the JSON body (see my example here) is created, with the relevant T0, Edge Cluster and Transport Zone IDs inserted, the REST call can be constructed. Using your favourite REST API client, e.g. curl, Postman or Requests (Python), the request should look like this:

URL: https://NSX-T_MANAGER/policy/api/v1/infra/
Method: PATCH
Header: Content-Type: application/json
Auth: Basic (NSX-T Admin User/Password)
Body: The provided JSON
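
As a sketch with curl (the manager hostname, credentials and body filename are placeholders):

curl -k -u admin:'PASSWORD' \
  -X PATCH 'https://NSX-T_MANAGER/policy/api/v1/infra/' \
  -H 'Content-Type: application/json' \
  -d @topology.json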

Once you send a successful request you’ll receive a 200 status almost instantly, but don’t be fooled into thinking that your entire topology has now been created!

In reality this is just the policy engine acknowledging your declarative intent. It then works to convert or ‘realise’ that intent into imperative tasks that are used to create all of the required logical objects.

Once this has all been created you’ll see your network and security components in the GUI:

Now the magic of this API is that you can also delete your entire topology with the same call, simply changing marked_for_delete to true for each section, as sketched below.
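
Roughly, each child object in the hierarchical body gets flagged like this (the T1 name here is illustrative; see the linked example for the full structure):

{
  "resource_type": "ChildTier1",
  "marked_for_delete": true,
  "Tier1": {
    "resource_type": "Tier1",
    "id": "T1-EXAMPLE"
  }
}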

Example code here: https://github.com/certanet/nsx-t-policy-api

Configuring NSX-T 2.5 L2VPN Part 2 – Autonomous Edge Client

Continuing on from the server configuration in Part 1, this is the NSX-T L2VPN Client setup.

There are a few options to terminate an L2VPN in NSX-T, but all of them are VMware proprietary, so there’s no vendor interop here. This article uses the ‘Autonomous Edge’ option, previously called Standalone Edge in NSX-V, which is essentially a stripped-down Edge VM that can be deployed to a non-NSX-prepped host. Confusingly, this isn’t the new type of Edge Node used in NSX-T, but instead is the same ESG that was used in V.

The Topology

The Overlay Segments on the left are in the NSX site that’s hosting the L2VPN server. On the right is a single host in a non-NSX environment that will use the Autonomous Edge to connect the VLAN-backed Client-User VM to the same subnet as SEG-WEB.

OVF Deployment

First the Autonomous Edge OVF needs to be deployed in vCenter:

The first options are to set the Trunk and Public (and optionally HA) networks. The Trunk interface should connect to a (shock) trunked portgroup (VLAN 4095 in vSphere) and the Public interface should connect to the network that will be L3 reachable by the L2VPN server (specifically the IPSec Local Endpoint IP). Also, as this is an L2VPN, the portgroup security settings need loosening so that MACs unknown to vCenter are allowed. If the portgroups are on a VDS then a sink port should be used for the Trunk; alternatively, on a standard portgroup, Forged Transmits and Promiscuous Mode are required, as sketched below.
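
For a standard switch, the change from the ESXi shell would look something like this (the portgroup name is a placeholder; check the command’s help output for the exact flags on your build):

esxcli network vswitch standard portgroup policy security set \
  --portgroup-name=L2VPN-Trunk \
  --allow-promiscuous=true --allow-forged-transmits=true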

Next, set up the passwords and Public interface network settings. Then in the L2T section set the Peer Address to the L2VPN Local Endpoint IP and copy the Peer Code from the server setup into the Peer Code field:

The last step in the OVF is to set the sub-interface, which maps a VLAN (on the local host) to the Tunnel ID (that was set on the Segment in the server setup). Here VLAN 80 will map to Tunnel ID 80, which was mapped to SEG-WEB:

The Tunnel

Once the OVF is deployed and powered on, either the L2VPN Client or Server can initiate an IKE connection to the other to set up the IPSec tunnel. Once this is established a GRE tunnel will be set up and the L2 traffic will be tunnelled inside ESP on the wire. There are a few options to view the status of the VPN:

Client tunnel status:

Server tunnel status:

GUI tunnel stats:

Connecting a Remote VM

Now that the VPN is up a VM can be placed on VLAN 80 at the remote site and be part of the same broadcast domain as the SEG-WEB Overlay Segment. Here the NSXT25-L2VPN-Client-User VM is placed in a VLAN 80 portgroup, which matches what was set in the OVF deployment. NOTE: there isn’t even a physical uplink in the networking here (although this isn’t a requirement), so traffic is clearly going via the L2VPN Client:

Now set an IP on the new remote VM in the same subnet as the SEG-WEB:
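
On a Linux guest that would be something like the following (the addresses here are purely illustrative; use whatever subnet SEG-WEB is actually configured with):

ip addr add 172.16.80.50/24 dev eth0
ip route add default via 172.16.80.1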

And load up a website hosted on the Overlay and voila!

NSX-T 2.5 One-Arm Load Balancer

Load balancing in NSX-T isn’t functionally much different to NSX-V and the terminology is all the same too. So just another new UI and API to tackle…

As load balancing is a stateful service, it will require an SR within an Edge Node to provide the centralised service. It’s ideal to keep the T0 gateway connecting to the physical infrastructure as an Active-Active ECMP path, so this LB will be deployed to a T1 gateway.

The Objective

The plan is to implement a load balancer to provide both highly available web and app tiers. TLS Offloading will also be used to save processing on the web servers and provide an easy single point of certificate management.

  1. User browses to NWATCH-WEB-VIP address
  2. The virtual server NWATCH-WEB-VIP is invoked and the request is load balanced to a NWATCH-WEB-POOL member
  3. The selected web server needs access to the app-layer servers, so it references the IP of NWATCH-APP-VIP
  4. The NWATCH-APP-VIP virtual server forwards the request onto a pool member in NWATCH-APP-POOL
  5. The app server then contacts the PostgreSQL instance on the NWATCH-DB01 server and the user has a working app!

Configuration

First the WEB and APP servers are added to individual groups that can be referenced in a pool. Using a group with dynamic membership criteria allows for automated scaling of the pool by adding/removing VMs that match the criteria:

Each group is then used to specify the members in the relevant pool to balance traffic between:

A pool then needs to be attached to a Virtual Server, which defines the IP/port of the service and also the SSL (TLS) configuration. Here a Virtual Server is created for each service (WEB and APP):
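
The same objects can also be defined through the Policy API covered earlier in this post. As a rough sketch for the WEB pool (names mirror the article; the field names follow my reading of the 2.5 Policy API, so treat them as indicative rather than definitive):

curl -k -u admin:'PASSWORD' \
  -X PATCH 'https://NSX-T_MANAGER/policy/api/v1/infra/lb-pools/NWATCH-WEB-POOL' \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "NWATCH-WEB-POOL",
        "algorithm": "ROUND_ROBIN",
        "member_group": {
          "group_path": "/infra/domains/default/groups/NWATCH-WEB"
        }
      }'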

The final step is to ensure that the new LB IPs are advertised into the physical network. As the LB is attached to a T1 gateway it must first redistribute the routes to the T0, which is done with the All LB VIP Routes toggle:

Next is to advertise the LB addresses from the T0 into the physical network, which is done by checking LB VIP under T0 Route Re-distribution:

Here’s confirmation on the physical network that we can see the /32 VIP routes coming from two ECMP BGP paths (both T0 SRs), as well as the direct Overlay subnets:

Traffic Flow

There are now a lot of two-letter acronyms in the path from the physical network to the back-end servers (T0, T1, DR, SR, LB), so what does the traffic flow actually look like?

The first route into the NSX-T realm is via a T0 SR, so check how it knows about the VIPs: 

It can see the VIP addresses coming from a 100.64.x.x address, which in NSX-T is a subnet that’s automatically assigned to enable inter-tier routing. In this case the interface is connected from the T0 DR to the T1 SR:

So the next stop should be the T1 gateway. From the T1 SR the VIP addresses are present under the loopback interface:

So the traffic flow for this One-Arm Load Balancer looks like the below:

The Final Product

Testing from a browser with a few refreshes confirms the (originally HTTP-delivered) WEB and APP servers are being round-robin balanced and TLS protected:

And the stats show a perfect 50/50 balance of all servers involved:

Configuring NSX-T 2.5 L2VPN Part 1 – Server

NSX-T 2.5 continues VMware’s push to move all stateful services to T1 gateways, meaning you can keep your T0 ECMP! This version brought the ability to deploy IPSec VPNs on a T1, however L2VPN still requires deployment on a T0. I’m sure it’ll be moved in a later version, but for now here are the install steps…

First, ensure your T0 gateway is configured as Active-Standby, which rules out ECMP but allows stateful services. NOTE: this mode cannot be changed after deployment, so make sure it’s a new T0:

To enable an L2VPN you must first enable an IPSec VPN service. Create both and attach them to your T0 gateway as below:

Next create a Local Endpoint, which attaches to the IPSec service just created and will terminate the VPN sessions. The IP for the LE must be different to the uplink address of the Edge Node it runs on, and is then advertised out over the uplink as a /32.

To ensure the LE address is advertised into the physical network enable IPSec Local IP redistribution in the T0 settings:

And here’s the route on the TOR:

Now it’s time to create the VPN session to enable the peer to connect. Select the Local Endpoint created above and enter the peer IP, PSK and tunnel IP:

You can then add segments to the session from here, or directly from the Segments menu:

OR

There’s one last thing to wrap up the server-side config and that’s retrieving the peer code. Go to VPN > L2VPN Sessions > Select the session > Download Config, then copy the peer code from within the config, which will be used in the next part configuring the client…

Don’t Forget Overlay MTU Requirements!

There are several benefits to overlay networking, such as providing a much larger number of segments to consume, over 16 million for VXLAN and Geneve (both provide a 24-bit VNI, i.e. 2^24 = 16,777,216 segments), or removing the reliance on physical network config to spin up a new subnet.

To make use of these benefits and more, the overlay doesn’t ask for much of the underlay, just a few basic things:

  • IP connectivity between tunnel endpoints
  • Any firewalls in the path allow UDP 6081 for Geneve or UDP 4789 for VXLAN
  • Jumbo frames with an MTU size of at least 1600 bytes
  • Optionally, multicast if you wish to optimise flooding

This article is about what happens when you don’t obey rule #3 above with NSX-T…

Topology

A basic 3-tier app has been deployed as above, with each tier on its own segment and connected to a T1 gateway. Routing between segments will be completely distributed in the kernel of each hypervisor that the workloads are on.

Problem

When connected to one of the WEB servers and attempting to access the APP, the connection was getting reset:

As the WEB and APP workloads were on separate hosts the traffic was being encapsulated into a Geneve packet (which cannot be fragmented) and sent over the transport network from TEP to TEP:

A ping test confirmed that connectivity was OK up to 1414 bytes of ICMP data; anything larger was being dropped. That makes sense: 1414 bytes of data plus 28 bytes of ICMP/IP headers gives a 1442-byte inner packet, which once the outer Geneve, UDP and IP headers are added fills the default 1500-byte underlay MTU:

root@NWATCH-WEB01 [ ~ ]# ping 10.250.70.1 -s 1414
PING 10.250.70.1 (10.250.70.1) 1414(1442) bytes of data.
1422 bytes from 10.250.70.1: icmp_seq=1 ttl=61 time=1.33 ms
^C
--- 10.250.70.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.326/1.326/1.326/0.000 ms

root@NWATCH-WEB01 [ ~ ]# ping 10.250.70.1 -s 1415
PING 10.250.70.1 (10.250.70.1) 1415(1443) bytes of data.
^C
--- 10.250.70.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

To prove the app layer was ok, a small page returning just one word was tested and worked fine:

Solved

As soon as the MTU size was increased on the underlay transport network to 1600 bytes the full webpage loaded fine:

And a ping for good measure:

root@NWATCH-WEB01 [ ~ ]# ping 10.250.70.1 -s 1415
PING 10.250.70.1 (10.250.70.1) 1415(1443) bytes of data.
1423 bytes from 10.250.70.1: icmp_seq=1 ttl=61 time=1.76 ms
1423 bytes from 10.250.70.1: icmp_seq=2 ttl=61 time=1.69 ms
^C
--- 10.250.70.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 1.686/1.723/1.760/0.037 ms
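
You can also validate the underlay MTU between TEPs directly from an ESXi host with a don’t-fragment vmkping. A rough sketch (the vmkernel interface, TEP IP and payload size here are examples for a 1600-byte MTU):

vmkping ++netstack=vxlan -d -s 1572 -I vmk10 192.168.130.12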

Deploying NSX-T 2.5 with VMware’s Ansible Examples

I try to use IaC to build my labs, so that when I inevitably break something, I can always re-roll.

However, when I tried to build an NSX-T 2.5 lab using Ansible playbooks based on the VMware examples that worked on 2.4, I received an error about the nsx_role property when deploying the Manager OVA:

Error:\n - Invalid value 'nsx-manager nsx-controller' specified for property nsx_role.

After poking around the Manager OVF file I noticed that the allowed values had changed from previous versions, from manager/controller/cloud to just manager/cloud.

Here’s the values in NSX-T Manager 2.4:

And here’s the values in NSX-T Manager 2.5:

So a simple change of the nsx_role value to “NSX Manager” in my playbook and everything deployed successfully! A rough sketch of the relevant playbook variables is below.
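
The surrounding variable names here are illustrative and may differ from the ansible-for-nsxt example you’re working from; the point is just the nsx_role value:

# deploy-nsx-manager.yml (excerpt)
vars:
  nsx_manager_ova: nsx-unified-appliance-2.5.ova   # placeholder filename
  nsx_role: "NSX Manager"                          # was "nsx-manager nsx-controller" on 2.4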

I’ve raised an issue on the ansible-for-nsxt GitHub page for the example provided and I’ll hopefully submit a PR to resolve it by adding a version variable to make it compatible across versions.

vRealize Network Insight 5.0 Installation

The new release of vRNI adds a host of new features, which you can see here, so let’s try it out in the lab…

The installation is the same as other versions, with the Platform VM installed first, followed by the Proxy VM. The basic setup is:

Platform VM

  • Deploy the OVA, providing just the name, datastore and network portgroup
  • Power up the VM and go through the CLI setup wizard from the Arkin days
  • Enter the password settings for the SSH and CLI users.
  • Enter networking details (IP, mask, gateway, DNS, NTP)
  • Continue the configuration via the web UI:
    • Apply the license
    • Set the password for web admin user (admin@local)
    • Generate Shared Secret (for Proxy VM)

Proxy VM

  • Deploy OVA, setting the Shared Secret from the Platform deployment
  • Power up the VM and go through the CLI setup wizard from the Arkin days
  • Enter the password settings for the SSH and CLI users.
  • Enter networking details (IP, mask, gateway, DNS, NTP)

 

Final Steps

Once both appliances are fully deployed, the proxy/collector will connect to the platform and the web UI will show the below:

Click finish and continue to login to vRNI, with the username admin@local and the password set during setup.

Now add your data sources (vCenter, NSX-V, NSX-T, physical network kit, clouds, etc.) to start getting visibility into your network:

Next up is viewing and working with the plethora of data you’ll see from vRNI…

What’s New in vSphere 6.7 Update 2 and NSX 6.4.5

UI Changes

You can see the first UI change in vSphere 6.7U2 when you log in to vCenter. Gone is the dark blue Flash-based SSO login screen, which was the final reminder of the ‘old web client’, and in with the new Clarity UI login splash screen that was introduced on the VAMI a few versions ago:


Dark Theme

Dark theme, the most important feature for any app to be cool these days, was introduced in U1 and has been improved in U2 to add a bit more colour to certain features, making them easier to view:


Not everything has caught up though; NSX, for example, still doesn’t go full night mode.

NSX Plugin Install

A nice subtle update in 6.7U2 is that you’re no longer required to log out when you install NSX. In previous versions the warning banner would say that the NSX plugin had been installed but you needed to log out to activate it; now it just needs a simple refresh!

 

NSX Dashboard

The NSX dashboard in the vCenter H5 UI could always be a bit slow to load, but at least now there’s some visibility of it actually doing something in the background. Activity bars are now shown on the dashboard tiles:


NSX HTML5 UI Updates

NSX 6.4.5 now brings more ESG configuration (Routing, Bridging, Load Balancing) into the HTML5 UI:


 

Hands On With NSX-T 2.4

In order to wrap my head around the changes from NSX-V (ok, NSX Data Center for vSphere) to NSX-T (Data Center) I’ve created a few labs with previous versions. With the latest release, 2.4, though, there’s been a lot of simplification in terms of deployment and manageability, which you can tell straight away from the UI.

After a few hours with NSX-T 2.4 I’d set up the following deployment:


Diagramming it all out helps me to understand how the pieces fit together (and there are a lot of pieces to NSX-T).

A few things I’ve noticed so far…

Logical Switches in vCenter

A nice feature in NSX-T is the way that Logical Switches are presented in vCenter. Compared to the ugly ‘virtualwires’ from NSX-V, you now get full integration of the N-VDS (NSX-T Virtual Distributed Switch), so they look just like the old-school VDSs. Here’s the view from H01:


And again from H03:


New Workflows

Creating segments, routers and other networking constructs had been a little complicated in previous releases, but now the new wizards make these tasks easy. Once created, they show up in the new dashboards, adding some much-needed visibility into what’s been created:


For my first deployment I went straight in with the Advanced Networking & Security screen, not knowing that none of these objects are shown in the fancy new dashboards… so I recreated them. Objects created through the new workflows do show up in the Advanced tab though and can be identified with the ‘Protected Object’ icon as below:
