NSX-V Site Failover/Failback Plan: Part 3


This blog continues the series on the “Planned” or “Unplanned” failover of NSX-V components (NSX Manager, Controllers and Universal Distributed Logical Routers) in an Active/Passive datacentre scenario, i.e. all North/South traffic flows via one site’s ESG(s).

To reiterate, I have split this topic into three parts:

    1. Part 1 (here), talks about:
        1. Use Cases
        2. Assumptions
        3. Current state and Target State i.e. before and after failover
        4. Pre-requisites
        5. Summary of the Failover Plan
    2. Part 2 (here), talks about the failover configuration steps to make Site-B “Primary”
    3. Part 3 (this blog), talks about the configuration steps required after Site-A comes back online to avoid conflicts.

I would encourage you to visit the previous blog (Part 2) and get familiar with the failover configuration steps that make Site-B “Primary” in this Cross-vCenter NSX design before proceeding.

Let’s briefly recap the failover steps carried out in the previous blog (Part 2):

Site-A Steps: (Only in case of a planned Failover):

    1. Shutdown all ESGs/DLRs/UDLRs
    2. Shutdown Controllers
    3. Shutdown NSX Manager

Site-B Steps:

    1. Disconnect Secondary NSX Manager from Primary
    2. Promote the NSX Manager as Primary
    3. Deploy the Universal NSX Controllers in Site-B
    4. Deploy UDLR Control VMs
    5. Verify “Global Configuration” on the UDLR
    6. Verify and amend “Dynamic Routing” configuration for the UDLR control VM(s)
    7. Amend any dynamic routing configuration on ESGs, as necessary
    8. Optional: If “Site-B” will be the “Primary” for the foreseeable future, update the syslog, NTP and DNS IPs for the NSX components
    9. If deployed, enable any “OneArm” Load Balancer ESG(s) network connectivity in Site-B.

We have successfully promoted “Site-B” as “Primary” and restored the North/South connectivity via “Site-B” ESGs.

Objective: When “Site-A” comes back online, you will have two NSX Managers with the “Primary” role assigned. The goal of this blog is to demote the “Site-A” NSX-V Manager to “Secondary” to avoid any conflicts.
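
To make the conflict concrete, here is a minimal Python sketch that flags the situation where more than one NSX Manager claims the “Primary” role. The role snapshot is hypothetical input (for example, typed in from what the NSX Managers screen shows); the function name and dictionary layout are my own assumptions and not part of any NSX tooling.

```python
from typing import Dict, List


def find_primary_conflict(roles: Dict[str, str]) -> List[str]:
    """Return the managers claiming the Primary role if more than one does."""
    primaries = sorted(
        name for name, role in roles.items()
        if role.strip().lower() == "primary"
    )
    # A single Primary is the healthy state, so only report two or more.
    return primaries if len(primaries) > 1 else []


# Hypothetical snapshot taken right after Site-A powers back on.
roles_after_recovery = {"Site-A": "Primary", "Site-B": "Primary"}

conflict = find_primary_conflict(roles_after_recovery)
if conflict:
    print("Role conflict between:", ", ".join(conflict))
```

Once the steps in this blog are complete, the same check against a snapshot of one “Primary” and one “Secondary” should return an empty list.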

The diagrams below show the placement of the NSX-V components and the OneArm Load Balancer after following the steps in “Part 3” of this Failover Plan:

Location of NSX-V components when Site-A comes back online and after reconfiguration:

OneArm Load Balancer (if deployed) when Site-A comes back online and after reconfiguration:

Site-A:
    1. Power on all ESGs/DLRs/UDLRs
    2. Power on the Controllers
    3. Power on the NSX Manager
    4. Force remove Secondary NSX Manager from “Site-A”:
        1. Go to “Network and Security” Plugin in the vSphere Client
        2. Installation and Upgrade -> Management -> NSX Managers
        3. Note that both NSX Managers now show the “Primary” role
        4. Select the “Site-A” NSX Manager and click Actions -> “Remove Secondary NSX Manager”
        5. Select “Perform operation even if the NSX Manager is inaccessible” and click “OK”.
    5. Demote the “Site-A” NSX Manager from “Primary” to “Secondary”:
        1. Select the “Site-A” NSX Manager and click Actions -> “Remove Primary Role”
        2. Click “Yes” to confirm
    6. Delete “Site-A” NSX Controllers:
        1. Select “Site-A” NSX Controller 1 and click Delete
        2. Repeat the step for the remaining two “Site-A” NSX controllers
        3. When you delete the last controller, select the “Forcefully Delete” and “Check here to acknowledge the warning” options
    7. Delete “Site-A” UDLR:
        1. Go to “Network and Security” Plugin in the vSphere Client -> NSX Edges
        2. Select “Site-A” NSX Manager’s IP address from the drop-down menu
        3. Click the respective UDLR and click Delete
        4. Click “Yes” in the confirmation box

Note: Repeat the steps above for each UDLR instance, as necessary.

Site-B:
    1. Assign “Site-A” NSX Manager “Secondary” role:
        1. Go to “Network and Security” Plugin in the vSphere Client
        2. Installation and Upgrade -> Management -> NSX Managers
        3. Select the “Site-B” NSX Manager (the one with the “Primary” role)
        4. Click Actions -> “Add Secondary NSX Manager”
        5. Specify the “Site-A” NSX Manager IP address, the “admin” user name and the password, as requested.
        6. Trust the certificate and click “Yes” to proceed.

Site-A:
    1. Amend any dynamic routing configuration on Site-A ESGs for the associated UDLR, as necessary:
        1. If configured, amend filters as necessary to permit or deny network routes to physical switch neighbors
        2. Check that the BGP neighbor status is “Established, UP”
        3. Verify that routes are being exchanged (both physical routes and UDLR routes)
    2. If deployed, disable any “OneArm” Load Balancer ESG(s) network connectivity (disable the interface)

Site-B:
    1. Verify dynamic routing configuration on the UDLR for the “Site-A” ESGs:
        1. Open the console of the UDLR Control VM and log in with the “admin” credentials
        2. Verify that the BGP neighbor status is “Established” and “UP” for both “Site-A” and “Site-B” ESG IPs by running the following command:

            show ip bgp neighbors

        3. Verify that the routes (or the “Default” route) are being received, as appropriate:

            show ip route
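
If you have many ESG neighbors to check, the BGP verification above can be scripted against a saved copy of the command output. The Python sketch below is only an illustration: the line shapes it matches (“BGP neighbor is …” and “BGP state = …”) and the sample text are assumptions about what the output looks like, and the IP addresses are documentation placeholders, so adjust the patterns to what your NSX version actually prints.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed line shapes; adjust these patterns to your actual CLI output.
NEIGHBOR_RE = re.compile(r"BGP neighbor is (\d+\.\d+\.\d+\.\d+)")
STATE_RE = re.compile(r"BGP state = (\w+)")


def parse_bgp_states(output: str) -> Dict[str, str]:
    """Map each neighbor IP to the BGP state reported after it."""
    states: Dict[str, str] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        neighbor = NEIGHBOR_RE.search(line)
        if neighbor:
            current = neighbor.group(1)
            states[current] = "Unknown"
            continue
        state = STATE_RE.search(line)
        if state and current is not None:
            states[current] = state.group(1)
    return states


def neighbors_not_established(output: str, expected: Iterable[str]) -> List[str]:
    """Return expected neighbor IPs that are missing or not Established."""
    states = parse_bgp_states(output)
    return [ip for ip in expected if states.get(ip) != "Established"]


# Hypothetical sample text, not captured from a real UDLR.
SAMPLE = """\
BGP neighbor is 192.0.2.1,   remote AS 65001,
BGP state = Established, up
BGP neighbor is 192.0.2.2,   remote AS 65001,
BGP state = Active
"""

problems = neighbors_not_established(SAMPLE, ["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print("Neighbors needing attention:", problems)
```

Passing the list of both “Site-A” and “Site-B” ESG IPs as `expected` also catches a neighbor that is missing from the output altogether, not just one stuck in a non-established state.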

This completes the configuration steps when “Site-A” comes back online and also completes the “Failover Plan” from “Site-A” to “Site-B”.

Failback Plan:

Now that “Site-B” is running as Primary and “Site-A” as Secondary for the NSX-V components, the “Failback” to “Site-A” is no different from this “Failover Plan”. The steps are exactly the same, but with “Site-B” treated as the primary site.
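
One way to keep the runbook symmetrical is to write the steps once with the site names as parameters. The Python sketch below is a rough illustration covering only the Part 3 steps from this blog; the function name and step wording are my own, and it simply generates a checklist rather than talking to NSX.

```python
from typing import List


def reprotect_steps(recovered: str, active: str) -> List[str]:
    """List the Part 3 steps for a site that has just come back online.

    recovered: the site that was down and still believes it is Primary.
    active: the site currently running as Primary.
    """
    return [
        f"{recovered}: power on ESGs/DLRs/UDLRs, Controllers and NSX Manager",
        f"{recovered}: force remove the Secondary NSX Manager",
        f"{recovered}: remove the Primary role from the NSX Manager",
        f"{recovered}: delete the NSX Controllers and each UDLR",
        f"{active}: add the {recovered} NSX Manager as Secondary",
        f"{recovered}: amend ESG dynamic routing, disable OneArm LB interfaces",
        f"{active}: verify BGP neighbors and routes on the UDLR",
    ]


# This blog: Site-A came back while Site-B was Primary.
for number, step in enumerate(reprotect_steps("Site-A", "Site-B"), start=1):
    print(f"{number}. {step}")
```

During a failback the roles are reversed, so the same checklist would be produced with `reprotect_steps("Site-B", "Site-A")` once “Site-A” has been promoted again and “Site-B” rejoins.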
