I recently ran across an interesting issue in my ACI fabric and there was not that much information available about it yet, so I thought I would share it.
The issue was related to a load balancer configured in HA mode as a concrete device on the ACI fabric. When the load balancer failed over traffic to the passive node traffic going to the VIP would start failing and it would take a couple minutes for the VIP to become responsive again.
In troubleshooting the issue we looked at a packet capture and saw that immediately after the failover the load balancer would send out a gratuitous ARP as expected, however on the leaf switch it showed no GARP’s being received.
The default behavior of an ACI fabric is to do all learning via UDP unicast lookups in the endpoint database located in the spines and as such there is no need to broadcast or flood an ARP. However in order to get things like HA on load balancers and firewalls or like OS level clustering like Microsoft Windows Failover Clustering or Linux Heartbeat we need to be able to learn based on GARP. A GARP (Gratuitous ARP) is used by devices on the network as a way to proactively update the ARP cache to let other devices know that the location of a MAC address has changed (advanced notification). In order for the fabric to be able to learn endpoint moves via GARP, we need to enable some non-default features on the Bridge Domain (BD) associated with the End Point Group (EPG). Those are “ARP Flooding” and EP Move Detection Mode” (GARP Detection Mode). Below is a screenshot of the settings I am referring to:
The first screenshot of enabling ARP Flooding is from “Tenant>Networking>Bridge Domains>YOUR-BD”
The second screenshot of enabling GARP based detection is also from “Tenant>Networking>Bridge Domains>YOUR-BD”, but you then need to goto the L3 Configurations tab on the BD.
These screenshots are from an APIC running on the 1.2 codebase.