I recently ran across an interesting issue in my ACI fabric and there was not that much information available about it yet, so I thought I would share it.
The issue involved a load balancer configured in HA mode as a concrete device on the ACI fabric. When the load balancer failed over to the passive node, traffic to the VIP would start failing, and it would take a couple of minutes for the VIP to become responsive again.
While troubleshooting, we looked at a packet capture and saw that immediately after the failover the load balancer sent out a gratuitous ARP as expected; however, the leaf switch showed no GARPs being received.
The default behavior of an ACI fabric is to learn endpoints via unicast lookups against the endpoint database located in the spines, so there is normally no need to broadcast or flood ARPs. However, to support things like HA on load balancers and firewalls, or OS-level clustering such as Microsoft Windows Failover Clustering or Linux Heartbeat, the fabric needs to be able to learn endpoint moves based on GARP. A GARP (gratuitous ARP) is sent by a device to proactively update the ARP caches of other devices on the network, letting them know that the location of a MAC address has changed (advance notification). For the fabric to learn endpoint moves via GARP, we need to enable two non-default settings on the Bridge Domain (BD) associated with the Endpoint Group (EPG): "ARP Flooding" and "EP Move Detection Mode" (GARP-based detection). Below are screenshots of the settings I am referring to:
The first screenshot, enabling ARP Flooding, is from "Tenant > Networking > Bridge Domains > YOUR-BD".
The second screenshot, enabling GARP-based detection, is also from "Tenant > Networking > Bridge Domains > YOUR-BD", but on the L3 Configurations tab of the BD.
These screenshots are from an APIC running on the 1.2 codebase.
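If you prefer to script the change, the same two settings can also be applied by posting an update to the bridge domain object through the APIC REST API. Below is a hedged sketch, not a definitive procedure: the APIC address, tenant and BD names, and the token variable are all placeholders, and you should verify the attribute names against your APIC version.

```shell
# Placeholders: apic.example.com, YOUR-TENANT, YOUR-BD, and $APIC_TOKEN
# are assumptions for illustration; substitute your own values.
# Assumes you have already authenticated and hold a valid session token.
curl -sk -X POST \
  -H "Cookie: APIC-cookie=${APIC_TOKEN}" \
  -d '<fvBD name="YOUR-BD" arpFlood="yes" epMoveDetectMode="garp"/>' \
  "https://apic.example.com/api/mo/uni/tn-YOUR-TENANT/BD-YOUR-BD.xml"
```

This flips the same two knobs as the GUI: arpFlood enables ARP Flooding on the BD, and epMoveDetectMode="garp" enables GARP-based endpoint move detection.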
I am very excited and thankful to be selected as one of the Cisco Champions. I look forward to contributing in every way I can to the community and to working with all of the other Cisco Champions over the next year. I would also like to congratulate all of the other Cisco Champions for 2016.
I have been working on getting a Cisco ACI implementation up and running. In doing so, I noticed some lingering faults showing up in the APIC dashboard. One of these faults was "Physical Interface eth1/2 on Node 1 is now down". This fault is raised because eth1/2 is administratively up by default while the port is not in use.
There is currently no way to correct this issue in the GUI; an enhancement is coming in a future version of the APIC software (CSCuv63617). In the meantime, the issue can be corrected via the CLI. You will need to log into each APIC controller and run the following commands:
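The exact commands were not preserved here, so as a hedged sketch: the workaround amounts to administratively downing the unused interface from the APIC's underlying Linux shell. The interface name below is an assumption inferred from the fault text; list the interfaces and confirm the correct name before changing anything.

```shell
# Identify the unused interface first; the name "eth1-2" below is an
# assumption and may differ on your APIC.
ip link show

# Administratively down the unused port so the fault stops being raised.
ip link set dev eth1-2 down
```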
It will take a few minutes, but the faults will clear.
I recently ran into an issue where my ESXi 5.5 hosts started randomly dropping off the network, and the only way to get them back was to reboot them. Going through the logs, below are the findings of what happened and why:
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_attn_int_deasserted3:4816(vmnic0)]MC assert!
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_mc_assert:937(vmnic0)]XSTORM_ASSERT_LIST_INDEX 0x2
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_mc_assert:951(vmnic0)]XSTORM_ASSERT_INDEX 0x0 = 0x00020000 0x00010017 0x05aa05b4 0x00010053
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_mc_assert:965(vmnic0)]Chip Revision: everest3, FW Version: 7_10_51
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_attn_int_deasserted3:4822(vmnic0)]driver assert
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_panic_dump:1140(vmnic0)]begin crash dump -----------------
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_panic_dump:1150(vmnic0)]def_idx(0xfbd2) def_att_idx(0xa) attn_state(0x1) spq_prod_idx(0xcf) next_stats_cnt(0xdd33)
2015-06-12T16:23:54.454Z cpu17:33604)<3>bnx2x: [bnx2x_panic_dump:1155(vmnic0)]DSB: attn bits(0x0) ack(0x1) id(0x0) idx(0xa)
This is a low-level driver crash caused by an MC assert, with no other hypervisor problems at the time. The bnx2x adapter then begins a crash dump and resets itself. The crash dump does contain data, but it appears to be useful only to Broadcom/QLogic. At the end of the dump, the log shows the card being reset.
I recently upgraded my lab firewall from the aging Cisco ASA 5505 to the brand new Cisco ASA 5506W-X. Since this device is so new, there is no information available yet about resolving any of the "gotchas", so I thought I would share a couple of them.
I found out this morning that I was selected for the vExpert program for 2015. I was really surprised and elated to be selected as part of this great group. I did not expect to receive this honor, but I look forward to contributing in every way that I can to the community and to working with all of the other vExperts over the next year. I would also like to congratulate all of the other vExperts for 2015.
Coming from a Cisco background, this was a bit of a change. I am now using Dell Force10 switches, and I have been trying to figure out how to get LLDP to advertise the management IP to its neighbors the way CDP does. CDP does this by default, but LLDP requires non-default configuration: you must add LLDP configuration to each neighbor-facing interface. Below is an example of what that configuration looks like:
In configuration mode:
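As a sketch of the per-interface LLDP configuration (the interface name "TenGigabitEthernet 0/1" is a placeholder for each neighbor-facing port, and you should verify the exact TLV keywords against your FTOS version):

```
interface TenGigabitEthernet 0/1
 protocol lldp
  advertise management-tlv management-address
  no shutdown
```

Repeat the protocol lldp block on every interface that faces a neighbor you want to see the management address from.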