I just received a few Dell servers to test as replacements for the vendor we currently use for vSphere. I installed ESXi 6.5u1 and patched it for Spectre/Meltdown, so I'm now on build 7967591. The servers connect to storage over multiple 10 Gb Ethernet paths through dual-port X710 NICs:
[root@vm4:~] esxcli network nic get -n vmnic0
Advertised Auto Negotiation: false
Advertised Link Modes: 10000BaseT/Full
Auto Negotiation: false
Cable Type: DA
Current Message Level: -1
Driver Info:
      Bus Info: 0000:18:00:0
      Driver: i40en
      Firmware Version: 6.00 0x800034e6 18.3.6
      Version: 1.3.1
Link Detected: true
Link Status: Up
Name: vmnic0
PHYAddress: 0
Pause Autonegotiate: false
Pause RX: true
Pause TX: true
Supported Ports: DA
Supports Auto Negotiation: false
Supports Pause: true
Supports Wakeon: true
Transceiver:
Virtual Address: 00:50:56:5a:d4:93
Wakeon: MagicPacket(tm)
Almost immediately after I put any real load on them, the NIC simply stops passing traffic. What's worse, the link stays in an UP state, so VMware never tries the failover link. I'm seeing this on the VMware side:
2018-04-01T11:30:41.265Z cpu1:65925)StorageApdHandlerEv: 117: Device or filesystem with identifier [e87ff85e-cb1d8034] has exited the All Paths Down state.
2018-04-01T12:39:19.849Z cpu46:66166)i40en: i40en_HandleMddEvent:6495: Malicious Driver Detection event 0x02 on TX queue 0 PF number 0x00 VF number 0x00
2018-04-01T12:39:19.849Z cpu46:66166)i40en: i40en_HandleMddEvent:6521: TX driver issue detected, PF reset issued
That of course led me to the following two closed threads in this forum:
https://communities.intel.com/thread/117035
https://communities.intel.com/thread/117076
Is it safe to assume this NIC is still broken and will never see fixes, with both sides blaming each other? I can't say it leaves me very happy with Dell either: they knew we were running vSphere on these servers, yet they still bundled these NICs.
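
In case it helps anyone comparing notes on other hosts, here is a rough Python sketch for pulling the Malicious Driver Detection events out of a vmkernel log. The regex is modeled only on the i40en line quoted above, so treat the message format as an assumption (other driver versions may word it differently), and the function names find_mdd_events and summarize are made up for this example.

```python
import re
from collections import Counter

# Pattern modeled on the single i40en MDD log line quoted in this post.
# Assumption: other driver versions may format the message differently.
MDD_RE = re.compile(
    r"(?P<ts>\S+)\s+cpu\d+:\d+\)i40en: i40en_HandleMddEvent:\d+: "
    r"Malicious Driver Detection event (?P<event>0x[0-9a-fA-F]+) "
    r"on (?P<direction>TX|RX) queue (?P<queue>\d+) "
    r"PF number (?P<pf>0x[0-9a-fA-F]+) VF number (?P<vf>0x[0-9a-fA-F]+)"
)


def find_mdd_events(lines):
    """Return one dict per MDD event line found in an iterable of log lines."""
    events = []
    for line in lines:
        match = MDD_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


def summarize(events):
    """Count events per (direction, queue, PF) so repeat offenders stand out."""
    return Counter(
        (e["direction"], int(e["queue"]), e["pf"]) for e in events
    )
```

To use it, pass any iterable of log lines, for example an open file handle on a copy of the host's vmkernel log: find_mdd_events(open("vmkernel.log", errors="replace")). It only reports what the driver logged; it does not tell you whether the link actually stopped passing traffic afterwards.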