i350-T4 NIC
Windows Server 2012 R2
All 4 i350 ports configured as a Windows LBFO Team (switch independent / dynamic load balancing)
Converged networking: Hyper-V vSwitch bound to the LBFO team, with vNICs configured on the vSwitch for host OS operations (Management, Cluster/CSV, and Live Migration)
VLAN tagging in use on VMs and vNICs, except the vNIC used for management, which is 'native'
VMQ enabled on all i350 ports
SR-IOV disabled on all i350 ports
Server 2012 R2 Hyper-V cluster
Fully patched with update rollups and hotfixes currently available
Drivers 19.3 (latest from the Intel website)
In the above configuration the destination server blue screens during live migration. I can sometimes get 1 live migration to work, but a second attempt to live migrate a different VM to the same destination host will cause the host to blue screen.
I can reproduce this issue very easily on any host in the cluster. They all show the same behaviour.
If I disable VMQ, the issue stops.
Also, we don't see this issue with the same hardware and the same configuration on Server 2012 (non-R2), though I note that the NIC driver is different there (e1r63x64.sys on 2012 as opposed to e1r64x64.sys on 2012 R2).
Crash dump analysis always shows the faulting driver as e1r64x64.sys:
BugCheck 1E, {ffffffffc0000005, fffff802be6a2550, ffffd000575b3b58, ffffd000575b3360}
*** ERROR: Module load completed but symbols could not be loaded for e1r64x64.sys
Probably caused by : e1r64x64.sys ( e1r64x64+280e7 )
Followup: MachineOwner
---------
18: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff802be6a2550, The address that the exception occurred at
Arg3: ffffd000575b3b58, Parameter 0 of the exception
Arg4: ffffd000575b3360, Parameter 1 of the exception
Debugging Details:
------------------
WRITE_ADDRESS: unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
ffffd000575b3360
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
FAULTING_IP:
nt!ExQueryDepthSList+0
fffff802`be6a2550 8b01 mov eax,dword ptr [rcx]
EXCEPTION_PARAMETER1: ffffd000575b3b58
EXCEPTION_PARAMETER2: ffffd000575b3360
BUGCHECK_STR: 0x1E_c0000005
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: 0
ANALYSIS_VERSION: 6.3.9600.17237 (debuggers(dbg).140716-0327) amd64fre
EXCEPTION_RECORD: 0000000000000001 -- (.exr 0x1)
Cannot read Exception record @ 0000000000000001
TRAP_FRAME: ffffe800b6200000 -- (.trap 0xffffe800b6200000)
Unable to read trap frame at ffffe800`b6200000
LAST_CONTROL_TRANSFER: from fffff802be7efefb to fffff802be768ca0
STACK_TEXT:
ffffd000`575b2b38 fffff802`be7efefb : 00000000`0000001e ffffffff`c0000005 fffff802`be6a2550 ffffd000`575b3b58 : nt!KeBugCheckEx
ffffd000`575b2b40 fffff802`be779846 : 00000000`00000000 fffff800`35d0c991 ffffe800`b1172d02 ffffd000`575b2e29 : nt!KiFatalFilter+0x1f
ffffd000`575b2b80 fffff802`be757d56 : 00000000`00000000 fffff802`be6e19a6 ffffe000`516d3f90 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x696
ffffd000`575b2bc0 fffff802`be7701ed : 00000000`00000000 ffffd000`575b2d60 ffffd000`575b3b58 ffffd000`575b2d60 : nt!_C_specific_handler+0x86
ffffd000`575b2c30 fffff802`be6fd3a5 : 00000000`00000001 fffff802`be615000 ffffd000`575b3b00 fffff800`00000000 : nt!RtlpExecuteHandlerForException+0xd
ffffd000`575b2c60 fffff802`be6fc25f : ffffd000`575b3b58 ffffd000`575b3860 ffffd000`575b3b58 ffffe800`b12ee480 : nt!RtlDispatchException+0x1a5
ffffd000`575b3330 fffff802`be7748c2 : 00000000`00000001 fffffa80`1b6de000 ffffe800`b6200000 00000000`00000000 : nt!KiDispatchException+0x61f
ffffd000`575b3a20 fffff802`be772dfe : 00000000`00000011 00000000`00000002 00000000`00000001 fffff802`be8a929a : nt!KiExceptionDispatch+0xc2
ffffd000`575b3c00 fffff802`be6a2550 : fffff800`35d04875 ffffe800`b0f3c870 ffffd000`575b3e00 ffffe000`517cd000 : nt!KiGeneralProtectionFault+0xfe
ffffd000`575b3d98 fffff800`35d04875 : ffffe800`b0f3c870 ffffd000`575b3e00 ffffe000`517cd000 00000000`00000000 : nt!ExQueryDepthSList
ffffd000`575b3da0 fffff800`372520e7 : ffffe000`517ce540 ffffe000`517cd000 ffffe800`b1496c60 00000000`00000000 : NDIS!NdisFreeNetBufferList+0xb5
ffffd000`575b3e20 fffff800`372528a9 : ffffe000`517ce540 ffffe000`517cd000 00000000`00000001 00000000`00000000 : e1r64x64+0x280e7
ffffd000`575b3e50 fffff800`37252c00 : ffffe000`517ce540 00000000`00000001 00000000`00000000 ffffe000`517cd000 : e1r64x64+0x288a9
ffffd000`575b3e90 fffff800`37264a9d : ffffe000`517cd000 ffffe000`00000001 ffffe000`00000001 ffff0001`00000001 : e1r64x64+0x28c00
ffffd000`575b3ec0 fffff800`37261c7b : 00000000`00000000 ffffd000`575469a0 ffffe000`517cd000 00000000`00000000 : e1r64x64+0x3aa9d
ffffd000`575b3f00 fffff800`3725a909 : 00000000`00000002 00000000`00000000 ffffe000`517cd000 ffffd000`575469a0 : e1r64x64+0x37c7b
ffffd000`575b3f50 fffff800`3725b02b : ffffe800`b528cde0 fffff800`35d04671 ffffd000`575b40f0 ffffe000`51105ad0 : e1r64x64+0x30909
ffffd000`575b3fc0 fffff800`35d8f0fa : ffffe800`b5b87868 ffffe800`b5b87858 ffffe800`b5b87854 ffffe800`b0d501a0 : e1r64x64+0x3102b
ffffd000`575b4030 fffff800`35d033a3 : ffffe800`b0d501a0 ffffd000`575b40e9 ffffe800`b5b87820 00000000`00000011 : NDIS!ndisMInvokeOidRequest+0x4e
ffffd000`575b4070 fffff800`35d04324 : 00000000`00000000 ffffe800`b0d501a0 ffffe800`b5b87868 00000000`00000000 : NDIS!ndisMDoOidRequest+0x39b
ffffd000`575b4150 fffff800`35d0475e : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : NDIS!ndisQueueOidRequest+0x4c4
ffffd000`575b42f0 fffff800`3679719e : ffffe800`b147b8c0 00000000`00010224 ffffe800`b147b8c0 ffffe000`52bf4010 : NDIS!NdisFOidRequest+0xc2
ffffd000`575b43b0 fffff800`35d038de : ffffe800`b5b87820 ffffe000`51105ad0 00000000`00000000 ffffe000`52bea010 : wfplwfs!LwfLowerOidRequest+0x6e
ffffd000`575b43e0 fffff802`be6e19a6 : ffffd000`575b46d0 ffffd000`575af000 00000000`00000000 00000000`00000000 : NDIS!ndisFDoOidRequestInternal+0x2ee
ffffd000`575b44e0 fffff800`35d04131 : fffff800`35d035f0 ffffe000`52bea010 ffffe800`b1a0b400 00000000`00000000 : nt!KeExpandKernelStackAndCalloutInternal+0xe6
ffffd000`575b45d0 fffff800`35d03d27 : 00000000`00000102 ffffd000`53203200 00000000`00000000 ffffd000`575467d0 : NDIS!ndisQueueOidRequest+0x2d1
ffffd000`575b4770 fffff800`372ea204 : 00000000`00000120 ffffe000`516d4000 00000000`00000120 ffffe000`52bf5000 : NDIS!ndisMOidRequest+0x193
ffffd000`575b4880 fffff800`372e858d : ffffe000`5200ff00 ffffd000`00000001 ffffe000`52bf5020 ffffe800`b5b87820 : NdisImPlatform!implatDoOidRequestOnAdapter+0x22c
ffffd000`575b4900 fffff800`372ea32c : ffffe800`b1ae3880 fffff802`be6546c9 ffffe000`52bf5000 00000000`00000000 : NdisImPlatform!implatOidRequestInternal+0x1fd
ffffd000`575b4ac0 fffff802`be650f4a : ffffe800`b1b54ca0 ffffe000`52c10050 ffffe000`52c10050 fffff800`6977444e : NdisImPlatform!implatOidRequestWorkItem+0x24
ffffd000`575b4af0 fffff802`be651a2b : fffff800`362ed330 fffff802`be650ed4 ffffd000`575b4bd0 ffffe800`b1b54ca0 : nt!IopProcessWorkItem+0x76
ffffd000`575b4b50 fffff802`be6ee514 : 00000000`00000000 ffffe800`b1ae3880 ffffe800`b1ae3880 ffffe000`50832900 : nt!ExpWorkerThread+0x293
ffffd000`575b4c00 fffff802`be76f2c6 : ffffd000`55503180 ffffe800`b1ae3880 ffffd000`5550f7c0 00000014`00000006 : nt!PspSystemThreadStartup+0x58
ffffd000`575b4c60 00000000`00000000 : ffffd000`575b5000 ffffd000`575af000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
STACK_COMMAND: kb
FOLLOWUP_IP:
e1r64x64+280e7
fffff800`372520e7 813ddfd0030001000500 cmp dword ptr [e1r64x64+0x651d0 (fffff800`3728f1d0)],50001h
SYMBOL_STACK_INDEX: b
SYMBOL_NAME: e1r64x64+280e7
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: e1r64x64
IMAGE_NAME: e1r64x64.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 531f9173
FAILURE_BUCKET_ID: 0x1E_c0000005_e1r64x64+280e7
BUCKET_ID: 0x1E_c0000005_e1r64x64+280e7
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:0x1e_c0000005_e1r64x64+280e7
FAILURE_ID_HASH: {6d380028-1764-7d25-d8c5-05559a475808}
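For anyone comparing their own dumps against this one, here is a minimal Python sketch that pulls two facts out of the output above: the driver link date and the order of modules on the faulting stack. It assumes DEBUG_FLR_IMAGE_TIMESTAMP is the PE header timestamp in hex seconds since the Unix epoch, which I believe holds for drivers of this era. The names link_date, module_chain and SAMPLE_FRAMES are my own illustration and not part of any debugger API. The frames are an excerpt copied from the STACK_TEXT above, not the full stack.

```python
import re
from datetime import datetime, timezone
from itertools import groupby

# Excerpt of frames copied verbatim from the STACK_TEXT above (innermost first).
SAMPLE_FRAMES = """
ffffd000`575b3d98 fffff800`35d04875 : ffffe800`b0f3c870 ffffd000`575b3e00 ffffe000`517cd000 00000000`00000000 : nt!ExQueryDepthSList
ffffd000`575b3da0 fffff800`372520e7 : ffffe000`517ce540 ffffe000`517cd000 ffffe800`b1496c60 00000000`00000000 : NDIS!NdisFreeNetBufferList+0xb5
ffffd000`575b3e20 fffff800`372528a9 : ffffe000`517ce540 ffffe000`517cd000 00000000`00000001 00000000`00000000 : e1r64x64+0x280e7
ffffd000`575b3e50 fffff800`37252c00 : ffffe000`517ce540 00000000`00000001 00000000`00000000 ffffe000`517cd000 : e1r64x64+0x288a9
ffffd000`575b4030 fffff800`35d033a3 : ffffe800`b0d501a0 ffffd000`575b40e9 ffffe800`b5b87820 00000000`00000011 : NDIS!ndisMInvokeOidRequest+0x4e
ffffd000`575b43b0 fffff800`35d038de : ffffe800`b5b87820 ffffe000`51105ad0 00000000`00000000 ffffe000`52bea010 : wfplwfs!LwfLowerOidRequest+0x6e
ffffd000`575b4880 fffff800`372e858d : ffffe000`5200ff00 ffffd000`00000001 ffffe000`52bf5020 ffffe800`b5b87820 : NdisImPlatform!implatDoOidRequestOnAdapter+0x22c
"""


def link_date(timestamp_hex):
    """Decode an image timestamp, assumed to be hex seconds since the Unix epoch."""
    return datetime.fromtimestamp(int(timestamp_hex, 16), tz=timezone.utc)


def module_chain(stack_text):
    """Return module names from the innermost frame outward, merging consecutive repeats."""
    modules = []
    for line in stack_text.strip().splitlines():
        # The symbol is the last " : "-separated field of each frame line.
        symbol = line.rsplit(" : ", 1)[-1].strip()
        match = re.match(r"[A-Za-z0-9_]+", symbol)
        if match:
            modules.append(match.group(0))
    return [name for name, _ in groupby(modules)]


if __name__ == "__main__":
    print("e1r64x64.sys link date:", link_date("531f9173").isoformat())
    print("call chain (innermost first):", " <- ".join(module_chain(SAMPLE_FRAMES)))
```

Under that assumption the timestamp 531f9173 decodes to 11 March 2014 (UTC). The chain shows an OID request coming down from the LBFO team (NdisImPlatform) through NDIS into e1r64x64, which then calls NDIS!NdisFreeNetBufferList and faults in nt!ExQueryDepthSList.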
So it seems that this Intel driver has issues with VMQ.
VMQ is notorious for buggy NIC vendor drivers on Server 2012 R2.
Disabling VMQ is not an option for us in production. We need it to work.
Can anyone please confirm this issue exists on the latest 19.3 driver in Server 2012 R2?
Any idea when it will get fixed?
I'm shocked that such an awful bug would still exist in the latest Intel drivers 12 months after the launch of Server 2012 R2, for a technology that MS and Intel co-developed.
I would expect this kind of thing from Broadcom; I wouldn't expect it from Intel. That's why we buy Intel.
Perhaps we made a mistake there...
Help and comments appreciated.