Channel: Intel Communities : Discussion List - Wired Ethernet

Will X710 firmware update 4.53 to 5.05 address sporadic transmit queue timeout?

We have seen this "tx_timeout" / "hung_queue" error three times across two servers. Each time, packets stopped flowing for a number of seconds and then recovered on their own:

 

Apr 10 02:04:14 node39 kernel: WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x276/0x280()
Apr 10 02:04:14 node39 kernel: NETDEV WATCHDOG: p2p1 (i40e): transmit queue 8 timed out
...
Apr 10 02:04:14 node39 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------------ 3.10.0-514.6.1.el7.x86_64 #1
Apr 10 02:04:14 node39 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.1.3 11/20/2013
...
Apr 10 02:04:14 node39 kernel: i40e 0000:42:00.0 p2p1: tx_timeout: VSI_seid: 390, Q 8, NTC: 0x113, HWB: 0x116, NTU: 0x116, TAIL: 0x116, INT: 0x1
Apr 10 02:04:14 node39 kernel: i40e 0000:42:00.0 p2p1: tx_timeout recovery level 1, hung_queue 8
Apr 10 02:04:14 node39 kernel: i40e 0000:42:00.0 p2p1: adding 3c:fd:fe:9f:b7:48 vid=0
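To compare occurrences across nodes, the tx_timeout diagnostic line can be pulled apart into numeric fields. The following is a minimal sketch, not a definitive tool: the regular expression only mirrors the line format shown above, and the reading of NTC as the driver's next-to-clean index and HWB as the hardware head write-back value is an assumption based on the field names, as is the ring size of 512 descriptors.

```python
import re

# Pattern mirrors the tx_timeout line format shown in the log above.
TX_TIMEOUT_RE = re.compile(
    r"(?P<iface>\S+): tx_timeout: VSI_seid: (?P<vsi>\d+), Q (?P<queue>\d+), "
    r"NTC: (?P<ntc>0x[0-9a-fA-F]+), HWB: (?P<hwb>0x[0-9a-fA-F]+), "
    r"NTU: (?P<ntu>0x[0-9a-fA-F]+), TAIL: (?P<tail>0x[0-9a-fA-F]+), "
    r"INT: (?P<intr>0x[0-9a-fA-F]+)"
)

SAMPLE = (
    "Apr 10 02:04:14 node39 kernel: i40e 0000:42:00.0 p2p1: tx_timeout: "
    "VSI_seid: 390, Q 8, NTC: 0x113, HWB: 0x116, NTU: 0x116, TAIL: 0x116, INT: 0x1"
)


def parse_tx_timeout(line):
    """Return the tx_timeout fields as a dict of ints, or None if no match."""
    m = TX_TIMEOUT_RE.search(line)
    if m is None:
        return None
    fields = {"iface": m.group("iface")}
    for name in ("vsi", "queue"):
        fields[name] = int(m.group(name))
    for name in ("ntc", "hwb", "ntu", "tail", "intr"):
        fields[name] = int(m.group(name), 16)
    return fields


def pending_cleanup(fields, ring_size=512):
    """Descriptors between NTC and HWB, modulo the ring size.

    ASSUMPTION: NTC is the next-to-clean index, HWB is the head write-back
    value, and the ring holds 512 descriptors. None of this is confirmed
    in the thread.
    """
    return (fields["hwb"] - fields["ntc"]) % ring_size


fields = parse_tx_timeout(SAMPLE)
print(fields["iface"], fields["queue"], pending_cleanup(fields))
```

Under those assumptions, the sample line shows the hardware head three descriptors ahead of the driver's cleanup index, which is the kind of detail worth recording for each occurrence.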

 

This happened within the first three weeks of using Intel X710 dual-port adapters running firmware 4.53 (with supported Intel SFP+), recently installed in a cluster of two-year-old Dell R620s running CentOS 7.3:

 

node39:/# lspci -vv | grep -A 1 10GbE
pcilib: sysfs_read_vpd: read failed: Input/output error
05:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Intel Corporation Ethernet Converged Network Adapter X710-2
--
05:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Intel Corporation Ethernet Converged Network Adapter X710
--
42:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Intel Corporation Ethernet Converged Network Adapter X710-2
--
42:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Intel Corporation Ethernet Converged Network Adapter X710
node39:/usr/local/bin# ethtool -i p2p1
driver: i40e
version: 1.5.10-k
firmware-version: 4.53 0x8000206e 0.0.0
expansion-rom-version:
bus-info: 0000:42:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

 

We have run X710s without issue in a few other servers, but those are HP OEM adapters running firmware 4.60:

 

node93:/# lspci -vv |grep -A 1 10GbE
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Hewlett-Packard Company HP Ethernet 10Gb 2-port 562FLR-SFP+ Adapter
--
04:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Hewlett-Packard Company Ethernet 10Gb 562SFP+ Adapter
--
05:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Hewlett-Packard Company HP Ethernet 10Gb 2-port 562SFP+ Adapter
--
05:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
	Subsystem: Hewlett-Packard Company Ethernet 10Gb 562SFP+ Adapter
node93:/# ethtool -i ens2f0
driver: i40e
version: 1.5.10-k
firmware-version: 4.60 0x80001f47 1.3072.0
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
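Since the driver version is identical on both nodes and only the firmware differs, it may help to compare `ethtool -i` output across the whole cluster mechanically. This is a rough sketch: the function names are hypothetical, the sample text is trimmed from the output pasted above, and `read_ethtool_info` simply shells out to `ethtool -i`, so it needs a real interface to run against.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn 'key: value' lines from `ethtool -i` into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def firmware_version(info):
    """First token of firmware-version, e.g. '4.53', or None if absent."""
    parts = info.get("firmware-version", "").split()
    return parts[0] if parts else None


def read_ethtool_info(iface):
    """Run `ethtool -i <iface>` on the local machine (not called here)."""
    out = subprocess.run(
        ["ethtool", "-i", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_info(out.stdout)


# Trimmed copies of the output shown above.
NODE39 = """driver: i40e
version: 1.5.10-k
firmware-version: 4.53 0x8000206e 0.0.0
expansion-rom-version:
bus-info: 0000:42:00.0
"""
NODE93 = """driver: i40e
version: 1.5.10-k
firmware-version: 4.60 0x80001f47 1.3072.0
expansion-rom-version:
bus-info: 0000:05:00.0
"""

a, b = parse_ethtool_info(NODE39), parse_ethtool_info(NODE93)
print("same driver:", a["version"] == b["version"])
print("firmware:", firmware_version(a), "vs", firmware_version(b))
```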

 

I have downloaded nvmupdate64e and updated a spare Dell to firmware 5.05, so I have confirmed the procedure in case this is the correct solution. However, threads such as "Intel X710 vs VMWare ESX: crash and reboot" give me pause: a crash and reboot would certainly be worse than a 10-20 second transmit hang.

 

My questions are:

 

  1. Has anyone else experienced these tx_timeout / hung_queue issues?
  2. Is it a known issue? If so, is it an issue with the firmware, with the i40e driver, or with something else such as TSO/GSO (both are currently on, but I could turn them off)?
  3. If it is a firmware issue, was it corrected between versions 4.53 and 4.60, and is it recommended to flash production machines to 5.05 or to some other version? I could not find a detailed change list.
  4. Is there a way (such as generating high data rates with iperf) to reproduce the sporadic issue on demand, so that I can demonstrate whether an attempted fix has worked?
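For questions 2 and 4, one possible test loop is: toggle TSO/GSO, drive traffic, then count watchdog events in the kernel log. The sketch below only builds the command lines and counts log matches; it does not run anything. The function names and the server name `node40` are hypothetical, the stream count and duration are arbitrary, and whether sustained load actually triggers the hang is unknown.

```python
import shlex


def offload_cmd(iface, enabled):
    """argv for `ethtool -K` that switches TSO and GSO on or off together."""
    state = "on" if enabled else "off"
    return ["ethtool", "-K", iface, "tso", state, "gso", state]


def iperf3_client_cmd(server, streams=8, seconds=600):
    """argv for an iperf3 client run with parallel streams (values arbitrary)."""
    return ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds)]


def count_watchdog_timeouts(log_text, iface):
    """Count NETDEV WATCHDOG transmit-queue timeouts logged for one interface."""
    needle = f"NETDEV WATCHDOG: {iface} (i40e): transmit queue"
    return sum(1 for line in log_text.splitlines() if needle in line)


# Log lines copied from the excerpt earlier in this post.
SAMPLE_LOG = """\
Apr 10 02:04:14 node39 kernel: NETDEV WATCHDOG: p2p1 (i40e): transmit queue 8 timed out
Apr 10 02:04:14 node39 kernel: i40e 0000:42:00.0 p2p1: tx_timeout recovery level 1, hung_queue 8
"""

# "node40" is a hypothetical iperf3 server, used for illustration only.
print(shlex.join(offload_cmd("p2p1", enabled=False)))
print(shlex.join(iperf3_client_cmd("node40")))
print(count_watchdog_timeouts(SAMPLE_LOG, "p2p1"))
```

Comparing the count over equal-length runs with offloads on and off, or before and after a firmware change, would give a simple pass/fail measure, provided the hang can be provoked at all.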

 

Thanks in advance!

