Hello,
We are experiencing link flapping when connecting two Supermicro servers with 82599ES adapters:
[Wed Jan 6 07:10:15 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan 6 07:10:15 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan 6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan 6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan 6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan 6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan 6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan 6 07:10:17 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan 6 07:10:17 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
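To put a number on the flapping, the kernel log can be filtered for ixgbe link events. The POSIX sh/awk helper below is a minimal sketch. `count_link_events` is a hypothetical name, and the helper assumes the `dmesg -T` line format shown above, where the interface name (with a trailing colon) directly precedes `NIC Link is Up` / `NIC Link is Down`.

```sh
# count_link_events (hypothetical helper): read kernel log lines on stdin and
# print, per interface, how many "NIC Link is Up" / "NIC Link is Down" lines
# were seen. Assumes the ixgbe message format shown in the log above.
count_link_events() {
  awk '
    /NIC Link is (Up|Down)/ {
      for (i = 2; i <= NF; i++) {
        if ($i == "NIC") {
          ifname = $(i - 1)
          sub(/:$/, "", ifname)              # strip the trailing colon
          if ($0 ~ /NIC Link is Up/) up[ifname]++
          else down[ifname]++
          seen[ifname] = 1
          break
        }
      }
    }
    END {
      # Interfaces with no events of one kind print 0 for that count.
      for (ifname in seen)
        printf "%s up=%d down=%d\n", ifname, up[ifname], down[ifname]
    }
  '
}
```

Run it as `dmesg -T | count_link_events`. On the nine log lines above it prints `enp129s0f1 up=5 down=4`.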
We are trying to connect two Supermicro servers, each with an Intel X520-DA2 (E10G42BTDA) controller:
81:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
One controller has the following SFP+ module:
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-LR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 20km
Length (SMF) : 20000m
Length (50um) : 0m
Length (62.5um) : 0m
Length (Copper) : 0m
Length (OM3) : 0m
Laser wavelength : 1330nm
Vendor name : GIGALINK
Vendor OUI : 00:90:65
Vendor PN : GL-OT-ST12LC1-13
Vendor rev : A
Optical diagnostics support : Yes
Laser bias current : 37.020 mA
Laser output power : 0.9654 mW / -0.15 dBm
Receiver signal average optical power : 0.4904 mW / -3.09 dBm
Module temperature : 44.23 degrees C / 111.61 degrees F
Module voltage : 3.2836 V
Alarm/warning flags implemented : Yes
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser output power high alarm : Off
Laser output power low alarm : Off
Laser output power high warning : Off
Laser output power low warning : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser rx power high alarm : Off
Laser rx power low alarm : Off
Laser rx power high warning : Off
Laser rx power low warning : Off
Laser bias current high alarm threshold : 85.000 mA
Laser bias current low alarm threshold : 10.000 mA
Laser bias current high warning threshold : 80.000 mA
Laser bias current low warning threshold : 12.000 mA
Laser output power high alarm threshold : 3.1623 mW / 5.00 dBm
Laser output power low alarm threshold : 0.3162 mW / -5.00 dBm
Laser output power high warning threshold : 2.5119 mW / 4.00 dBm
Laser output power low warning threshold : 0.3981 mW / -4.00 dBm
Module temperature high alarm threshold : 85.00 degrees C / 185.00 degrees F
Module temperature low alarm threshold : -10.00 degrees C / 14.00 degrees F
Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
Module temperature low warning threshold : -5.00 degrees C / 23.00 degrees F
Module voltage high alarm threshold : 3.7000 V
Module voltage low alarm threshold : 2.9000 V
Module voltage high warning threshold : 3.6000 V
Module voltage low warning threshold : 3.0000 V
Laser rx power high alarm threshold : 1.0000 mW / 0.00 dBm
Laser rx power low alarm threshold : 0.0200 mW / -16.99 dBm
Laser rx power high warning threshold : 0.7943 mW / -1.00 dBm
Laser rx power low warning threshold : 0.0251 mW / -16.00 dBm
The module in the other controller:
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-LR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 20km
Length (SMF) : 20000m
Length (50um) : 0m
Length (62.5um) : 0m
Length (Copper) : 0m
Length (OM3) : 0m
Laser wavelength : 1270nm
Vendor name : GIGALINK
Vendor OUI : 00:90:65
Vendor PN : GL-OT-ST12LC1-12
Vendor rev : A
Option values : 0x00 0x1a
Option : RX_LOS implemented
Option : TX_FAULT implemented
Option : TX_DISABLE implemented
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : G201511300616
Date code : 151118
Optical diagnostics support : Yes
Laser bias current : 35.120 mA
Laser output power : 0.7842 mW / -1.06 dBm
Receiver signal average optical power : 0.5695 mW / -2.45 dBm
Module temperature : 45.68 degrees C / 114.22 degrees F
Module voltage : 3.2440 V
Alarm/warning flags implemented : Yes
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser output power high alarm : Off
Laser output power low alarm : Off
Laser output power high warning : Off
Laser output power low warning : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser rx power high alarm : Off
Laser rx power low alarm : Off
Laser rx power high warning : Off
Laser rx power low warning : Off
Laser bias current high alarm threshold : 85.000 mA
Laser bias current low alarm threshold : 10.000 mA
Laser bias current high warning threshold : 80.000 mA
Laser bias current low warning threshold : 12.000 mA
Laser output power high alarm threshold : 3.1623 mW / 5.00 dBm
Laser output power low alarm threshold : 0.3162 mW / -5.00 dBm
Laser output power high warning threshold : 2.5119 mW / 4.00 dBm
Laser output power low warning threshold : 0.3981 mW / -4.00 dBm
Module temperature high alarm threshold : 85.00 degrees C / 185.00 degrees F
Module temperature low alarm threshold : -10.00 degrees C / 14.00 degrees F
Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
Module temperature low warning threshold : -5.00 degrees C / 23.00 degrees F
Module voltage high alarm threshold : 3.7000 V
Module voltage low alarm threshold : 2.9000 V
Module voltage high warning threshold : 3.6000 V
Module voltage low warning threshold : 3.0000 V
Laser rx power high alarm threshold : 1.0000 mW / 0.00 dBm
Laser rx power low alarm threshold : 0.0200 mW / -16.99 dBm
Laser rx power high warning threshold : 0.7943 mW / -1.00 dBm
Laser rx power low warning threshold : 0.0251 mW / -16.00 dBm
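Both modules report receive power inside their warning window (-3.09 dBm and -2.45 dBm, against a -16.00 dBm low and -1.00 dBm high warning threshold), which is consistent with every alarm and warning flag being Off. To repeat that check quickly on each end, a small awk filter can pull the three values out of a module dump. This is a sketch that assumes the dumps come from `ethtool -m` in the `name : value mW / value dBm` format shown above; `rx_margin` is a hypothetical name.

```sh
# rx_margin (hypothetical helper): read `ethtool -m` style output on stdin and
# report how far the received optical power sits from the Rx warning thresholds.
rx_margin() {
  awk -F' : ' '
    # A value field looks like "0.4904 mW / -3.09 dBm"; return the dBm number.
    function dbm(s,    n, a) { n = split(s, a, " "); return a[n - 1] + 0 }
    $1 ~ /Receiver signal average optical power/ { rx = dbm($2); have_rx = 1 }
    $1 ~ /Laser rx power high warning threshold/ { hi = dbm($2) }
    $1 ~ /Laser rx power low warning threshold/  { lo = dbm($2) }
    END {
      if (!have_rx) { print "no Rx power reading found"; exit 1 }
      printf "rx=%.2f dBm, %.2f dB below high warning, %.2f dB above low warning\n",
             rx, hi - rx, rx - lo
    }
  '
}
```

For example, `ethtool -m enp129s0f1 | rx_margin`. With the first module's values above it prints `rx=-3.09 dBm, 2.09 dB below high warning, 12.91 dB above low warning`.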
We have tested both SFP+ modules and the patch cord in other hardware, and the link worked fine there. As for the driver, we have tried both ixgbe 4.1.5 and 4.3.13 and see the problem with both versions. I built the 4.3.13 module with debug printk enabled and found the following messages in dmesg:
[Wed Jan 6 06:04:02 2016] ixgbe_get_media_type_82599
[Wed Jan 6 06:04:02 2016] ixgbe_check_mac_link_generic
[Wed Jan 6 06:04:02 2016] ixgbe_get_media_type_82599
[Wed Jan 6 06:04:02 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan 6 06:04:02 2016] ixgbe_check_mac_link_genericixgbe_fc_enable_genericixgbe_fc_autonegixgbe_check_mac_link_generic
[Wed Jan 6 06:04:02 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan 6 06:04:02 2016] ixgbe_get_media_type_82599
[Wed Jan 6 06:04:02 2016] ixgbe_check_mac_link_generic
[Wed Jan 6 06:04:02 2016] ixgbe_get_media_type_82599
[Wed Jan 6 06:04:02 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
Does this help explain the cause of this behaviour?
One thing puzzles me. In Documentation/networking/ixgbe.txt I found the following information:
82599-BASED ADAPTERS
NOTES: If your 82599-based Intel(R) Network Adapter came with Intel optics, or
is an Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel
optics and/or the direct attach cables listed below.
When 82599-based SFP+ devices are connected back to back, they should be set to
the same Speed setting via ethtool. Results may vary if you mix speed settings.
82598-based adapters support all passive direct attach cables that comply
with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach
cables are not supported.
Yet any attempt to set the speed fails:
Cannot set new settings: Invalid argument
not setting speed
Why is that? On the other hand, I found a suggestion to set the advertised link mode instead:
ethtool -s enp129s0f1 advertise 0x1000
This worked, but the link flapping continues.
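For reference, the value passed to `advertise` is the legacy ethtool link-mode bitmask, and 0x1000 is the bit for 10000baseT Full, which matches the "Advertised link modes" line in the ethtool output at the end of this post. A sketch of a decoder follows. The bit table is a subset copied from the ethtool(8) man page, and `decode_advertise` is a hypothetical helper, not part of ethtool.

```sh
# decode_advertise MASK (hypothetical helper): print the link modes whose bits
# are set in a legacy ethtool "advertise" mask. Only a subset of the bits from
# the ethtool(8) man page is listed here. The ( ) body runs in a subshell so
# the helper's variables do not leak into the caller.
decode_advertise() (
  mask=$(( $1 ))                     # accepts hex input such as 0x1000
  while read -r bit name; do
    if [ $(( mask & bit )) -ne 0 ]; then
      echo "$name"
    fi
  done <<'EOF'
0x001 10baseT Half
0x002 10baseT Full
0x004 100baseT Half
0x008 100baseT Full
0x010 1000baseT Half
0x020 1000baseT Full
0x1000 10000baseT Full
0x20000 1000baseKX Full
0x40000 10000baseKX4 Full
0x80000 10000baseKR Full
EOF
)
```

For example, `decode_advertise 0x1000` prints `10000baseT Full`.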
Yes, I realize these modules are unsupported; to get them working I had to load the driver with allow_unsupported_sfp=1,1. The main problem, however, is that I could not find any Intel-recommended WDM SFP+ module. We need a single-fiber connection because in our final configuration both servers will connect to our provider, who strongly recommends WDM links. If these modules are not going to work, is there any WDM module that will?
Driver information and current link settings:
driver: ixgbe
version: 4.1.5
firmware-version: 0x61c10001
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Settings for enp129s0f1:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes: 10000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Speed: 10000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 0
Transceiver: external
Auto-negotiation: off
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
We have been struggling with this problem for a few days now, so any suggestions are more than welcome.
Thanks in advance for any help.