
GUEST with bonding and VLAN support on CentOS 6.5 with Intel SR-IOV


Overview

In our setup we have a physical server hosting several virtual machine guests.

All guests are assigned two interfaces, which are bonded together for redundancy.

GUESTS communicate over the native VLAN (bond0) and over one or more extra VLAN interfaces (bond0.x).

The setup is built with:

  • HP ProLiant DL380p Gen8 servers (16 cores, 48 GB RAM)
  • HP 561FLR-T NIC containing the Intel X540-AT2 chipset
  • Linux CentOS 6.5 (64 bit HOST and a mixture of 32 and 64 bit GUESTS)
  • Some VMs have SR-IOV enabled, other VMs are bridged (Open vSwitch). Note that bridged GUESTS have only their eth0 connected; eth1 is never used because redundancy is provided on the HOST.

It took some effort to get this setup up and running, especially the bonding part. I would like to share our experience for the benefit of the community. I'm not a specialist in this domain, and I have no experience with other setups (e.g. other hardware or OS).


The diagram

[Diagram: SR-IOV-bonding-VLAN.png]

Configuration

I skip the process of enabling SR-IOV in the BIOS; this will most likely be different on your platform anyway. See the HP document referenced below for details on HP ProLiant servers.


Kernel

We use kernel 2.6.32-431.20.3.el6.x86_64.

The kernel parameters intel_iommu=on pci=realloc intremap=no_x2apic_optout are required to enable SR-IOV support.
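
For reference, a sketch of how these parameters end up on the kernel command line, assuming the stock CentOS 6 GRUB configuration (the root device and the other existing arguments are placeholders):

# /boot/grub/grub.conf -- append the parameters to the existing kernel line
kernel /vmlinuz-2.6.32-431.20.3.el6.x86_64 ro root=/dev/mapper/vg-root intel_iommu=on pci=realloc intremap=no_x2apic_optout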


Driver

We upgraded to the latest driver officially supported by HP. Without this driver update, communication between the VM and the HOST was only possible on the native network, not on a VLAN tagged network.

kmod-hp-ixgbe-3.19.0.46-4.rhel6u5.x86_64

kmod-hp-ixgbevf-2.12.0.38-4.rhel6u5.x86_64

 

The kernel module parameters used are:

options ixgbe max_vfs=63,63
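
A quick sanity check after loading the driver (the expected count assumes both X540 ports expose 63 VFs each; the grep patterns are assumptions based on how the VFs show up):

lspci | grep -c "Virtual Function"    # expect 126 (63 per port)
ip link show eth0 | grep "vf "        # one line per VF, with its MAC address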


Virsh qemu-kvm

Virsh has support for SR-IOV and can assign a virtual function to the VM. On the HOST, two networks are defined, one for eth0 and one for eth1. They use mode='hostdev', which provides the SR-IOV support. Below is the definition for eth0; the one for eth1 is similar.


cat /etc/libvirt/qemu/networks/autostart/passthrough_eth0.xml

<network>
  <name>passthrough_eth0</name>
  <uuid>4bbbf5e2-7b80-7cf9-c667-50bb711f2e4c</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth0'/>
  </forward>
</network>
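
If the networks are not yet known to libvirt, they can be loaded and set to autostart with the standard virsh commands (shown here for eth0; the eth1 network is handled the same way, and the file name is whatever you saved the XML above as):

virsh net-define passthrough_eth0.xml
virsh net-autostart passthrough_eth0
virsh net-start passthrough_eth0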

 

Assign two network interfaces to the GUEST, one from passthrough_eth0 and one from passthrough_eth1.

<domain type="kvm">
  ...
  <devices>
    ...
    <interface type="network">
      <mac address="52:54:00:bb:f7:8f"/>
      <source network="passthrough_eth0"/>
    </interface>
    <interface type="network">
      <mac address="52:54:00:47:ce:4f"/>
      <source network="passthrough_eth1"/>
    </interface>
  </devices>
</domain>
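
Once the GUEST is started, the HOST should show the GUEST's MAC addresses bound to virtual functions on the physical ports. A quick check (the VF index libvirt picks will vary):

ip link show eth0 | grep "52:54:00:bb:f7:8f"
# e.g.:  vf 3 MAC 52:54:00:bb:f7:8f   (the VF assigned to this GUEST)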


Bonding on the HOST

This was the trickiest part to get right.

Starting with the HOST: we rely on the link state of the interface to trigger the failover. eth0 is the preferred interface; this makes sure that eth0 is active whenever possible. The updelay of 30 seconds is the time the switch port needs to enter the spanning tree forwarding state, so we wait 30 seconds before using eth0 as the active interface when it becomes available again. The resulting bonding configuration stored in /etc/modprobe.d/bonding.conf is:

 

alias bond0 bonding

options bonding mode=1 miimon=100 primary=eth0 updelay=30000
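
For completeness, a minimal sketch of the matching interface configuration, assuming the stock CentOS 6 network scripts (the IP addresses and the VLAN ID 123 are placeholders):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.10
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is identical apart from DEVICE)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes

# /etc/sysconfig/network-scripts/ifcfg-bond0.123 (one file per tagged VLAN)
DEVICE=bond0.123
ONBOOT=yes
BOOTPROTO=none
VLAN=yes
IPADDR=198.51.100.10
NETMASK=255.255.255.0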


Bonding on the GUEST

Take into account:

  • The guest can rely on the link state of the physical network card: Virtual Functions have the same link state as their physical counterpart, so we can use the same bonding configuration on the GUEST.
  • The HOST and GUEST must use the same active network card. A VM that uses eth1 can't communicate with the HOST when the HOST uses eth0, even when both interfaces are up and running. It is therefore crucial that HOST and GUEST fail over to the same interface at the same time.
  • When eth1 is enslaved to the bond (during ifup eth1), the bonding driver changes the MAC address of eth1 to the MAC of eth0, so once bonded both NICs have the same MAC address. Unfortunately this change of MAC address is not applied at the level of the HOST: the HOST notices the attempt to change the MAC, but refuses to actually change it. The result is that the GUEST and the HOST have a different MAC for the same interface, and packets sent by the GUEST over eth1 are dropped on the network card by the MAC anti-spoofing feature.
    There are two possible solutions for this issue (or maybe there are more that I don't know about):
    • Configure the bonding option fail_over_mac=active. This option makes the bond interface use the MAC address of the active interface; during a failover, the MAC address changes to the MAC of the newly active interface. All hosts on the subnet then need to update their ARP tables, and a (configurable) number of gratuitous ARPs is sent over the bond interface to force these updates.
      One problem with this solution is that VLAN interfaces on top of the bonding interface do not change their MAC address, so bond0.123 will still use the MAC of eth0 even when eth1 is the active interface. The result is that only communication over the native VLAN works.
    • Manually change the MAC address on the HOST. The command ip link set eth1 vf 3 mac aa:bb:cc:dd:ee:ff (with the actual MAC of eth0 on the GUEST) makes the bond work, even for VLAN interfaces. This is the approach we automated with the script shown below.
    • It must be said that a test with kernel 3.10.48-1.el6.elrepo.x86_64, ixgbe 3.13.10-k, and ixgbevf 2.7.12-k showed that changing the MAC in the GUEST automatically changed the MAC on the HOST. But with that setup we had other issues: virsh enumerated the VFs wrongly and assigned the GUEST's MAC address to the wrong virtual function, and we also saw a GUEST sending ARP requests from a MAC other than its own. We left this path without finding the real cause or a real solution.

In the end we use the same bonding config on the GUEST:

 

alias bond0 bonding

options bonding mode=1 miimon=100 primary=eth0 updelay=30000


After virsh has started the GUEST, we have a script on the HOST that updates the MAC address of eth1 of the GUEST.
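
A minimal sketch of such a script, assuming the GUEST's eth1 is backed by a VF with a known index on the HOST's eth1, and that the first <mac address='...'/> in the domain XML belongs to the GUEST's eth0 (the domain name and VF index are placeholders):

#!/bin/bash
# Usage: fix-guest-vf-mac.sh <domain> <vf-index>   (hypothetical helper, adapt to your VF numbering)
DOMAIN=$1
VF_INDEX=$2
# Extract the MAC of the GUEST's first interface (its eth0) from the domain XML.
MAC=$(virsh dumpxml "$DOMAIN" | sed -n "s/.*<mac address='\([^']*\)'.*/\1/p" | head -1)
# Give the VF behind the GUEST's eth1 the same MAC, so the bond in the GUEST
# matches what the HOST's anti-spoofing check enforces.
ip link set eth1 vf "$VF_INDEX" mac "$MAC"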

 

Result

After all this configuration, what is the end result?

Communication between two VMs works over both the native and the tagged VLAN interfaces. Unplugging a network cable causes a bonding failover, and all communication resumes. Restoring the network cable forces all bonds back to eth0 after 30 seconds.

VM to HOST communication also works as expected (native, tagged VLAN, bonding failover).

But keep in mind that all machines (HOST and all GUESTS) must use the same active interface.
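
A quick way to verify this on each machine (the reported active slave must be the same on the HOST and on every GUEST):

grep "Currently Active Slave" /proc/net/bonding/bond0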

 

What still doesn't work is communication between a GUEST using SR-IOV and a GUEST connected to the bridge on the HOST. Each of them can communicate with the HOST and the outside world (native and tagged VLAN), but they can't communicate with each other: Ethernet broadcast packets (e.g. ARP requests) arrive in the bridged GUEST, but unicast Ethernet frames don't. Packets from the bridged GUEST to the SR-IOV GUEST appear to work as expected.

Maybe someone has a solution to this problem...

 

References

