
Questions about Intel 82599: Flow director + NUMA + performance


Hi there, how are you?

 

We're trying to get the maximum possible throughput out of our servers. Below is some context, followed by the main changes we made and then a number of questions regarding the Intel 82599 and Linux parameters (things like buffer/queue sizes, ring buffers, qdisc, and receive/send buffers). Please let us know if we missed anything, or ask if you need any further clarification.

 

Context:

  • Goal: get the maximum throughput through packet locality (latency isn't a concern as long as it stays under 0.5s)
  • Load: mostly video (streaming) chunks, ranging from 170KB to 2.3MB
  • App (user land): nginx (multi-process, one worker pinned to each core)
  • OS (kernel): RHEL 7.4 (3.10)
  • NIC (driver): Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01) (ixgbe - 5.3.7)
  • Bonding mode: IEEE 802.3ad dynamic link aggregation (a single card with two 10Gbps ports, giving us 20Gbps)
  • HW: CPU=Intel(R) Xeon(R) CPU E5-2630L 0 @ 2.00GHz, Hyper-Threading=off, 2 sockets, 12 cores, 64GB RAM
  • NUMA layout (how we checked which node the card sits on is sketched right after this output):

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 32605 MB
node 0 free: 30680 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 32767 MB
node 1 free: 30952 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
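
The layout above is the output of numactl --hardware. A minimal sketch of how we check which node the card itself hangs off of (eth3 is just one of our interface names, adjust as needed):

numactl --hardware                                                # prints the node/CPU/memory layout shown above
cat /sys/class/net/eth3/device/numa_node                          # NUMA node of the NIC's PCIe slot (-1 = unknown)
lspci -vv -s $(basename $(readlink /sys/class/net/eth3/device))   # PCIe details for that port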

What we did:

  • Install the latest driver (ixgbe - 5.3.7)
  • Ran set_irq_affinity -x local ethX, i.e. with the `x` option (enabling RSS and XPS) and targeting the local NUMA node (the exact verification commands are sketched after this list)
  • Enabled Flow director: ntuple-filters on
  • Set affinity for our user land application (nginx's worker_cpu_affinity auto)
  • XPS seems to be enabled (cat /sys/class/net/eth0/queues/tx-0/xps_cpus)
  • RSS seems to be working (cat /proc/interrupts | grep eth)
  • RFS seems to be disabled (cat /proc/sys/net/core/rps_sock_flow_entries; shows 0)
  • RPS seems to be disabled (cat /sys/class/net/eth3/queues/rx-10/rps_cpus shows 00000,00000 for all queues)
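
A minimal sketch of the commands behind the checklist above (eth3 is just an example interface; repeat the per-queue checks for each tx-N/rx-N directory):

ethtool -K eth3 ntuple on                         # enable Flow Director (n-tuple filters)
./set_irq_affinity -x local eth3                  # pin queue IRQs to the local NUMA node and set XPS
cat /sys/class/net/eth3/queues/tx-0/xps_cpus      # per-TX-queue XPS CPU mask
grep eth3 /proc/interrupts                        # one IRQ line per queue shows RSS is spreading flows
cat /proc/sys/net/core/rps_sock_flow_entries      # 0 means RFS is disabled
cat /sys/class/net/eth3/queues/rx-0/rps_cpus      # an all-zero mask means RPS is disabled for that queue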

Questions:

  1. Do we need to enable RPS for the (HW-accelerated) RSS to work? (when we check /sys/class/net/eth0/queues/rx-0/rps_cpus it shows 00000000,00000000 for all the queues)
  2. Do we need to enable RFS for Flow Director to work? (cat /proc/sys/net/core/rps_sock_flow_entries shows 0)
  3. Do we need to add any rule for Flow Director to work? (for TCP4, since we can't see any explicit rule, we assume it uses the perfect-match filter on src/dst IP and port; see the sketch after this list)
  4. How can we be sure that RSS and flow director are working properly?
  5. Why can't we use a more modern qdisc with this multi-queue driver/NIC? (we tried to set fq and fq_codel via sysctl net.core.default_qdisc; is it failing because of the multiple queues?)
  6. Does a single NIC only connect directly to a single NUMA node? (when we run set_irq_affinity -x local ethX it sets all the queues to the first NUMA node)
  7. If 6) is true, what's better for throughput: pinning the NIC to a single NUMA node, or spreading its queues across all the nodes?
  8. Still assuming 6) is true, if we buy a second NIC can we attach it to the second NUMA node?
  9. We tried to set interrupt coalescing for the TX ring (ethtool -C eth3 tx-usecs 84) but our value was simply ignored. Is it not possible to set coalescing on the TX side?
  10. Should we enable HT but keep the number of queues equal to the number of physical cores?
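
To make questions 3, 4, 5 and 9 concrete, this is the kind of thing we have been experimenting with (the addresses, ports and queue number below are made-up placeholders, and we're not sure it's the right approach):

# questions 3/4: add an explicit perfect-match rule steering one flow to RX queue 2, then check the counters
ethtool -U eth3 flow-type tcp4 src-ip 192.0.2.10 dst-ip 192.0.2.20 dst-port 80 action 2
ethtool -u eth3                          # list the n-tuple rules currently installed
ethtool -S eth3 | grep fdir              # fdir_match / fdir_miss counters from the ixgbe driver
# question 5: switching the default qdisc
sysctl -w net.core.default_qdisc=fq
tc qdisc show dev eth3                   # still shows an mq root with per-queue child qdiscs
# question 9: the coalescing experiment
ethtool -c eth3                          # current coalescing settings
ethtool -C eth3 tx-usecs 84              # the TX value that gets ignored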

 

If you read this far, thank you very much!

 

References:

