Date:      Mon, 14 Jul 2014 17:44:15 -0700
From:      Adrian Chadd <adrian@freebsd.org>
To:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   UDP/TCP versus IP frames - subtle out of order packets with hardware hashing
Message-ID:  <CAJ-VmomUNJ23CHLLX2qryAuE2XQyBmo30du3MuRnobs%2BwEkguA@mail.gmail.com>

Hi,

Whilst digging into UDP receive side scaling on the Intel ixgbe(4)
NIC, I stumbled across a difference in how it hashes IP-fragmented
traffic versus non-fragmented traffic.

Here's how it surfaced:

* the ixgbe(4) NIC is configured to hash on both IP (2-tuple) and
TCP/UDP (4-tuple);
* when a non-fragmented UDP frame comes in, it's hashed on the 4-tuple
and comes into queue A;
* when a fragmented UDP frame comes in, it's hashed on the IP 2-tuple
and comes into queue B.

So if a single flow carries a mix of small and large datagrams, we'll
end up with some packets coming in via queue A and some via queue B.
In normal operation that'll result in out-of-order packets.
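
To make the split concrete, here's a rough sketch of the two hash inputs
using a Toeplitz hash, the function normally used for RSS. The key below is
arbitrary, the queue count is made up, and picking the queue with a plain
modulo is a simplification (real hardware indexes an indirection table with
the low bits of the hash). The names hash_2tuple, hash_4tuple and rx_queue
are mine, not anything from the driver.

```python
import socket
import struct

# Arbitrary 40-byte key, for illustration only. A real NIC is programmed
# with whatever key the driver or stack chooses.
RSS_KEY = bytes((i * 37 + 11) & 0xFF for i in range(40))
NUM_QUEUES = 8  # assumed number of receive queues


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window at each set bit of the input."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window of 32 key bits starting at input bit (i * 8 + b).
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def hash_2tuple(src_ip: str, dst_ip: str) -> int:
    """Hash over source and destination address only."""
    data = socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
    return toeplitz_hash(RSS_KEY, data)


def hash_4tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Hash over addresses plus source and destination port."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    return toeplitz_hash(RSS_KEY, data)


def rx_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
             fragmented: bool) -> int:
    """Model of the behaviour described above: fragments get the 2-tuple hash."""
    if fragmented:
        h = hash_2tuple(src_ip, dst_ip)
    else:
        h = hash_4tuple(src_ip, dst_ip, src_port, dst_port)
    return h % NUM_QUEUES  # simplification of the indirection table lookup


flow = ("10.0.0.1", "10.0.0.2", 40000, 11211)
print("small datagram      -> queue", rx_queue(*flow, fragmented=False))
print("fragmented datagram -> queue", rx_queue(*flow, fragmented=True))
```

The two hash values differ whenever the port bits contribute a non-zero term,
so nothing ties the two kinds of packets for one flow to the same queue.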

For the RSS stuff I'm working on it means that some packets will match
the PCBGROUP setup and some won't. By default UDP configures a 2-tuple
hash, so it expects packets to come in hashed that way. But that only
matches for large (fragmented) frames. Small (unfragmented) frames are
hashed via the 4-tuple and won't match.

The IP reassembly code doesn't recalculate the flowid/flowtype once
it's finished. It'd be nice to do that before further processing so
the reassembled packet can be placed in the right netisr.
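
As a toy model of what recalculating after reassembly would buy us (not the
kernel code): the Datagram class, the flowtype strings, the CPU count and the
helper names are all hypothetical, and zlib.crc32 is only a stand-in for the
real RSS hash.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

NUM_NETISR_CPUS = 4  # assumed number of netisr threads


@dataclass
class Datagram:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    flowid: Optional[int] = None    # hash attached to the packet
    flowtype: Optional[str] = None  # which fields the hash covered


def stand_in_hash(*fields) -> int:
    """Stand-in for the RSS hash; crc32 is used only to keep this short."""
    return zlib.crc32("|".join(str(f) for f in fields).encode())


def tag_as_hardware_would(d: Datagram, fragmented: bool) -> Datagram:
    """Fragments get a 2-tuple hash, whole datagrams a 4-tuple hash."""
    if fragmented:
        d.flowid = stand_in_hash(d.src_ip, d.dst_ip)
        d.flowtype = "ip-2tuple"
    else:
        d.flowid = stand_in_hash(d.src_ip, d.dst_ip, d.src_port, d.dst_port)
        d.flowtype = "udp-4tuple"
    return d


def recalc_flowid_after_reassembly(d: Datagram) -> Datagram:
    """Once reassembled, the ports are visible, so rehash on the 4-tuple."""
    d.flowid = stand_in_hash(d.src_ip, d.dst_ip, d.src_port, d.dst_port)
    d.flowtype = "udp-4tuple"
    return d


def netisr_cpu(d: Datagram) -> int:
    """Pick a netisr thread from the flowid."""
    return d.flowid % NUM_NETISR_CPUS


small = tag_as_hardware_would(Datagram("10.0.0.1", "10.0.0.2", 40000, 11211), False)
large = tag_as_hardware_would(Datagram("10.0.0.1", "10.0.0.2", 40000, 11211), True)
print("before recalculation:", small.flowtype, large.flowtype)
recalc_flowid_after_reassembly(large)
print("same netisr after recalculation:", netisr_cpu(small) == netisr_cpu(large))
```

With the recalculation in place, both kinds of datagram for a flow carry the
same flowid and flowtype by the time they reach the protocol layer.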

So there are a couple of semi-overlapping issues:

* Right now we could get TCP and UDP frames out of order. I'd like to
at least have ixgbe(4) hash on the 2-tuple for UDP rather than the
4-tuple. That fixes that silly corner case. It's unlikely to show up
except in things like forwarding workloads, or maybe for people doing
memcached work; I'm not sure.

* Whether or not to calculate the flowid/flowtype in ip_reass() (or
maybe in the netisr input path, in case there's no flowid assigned) so
work is better distributed;

* ... then, if we do that, we could go back to 4-tuple UDP hashing and
we'd just recalculate for any large (fragmented) frames.

Here's what happened with Linux and ixgbe in 2010 on this topic:

http://comments.gmane.org/gmane.linux.network/166687

What do people think?


-a


