Date: Mon, 14 Jul 2014 17:44:15 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>
Subject: UDP/TCP versus IP frames - subtle out of order packets with hardware hashing
Message-ID: <CAJ-VmomUNJ23CHLLX2qryAuE2XQyBmo30du3MuRnobs%2BwEkguA@mail.gmail.com>
Hi,

Whilst digging into UDP receive side scaling on the Intel ixgbe(4) NIC, I stumbled across how it hashes fragmented IP traffic differently from non-fragmented IP traffic.

Here's how it surfaced:

* the ixgbe(4) NIC is configured to hash on both IP (2-tuple) and TCP/UDP (4-tuple);
* when a non-fragmented UDP frame comes in, it's hashed on the 4-tuple and comes into queue A;
* when a fragmented UDP frame comes in, it's hashed on the IP 2-tuple and comes into queue B.

So if there's a mix of small and large datagrams, we'll end up with some packets coming in via queue A and some via queue B. In normal operation that'll result in out of order packets.

For the RSS stuff I'm working on, it means that some packets will match the PCBGROUP setup and some won't. By default UDP configures a 2-tuple hash, so it expects packets to come in hashed appropriately. But that only matches for large (fragmented) frames. Small frames are hashed via the 4-tuple, so they won't match.

The IP reassembly code doesn't recalculate the flowid/flowtype once it's finished. It'd be nice to do that before further processing so the reassembled packet can be placed in the right netisr.

So there are a couple of semi-overlapping issues:

* Right now we could get TCP and UDP frames out of order. I'd like to at least have ixgbe(4) hash on the 2-tuple for UDP rather than the 4-tuple. That fixes that silly corner case. It's not likely to show up except for things like forwarding workloads. Maybe people doing memcached work, I'm not sure.
* Whether or not to calculate the flowid/flowtype in ip_reass() (or maybe in the netisr input path, in case there's no flowid assigned) so work is better distributed;
* .. then if we do that, we could do 4-tuple UDP hashing again and we'd just recalculate for any large frames.

Here's what happened with Linux and ixgbe in 2010 on this topic:

http://comments.gmane.org/gmane.linux.network/166687

What do people think?

-a
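To make the queue split concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustration under stated assumptions: zlib.crc32 stands in for the real RSS hash (which it is not), and NUM_QUEUES, Packet, rx_queue() and queues_for_flow() are hypothetical names, not anything from ixgbe(4) or the FreeBSD stack. The model also treats every fragment as hashed on the 2-tuple, as in the description above.

```python
import socket
import struct
import zlib
from dataclasses import dataclass

NUM_QUEUES = 8  # hypothetical number of RX queues


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    fragmented: bool  # True if this frame is an IP fragment


def hash_2tuple(pkt):
    """Stand-in hash over source/destination IP only (not the real RSS hash)."""
    data = socket.inet_aton(pkt.src_ip) + socket.inet_aton(pkt.dst_ip)
    return zlib.crc32(data)


def hash_4tuple(pkt):
    """Stand-in hash over source/destination IP and UDP ports."""
    data = (socket.inet_aton(pkt.src_ip) + socket.inet_aton(pkt.dst_ip)
            + struct.pack("!HH", pkt.src_port, pkt.dst_port))
    return zlib.crc32(data)


def rx_queue(pkt, udp_4tuple):
    """Pick an RX queue the way the mail describes the NIC behaving.

    Fragments are always hashed on the 2-tuple. Non-fragmented UDP is
    hashed on the 4-tuple only when udp_4tuple is enabled.
    """
    if udp_4tuple and not pkt.fragmented:
        return hash_4tuple(pkt) % NUM_QUEUES
    return hash_2tuple(pkt) % NUM_QUEUES


def queues_for_flow(udp_4tuple):
    """Return (queue of a small datagram, queue of a fragment) for one flow."""
    small = Packet("10.0.0.1", "10.0.0.2", 40000, 11211, fragmented=False)
    large = Packet("10.0.0.1", "10.0.0.2", 40000, 11211, fragmented=True)
    return rx_queue(small, udp_4tuple), rx_queue(large, udp_4tuple)


if __name__ == "__main__":
    print("4-tuple UDP hashing:", queues_for_flow(udp_4tuple=True))
    print("2-tuple UDP hashing:", queues_for_flow(udp_4tuple=False))
```

With 2-tuple UDP hashing both frames of the flow always pick the same queue. With 4-tuple hashing they are free to differ, which is the reordering case. Recalculating the flowid after reassembly would correspond to calling hash_4tuple() on the reassembled datagram once the ports are known.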