Date: Tue, 15 Jul 2014 13:31:52 +0430
From: Hooman Fazaeli <hoomanfazaeli@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: UDP/TCP versus IP frames - subtle out of order packets with hardware hashing
Message-ID: <53C4EE00.5090705@gmail.com>
In-Reply-To: <CAJ-VmomUNJ23CHLLX2qryAuE2XQyBmo30du3MuRnobs%2BwEkguA@mail.gmail.com>
References: <CAJ-VmomUNJ23CHLLX2qryAuE2XQyBmo30du3MuRnobs%2BwEkguA@mail.gmail.com>

On 7/15/2014 5:14 AM, Adrian Chadd wrote:
> Hi,
>
> Whilst digging into UDP receive side scaling on the intel ixgbe(4)
> NIC, I stumbled across how it hashes traffic between IP fragmented
> traffic and non IP-fragmented traffic.
>
> Here's how it surfaced:
>
> * the ixgbe(4) NIC is configured to hash on both IP (2-tuple) and
> TCP/UDP (4-tuple);
> * when a non-fragmented UDP frame comes in, it's hashed on the 4-tuple
> and comes into queue A;
> * when a fragmented UDP frame comes in, it's hashed on the IP 2-tuple
> and comes into queue B.
>
> So if there's a mix of small and large datagrams, we'll end up with
> some packets coming in via queue A and some by queue B. In normal
> operation that'll result in out of order packets.
>
> For the RSS stuff I'm working on it means that some packets will match
> the PCBGROUP setup and some won't. By default UDP configures a 2-tuple
> hash so it expects packets to come in hashed appropriately. But that
> only matches for large frames. For small frames it'll be hashed via
> the 4-tuple and it won't match.
>
> The ip reassembly code doesn't recalculate the flowid/flowtype once
> it's finished. It'd be nice to do that before further processing so it
> can be placed in the right netisr.
>
> So there's a couple of semi-overlapping issues:
>
> * Right now we could get TCP and UDP frames out of order. I'd like to
> at least have ixgbe(4) hash on the 2-tuple for UDP rather than the
> 4-tuple. That fixes that silly corner case. It's not likely going to
> show up except for things like forwarding workloads. Maybe people
> doing memcached work, I'm not sure.
>
> * Whether or not to calculate the flowid/flowtype in ip_reass() (or
> maybe in the netisr input path, in case there's no flowid assigned) so
> work is better distributed;
>
> * .. then if we do that, we could do 4-tuple UDP hashing again and
> we'd just recalculate for any large frames.
>
> Here's what happened with Linux and ixgbe in 2010 on this topic:
>
> http://comments.gmane.org/gmane.linux.network/166687
>
> What do people think?
>
>
> -a

Doesn't the problem apply to TCP too? TCP may be fragmented too, but
that is less likely because of MSS.

--
Best regards.
Hooman Fazaeli
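
To make the queue split concrete, here is a minimal sketch of the behaviour described in the quoted message. It is only a toy model and rests on several assumptions: `zlib.crc32` stands in for the NIC's real RSS hash, and `NUM_QUEUES`, `Packet`, `hash_input` and `rx_queue` are hypothetical names for illustration, not ixgbe(4) or FreeBSD kernel interfaces. The point is only that a fragment is hashed on the IP 2-tuple while an unfragmented datagram of the same flow is hashed on the 4-tuple, so the two can be steered to different queues.

```python
import socket
import struct
import zlib
from dataclasses import dataclass

NUM_QUEUES = 8  # hypothetical number of receive queues


@dataclass(frozen=True)
class Packet:
    src: str          # source IPv4 address
    dst: str          # destination IPv4 address
    sport: int        # source port
    dport: int        # destination port
    fragmented: bool  # True if the frame is an IP fragment


def hash_input(pkt: Packet, l4_hashing: bool) -> bytes:
    """Return the bytes the toy NIC feeds into its hash."""
    # The 2-tuple (source and destination address) is always hashed.
    key = socket.inet_aton(pkt.src) + socket.inet_aton(pkt.dst)
    # Ports are added only when 4-tuple hashing is enabled and the
    # frame is not a fragment, as in the behaviour described above.
    if l4_hashing and not pkt.fragmented:
        key += struct.pack("!HH", pkt.sport, pkt.dport)
    return key


def rx_queue(pkt: Packet, l4_hashing: bool) -> int:
    """Pick a receive queue; crc32 is a stand-in, not the real RSS hash."""
    return zlib.crc32(hash_input(pkt, l4_hashing)) % NUM_QUEUES


# Two frames of the same flow: a small datagram and a fragment of a large one.
small = Packet("192.0.2.1", "192.0.2.2", 40000, 11211, fragmented=False)
large = Packet("192.0.2.1", "192.0.2.2", 40000, 11211, fragmented=True)

for l4 in (True, False):
    print("4-tuple hashing" if l4 else "2-tuple hashing",
          "small ->", rx_queue(small, l4),
          "fragment ->", rx_queue(large, l4))
```

With 4-tuple hashing enabled the two frames are hashed over different inputs, so nothing guarantees they share a queue. With 2-tuple hashing they always do. In this model the same split would apply to a fragmented TCP segment, since the fragment check does not depend on the transport protocol.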