From: Hooman Fazaeli
Date: Tue, 15 Jul 2014 13:31:52 +0430
Message-ID: <53C4EE00.5090705@gmail.com>
To: Adrian Chadd
Subject: Re: UDP/TCP versus IP
 frames - subtle out of order packets with hardware hashing
Cc: FreeBSD Net, "freebsd-arch@freebsd.org"

On 7/15/2014 5:14 AM, Adrian Chadd wrote:
> Hi,
>
> Whilst digging into UDP receive side scaling on the intel ixgbe(4)
> NIC, I stumbled across how it hashes traffic between IP fragmented
> traffic and non IP-fragmented traffic.
>
> Here's how it surfaced:
>
> * the ixgbe(4) NIC is configured to hash on both IP (2-tuple) and
> TCP/UDP (4-tuple);
> * when a non-fragmented UDP frame comes in, it's hashed on the 4-tuple
> and comes into queue A;
> * when a fragmented UDP frame comes in, it's hashed on the IP 2-tuple
> and comes into queue B.
>
> So if there's a mix of small and large datagrams, we'll end up with
> some packets coming in via queue A and some by queue B. In normal
> operation that'll result in out of order packets.
>
> For the RSS stuff I'm working on it means that some packets will match
> the PCBGROUP setup and some won't. By default UDP configures a 2-tuple
> hash so it expects packets to come in hashed appropriately. But that
> only matches for large frames. For small frames it'll be hashed via
> the 4-tuple and it won't match.
>
> The ip reassembly code doesn't recalculate the flowid/flowtype once
> it's finished. It'd be nice to do that before further processing so it
> can be placed in the right netisr.
>
> So there's a couple of semi-overlapping issues:
>
> * Right now we could get TCP and UDP frames out of order. I'd like to
> at least have ixgbe(4) hash on the 2-tuple for UDP rather than the
> 4-tuple. That fixes that silly corner case.
> It's not likely going to
> show up except for things like forwarding workloads. Maybe people
> doing memcached work, I'm not sure.
>
> * Whether or not to calculate the flowid/flowtype in ip_reass() (or
> maybe in the netisr input path, in case there's no flowid assigned) so
> work is better distributed;
>
> * .. then if we do that, we could do 4-tuple UDP hashing again and
> we'd just recalculate for any large frames.
>
> Here's what happened with Linux and ixgbe in 2010 on this topic:
>
> http://comments.gmane.org/gmane.linux.network/166687
>
> What do people think?
>
>
> -a

Doesn't the problem apply to TCP too? TCP can be fragmented as well,
but that is less likely because of MSS.

-- 
Best regards.
Hooman Fazaeli
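
To make the question concrete, here is a rough sketch of the queue
selection described in the quoted mail. It is only a toy model, not the
ixgbe(4) logic: the names (Packet, hash_fields, rx_queue, NUM_QUEUES) are
made up for illustration, and zlib.crc32 stands in for the Toeplitz hash
that real RSS hardware uses. The assumption it encodes is the one from the
quoted text: a fragment is hashed on the IP 2-tuple, whatever the upper
protocol is, while an unfragmented packet is hashed on the 4-tuple.

```python
import zlib
from dataclasses import dataclass

NUM_QUEUES = 8  # hypothetical number of receive queues


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    proto: str         # "udp" or "tcp"; not used by the hash in this model
    src_port: int
    dst_port: int
    is_fragment: bool  # True for any IP fragment


def hash_fields(pkt, l4_hashing=True):
    """Return the fields this toy NIC would feed to its hash."""
    if l4_hashing and not pkt.is_fragment:
        # Unfragmented packet: 4-tuple.
        return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port)
    # Fragment, or L4 hashing disabled: IP 2-tuple only.
    return (pkt.src_ip, pkt.dst_ip)


def rx_queue(pkt, l4_hashing=True):
    """Map a packet to a receive queue (crc32 is a stand-in, not Toeplitz)."""
    key = "|".join(str(f) for f in hash_fields(pkt, l4_hashing)).encode()
    return zlib.crc32(key) % NUM_QUEUES


# Same flow, once unfragmented and once as a fragment, for UDP and TCP.
for proto in ("udp", "tcp"):
    whole = Packet("10.0.0.1", "10.0.0.2", proto, 5000, 6000, False)
    frag = Packet("10.0.0.1", "10.0.0.2", proto, 5000, 6000, True)
    print(proto, "4-tuple config:", rx_queue(whole), rx_queue(frag))
    print(proto, "2-tuple config:",
          rx_queue(whole, l4_hashing=False),
          rx_queue(frag, l4_hashing=False))
```

With L4 hashing enabled the two packets of one flow are hashed over
different fields, so nothing keeps them on the same queue, and that holds
for TCP exactly as for UDP. With 2-tuple hashing only, both always land on
the same queue.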