From: Juli Mallett
Date: Fri, 27 Apr 2012 13:00:36 -0700
To: Sean Bruno
Cc: "freebsd-net@freebsd.org"
Subject: Re: igb(4) at peak in big purple

On Fri, Apr 27, 2012 at 12:29, Sean Bruno wrote:
> On Thu, 2012-04-26 at 11:13 -0700, Juli Mallett wrote:
>> Queue splitting in Intel cards is done using a hash of protocol
>> headers, so this is expected behavior.  This also helps with TCP and
>> UDP performance, in terms of keeping packets for the same protocol
>> control block on the same core, but for other applications it's not
>> ideal.  If your application does not require that kind of locality,
>> there are things that can be done in the driver to make it easier to
>> balance packets between all queues about-evenly.
>
> Oh? :-)
>
> What should I be looking at to balance more evenly?

Dirty hacks are involved :)  I've sent some code to Luigi that I think
would make sense in netmap (since for many tasks one's going to do with
netmap, you want to use as many cores as possible, and maybe don't care
about locality so much), but it could be useful in conjunction with the
network stack, too, for tasks that don't need a lot of locality.

Basically, this is the deal: the Intel NICs compute a hash of various
header fields.  Then, some bits from that hash are used to index a
table, and that table indicates which queue the received packet should
go to.
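To put that indirection in rough pseudo-C (an illustration only, not
actual driver code; the 128-entry table size and the use of the hash's
low bits match my understanding of the 82575/82576 parts, and the names
here are made up):

#include <stdint.h>

#define IGB_RETA_SIZE 128               /* 82575/82576 redirection table entries */

static uint8_t reta[IGB_RETA_SIZE];     /* bucket -> RX queue, programmed by the driver */

/* Pick the RX queue for a packet, given the hash the NIC computed. */
static inline unsigned int
rss_queue_for_hash(uint32_t rss_hash)
{
	/* The low 7 bits of the hash select a bucket; the bucket names the queue. */
	return (reta[rss_hash & (IGB_RETA_SIZE - 1)]);
}

Everything interesting is in how the driver fills in that table.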
Ideally you'd want to use some sort of counter to index that table and
get round-robin queue usage, if you wanted to saturate all cores
evenly.  Unfortunately there doesn't seem to be a way to do that.  What
you can do, though, is regularly update the table that is indexed by
the hash.  You can do it very frequently, in fact; it's a pretty fast
operation.

So what I've done, for example, is to go through and rotate all of the
entries every N packets, where N is something like the number of
receive descriptors per queue divided by the number of queues.  So at
first bucket 0 goes to queue 0 and bucket 1 goes to queue 1.  Then a
few hundred packets are received, the table is reprogrammed, and now
bucket 0 goes to queue 1 and bucket 1 goes to queue 0.

I can provide code to do this, but I don't want to post it publicly
(unless it is actually going to become an option for netmap) for fear
that people will use it in scenarios where it's harmful and then
complain.  It's potentially one more painful variation of the Intel
drivers that Intel can't support, and that just makes everyone
miserable.

Thanks,
Juli.
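P.S. To make the reprogramming step concrete, here is a minimal
from-scratch sketch (deliberately not the code referred to above).  It
borrows the redirection-table register layout that the stock if_igb.c
RSS setup programs (128 byte-wide bucket entries packed four per 32-bit
E1000_RETA word), but the function name, the offset argument, and the
idea of invoking it from the RX path every N packets are mine and
hypothetical; a real version would also need to worry about locking and
about the 82575's shifted queue encoding.

/*
 * Sketch only: rotate the hash-bucket -> queue mapping by "offset".
 * Assumes the if_igb.c driver context (struct adapter, E1000_WRITE_REG,
 * E1000_RETA); the rotation trigger and this function name are made up.
 */
static void
igb_rotate_reta(struct adapter *adapter, u_int offset)
{
	struct e1000_hw *hw = &adapter->hw;
	union {
		u32 dword;
		u8  bytes[4];
	} reta;
	int i;

	for (i = 0; i < 128; i++) {
		/* Bucket i now steers to the queue "offset" positions further along. */
		reta.bytes[i & 3] = (i + offset) % adapter->num_queues;
		if ((i & 3) == 3)
			E1000_WRITE_REG(hw, E1000_RETA(i >> 2), reta.dword);
	}
}

Called with an offset that increments every N packets (N being roughly
receive descriptors per queue divided by the number of queues, as
above), that produces the bucket-shuffling effect described earlier.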