From: Juli Mallett
Date: Fri, 27 Apr 2012 13:00:36 -0700
To: Sean Bruno
Cc: "freebsd-net@freebsd.org"
Subject: Re: igb(4) at peak in big purple

On Fri, Apr 27, 2012 at 12:29, Sean Bruno wrote:
> On Thu, 2012-04-26 at 11:13 -0700, Juli Mallett wrote:
>> Queue splitting in Intel cards is done using a hash of protocol
>> headers, so this is expected behavior.  This also helps with TCP and
>> UDP performance, in terms of keeping packets for the same protocol
>> control block on the same core, but for other applications it's not
>> ideal.  If your application does not require that kind of locality,
>> there are things that can be done in the driver to make it easier to
>> balance packets between all queues about-evenly.
>
> Oh? :-)
>
> What should I be looking at to balance more evenly?

Dirty hacks are involved :)  I've sent some code to Luigi that I think
would make sense in netmap (since for many tasks one's going to do with
netmap, you want to use as many cores as possible, and maybe don't care
about locality so much), but it could be useful in conjunction with the
network stack, too, for tasks that don't need a lot of locality.

Basically, this is the deal: the Intel NICs compute a hash of various
header fields.  Then, some bits from that hash are used to index a
table, and that table indicates which queue the received packet should
go to.
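To put that indirection in rough pseudo-C (an illustration only, not
actual driver code; the 128-entry table size and the use of the hash's
low bits match my understanding of the 82575/82576 parts, and the names
here are made up):

#include <stdint.h>

#define IGB_RETA_SIZE 128               /* 82575/82576 redirection table entries */

static uint8_t reta[IGB_RETA_SIZE];     /* bucket -> RX queue, programmed by the driver */

/* Pick the RX queue for a packet, given the hash the NIC computed. */
static inline unsigned int
rss_queue_for_hash(uint32_t rss_hash)
{
	/* The low 7 bits of the hash select a bucket; the bucket names the queue. */
	return (reta[rss_hash & (IGB_RETA_SIZE - 1)]);
}

Everything interesting is in how the driver fills in that table.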
Ideally you'd want to use some sort of counter to index that table and
get round-robin queue usage, if you wanted to saturate all cores
evenly.  Unfortunately there doesn't seem to be a way to do that.  What
you can do, though, is regularly update the table that is indexed by
the hash.  You can do it very frequently, in fact; it's a pretty fast
operation.

So what I've done, for example, is to go through and rotate all of the
entries every N packets, where N is something like the number of
receive descriptors per queue divided by the number of queues.  So at
first bucket 0 goes to queue 0 and bucket 1 goes to queue 1.  Then a
few hundred packets are received, the table is reprogrammed, and now
bucket 0 goes to queue 1 and bucket 1 goes to queue 0.

I can provide code to do this, but I don't want to post it publicly
(unless it is actually going to become an option for netmap) for fear
that people will use it in scenarios where it's harmful and then
complain.  It's potentially one more painful variation of the Intel
drivers that Intel can't support, and that just makes everyone
miserable.

Thanks,
Juli.
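P.S. To make the reprogramming step concrete, here is a minimal
from-scratch sketch (deliberately not the code referred to above).  It
borrows the redirection-table register layout that the stock if_igb.c
RSS setup programs (128 byte-wide bucket entries packed four per 32-bit
E1000_RETA word), but the function name, the offset argument, and the
idea of invoking it from the RX path every N packets are mine and
hypothetical; a real version would also need to worry about locking and
about the 82575's shifted queue encoding.

/*
 * Sketch only: rotate the hash-bucket -> queue mapping by "offset".
 * Assumes the if_igb.c driver context (struct adapter, E1000_WRITE_REG,
 * E1000_RETA); the rotation trigger and this function name are made up.
 */
static void
igb_rotate_reta(struct adapter *adapter, u_int offset)
{
	struct e1000_hw *hw = &adapter->hw;
	union {
		u32 dword;
		u8  bytes[4];
	} reta;
	int i;

	for (i = 0; i < 128; i++) {
		/* Bucket i now steers to the queue "offset" positions further along. */
		reta.bytes[i & 3] = (i + offset) % adapter->num_queues;
		if ((i & 3) == 3)
			E1000_WRITE_REG(hw, E1000_RETA(i >> 2), reta.dword);
	}
}

Called with an offset that increments every N packets (N being roughly
receive descriptors per queue divided by the number of queues, as
above), that produces the bucket-shuffling effect described earlier.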