From owner-freebsd-net@FreeBSD.ORG Fri Sep 13 22:44:58 2013
Date: Fri, 13 Sep 2013 18:43:23 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: George Neville-Neil
Cc: "Alexander V. Chernikov", Luigi Rizzo, Andre Oppermann,
 freebsd-hackers@freebsd.org, FreeBSD Net, Adrian Chadd,
 "Andrey V. Elsukov", freebsd-arch@freebsd.org
Subject: Re: Network stack changes

George Neville-Neil wrote:
> 
> On Aug 29, 2013, at 7:49, Adrian Chadd wrote:
> 
> > Hi,
> > 
> > There's a lot of good stuff to review here, thanks!
> > 
> > Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless
> > to keep locking things like that on a per-packet basis. We should be
> > able to do this in a cleaner way - we can defer RX into a CPU-pinned
> > taskqueue and convert the interrupt handler to a fast handler that
> > just schedules that taskqueue. We can ignore the ithread entirely
> > here.
> > 
> > What do you think?
> > 
> > Totally pie in the sky handwaving at this point:
> > 
> > * create an array of mbuf pointers for completed mbufs;
> > * populate the mbuf array;
> > * pass the array up to ether_demux().
> > 
> > For vlan handling, it may end up populating its own list of mbufs
> > to push up to ether_demux(). So maybe we should extend the API to
> > have a bitmap of packets to actually handle from the array, so we
> > can pass up a larger array of mbufs, note which ones are for the
> > destination and then the upcall can mark which frames it has
> > consumed.
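
A purely illustrative sketch (plain C, all names hypothetical, not
actual kernel code) of the completed-mbuf array plus consumed-frame
bitmap described above might look something like this:

    /*
     * Sketch only: a batch of completed receive frames plus a bitmap
     * recording which entries the upcall (e.g. ether_demux() or a vlan
     * layer) has consumed.  The names rx_batch, rb_* and
     * rx_batch_mark_consumed() are hypothetical.
     */
    #include <stdint.h>

    #define RX_BATCH_MAX    64              /* frames completed per RX pass */

    struct mbuf;                            /* opaque here; real mbufs in-kernel */

    struct rx_batch {
            struct mbuf *rb_pkts[RX_BATCH_MAX]; /* completed mbuf pointers */
            uint64_t     rb_valid;          /* bit i set: rb_pkts[i] not yet consumed */
            int          rb_count;          /* number of populated slots */
    };

    /* The upcall marks each frame it took ownership of. */
    static inline void
    rx_batch_mark_consumed(struct rx_batch *rb, int idx)
    {
            rb->rb_valid &= ~((uint64_t)1 << idx);
    }

    /* Frames the driver still owns once the upcall returns. */
    static inline int
    rx_batch_remaining(const struct rx_batch *rb)
    {
            return (__builtin_popcountll(rb->rb_valid));
    }

The driver would populate rb_pkts[0..rb_count-1], set the matching bits
in rb_valid, hand the whole batch to the demux upcall in one call, and
then free or re-queue whatever is still marked valid when the upcall
returns.
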
> > I specifically wonder how much work/benefit we may see by doing:
> > 
> > * batching packets into lists so various steps can batch process
> >   things rather than run to completion;
> > * batching the processing of a list of frames under a single lock
> >   instance - eg, if the forwarding code could do the forwarding
> >   lookup for 'n' packets under a single lock, then pass that list
> >   of frames up to inet_pfil_hook() to do the work under one lock,
> >   etc, etc.
> > 
> > Here, the processing would look less like "grab lock and process to
> > completion" and more like "mark and sweep" - ie, we have a list of
> > frames that we mark as needing processing and mark as having been
> > processed at each layer, so we know where to next dispatch them.
> 
> One quick note here. Every time you increase batching you may
> increase bandwidth, but you will also increase per-packet latency for
> the last packet in a batch. That is fine so long as we remember that
> and treat batching as a tuning knob to balance the two.
> 
And any time you increase latency, that will have a negative impact on
NFS performance. NFS RPCs are usually small messages (except Write
requests and Read replies), and the RTT for these (mostly small,
bidirectional) messages can have a significant impact on NFS
performance.

rick

> > I still have some tool coding to do with PMC before I even think
> > about tinkering with this, as I'd like to measure stuff like
> > per-packet latency as well as top-level processing overhead (ie,
> > CPU_CLK_UNHALTED.THREAD_P / lagg0 TX bytes/pkts, RX bytes/pkts, NIC
> > interrupts on that core, etc.)
> 
> This would be very useful in identifying the actual hot spots, and
> would be helpful to anyone who can generate a decent stream of
> packets with, say, an IXIA.
> 
> Best,
> George
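
As a footnote, a similarly hand-wavy sketch of the "forwarding lookup
for 'n' packets under a single lock" / mark-and-sweep step discussed
above, reusing the hypothetical rx_batch structure from the earlier
sketch. rtable_lock(), rtable_unlock() and route_lookup() are stand-ins
for the real routing-table lock and lookup, not existing interfaces:

    /*
     * Sketch only: take the routing-table lock once, do the lookup for
     * every still-live frame in the batch, sweep (mark) the ones that
     * fail, and report how many survive for the next stage (e.g. a
     * single pfil pass).  All names are hypothetical.
     */
    extern void rtable_lock(void);
    extern void rtable_unlock(void);
    extern int  route_lookup(struct mbuf *);    /* 0 == route found */

    static int
    forward_batch(struct rx_batch *rb)
    {
            uint64_t bit;
            int i, live;

            live = 0;
            rtable_lock();                      /* one lock for 'n' lookups */
            for (i = 0; i < rb->rb_count; i++) {
                    bit = (uint64_t)1 << i;
                    if ((rb->rb_valid & bit) == 0)
                            continue;           /* consumed at an earlier layer */
                    if (route_lookup(rb->rb_pkts[i]) != 0)
                            rb->rb_valid &= ~bit;   /* no route: swept, to be dropped */
                    else
                            live++;
            }
            rtable_unlock();

            return (live);                      /* frames left for the pfil stage, etc. */
    }

The same pattern would repeat at each later stage (pfil, output
queueing): take a lock once, walk the bitmap, and mark off frames as
each layer disposes of them.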