From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Thu, 29 Aug 2013 04:49:31 -0700
To: "Alexander V. Chernikov"
Cc: Luigi Rizzo, Andre Oppermann, "freebsd-hackers@freebsd.org", FreeBSD Net, "Andrey V. Elsukov", Gleb Smirnoff, "freebsd-arch@freebsd.org"
Subject: Re: Network stack changes
In-Reply-To: <521E41CB.30700@yandex-team.ru>

Hi,

There's a lot of good stuff to review here, thanks!

Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to keep locking things like that on a per-packet basis. We should be able to do this in a cleaner way: defer RX into a CPU-pinned taskqueue and convert the interrupt handler to a fast handler that just schedules that taskqueue. We can ignore the ithread entirely here.

What do you think?

Totally pie in the sky handwaving at this point:

* create an array of mbuf pointers for completed mbufs;
* populate the mbuf array;
* pass the array up to ether_demux().

For VLAN handling, it may end up populating its own list of mbufs to push up to ether_demux(). So maybe we should extend the API to have a bitmap of packets to actually handle from the array: that way we can pass up a larger array of mbufs, note which ones are for the destination, and the upcall can mark which frames it has consumed.
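Something like this, very roughly. The rx_queue layout, rxq_* names and the batched ether_demux_batch() upcall are all invented here just to show the shape; only the taskqueue(9) / bus_setup_intr(9) calls are the stock API, and actually pinning the taskqueue thread to the queue's CPU is assumed to happen separately (cpuset / intr_bind) and is left out:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/bus.h>
#include <sys/taskqueue.h>
#include <sys/mbuf.h>

struct ifnet;

#define RXQ_BATCH_MAX	64

struct rx_queue {
	struct taskqueue	*tq;		/* one thread, pinned to this queue's CPU */
	struct task		 rx_task;
	struct mbuf		*batch[RXQ_BATCH_MAX];	/* completed mbufs */
	void			*sc;		/* driver softc back-pointer */
};

/* Hypothetical batched upcall; today this is ether_demux()/if_input() per mbuf. */
void	ether_demux_batch(struct ifnet *ifp, struct mbuf **m, int count);

static void
rxq_task(void *arg, int pending __unused)
{
	struct rx_queue *rxq = arg;
	int n = 0;

	/*
	 * Drain the descriptor ring outside of interrupt context, with no
	 * per-packet RX lock: collect completed mbufs into the array ...
	 */
	/* n = rxq_collect_completed(rxq->sc, rxq->batch, RXQ_BATCH_MAX); */

	/* ... then push the whole array up the stack in one call. */
	if (n > 0)
		ether_demux_batch(NULL /* ifp */, rxq->batch, n);
}

static int
rxq_intr_filter(void *arg)
{
	struct rx_queue *rxq = arg;

	/* Fast handler: do no work here at all, just kick the taskqueue. */
	taskqueue_enqueue(rxq->tq, &rxq->rx_task);
	return (FILTER_HANDLED);
}

static int
rxq_setup(device_t dev, struct resource *irq, struct rx_queue *rxq)
{
	void *cookie;

	TASK_INIT(&rxq->rx_task, 0, rxq_task, rxq);
	rxq->tq = taskqueue_create_fast("rxq", M_NOWAIT,
	    taskqueue_thread_enqueue, &rxq->tq);
	taskqueue_start_threads(&rxq->tq, 1, PI_NET, "%s rxq",
	    device_get_nameunit(dev));
	/* Binding that thread to the queue's CPU is omitted here. */

	return (bus_setup_intr(dev, irq, INTR_TYPE_NET | INTR_MPSAFE,
	    rxq_intr_filter, NULL /* no ithread */, rxq, &cookie));
}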
I specifically wonder how much work/benefit we may see by doing:

* batching packets into lists, so the various steps can batch-process things rather than run to completion;
* batching the processing of a list of frames under a single lock instance - e.g., if the forwarding code could do the forwarding lookup for 'n' packets under a single lock, then pass that list of frames up to inet_pfil_hook() to do that work under one lock, etc., etc.

Here, the processing would look less like "grab lock and process to completion" and more like "mark and sweep" - i.e., we have a list of frames that we mark as needing processing and mark as having been processed at each layer, so we know where to dispatch them next (rough sketch of that shape below).

I still have some tool coding to do with PMC before I even think about tinkering with this, as I'd like to measure things like per-packet latency as well as top-level processing overhead (i.e., CPU_CLK_UNHALTED.THREAD_P / lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)

Thanks,


-adrian
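P.S. For the "mark and sweep" bit, this is roughly the shape I have in mind. Everything here is made up for illustration (it isn't real kernel code, and the 64-frame batch with a live-bit bitmap is just an example); each layer takes its lock once per batch and clears the bit for any frame it consumes or drops, so the next layer only sweeps what's still marked:

#include <sys/types.h>

struct mbuf;

#define FRAME_BATCH_MAX	64

struct frame_batch {
	struct mbuf	*frames[FRAME_BATCH_MAX];
	uint64_t	 live;		/* bit i set: frames[i] still needs work */
	int		 count;
};

static void
batch_forward_lookup(struct frame_batch *b)
{
	int i;

	/* e.g. take the routing/flow-table lock once for all 'n' frames */
	for (i = 0; i < b->count; i++) {
		if ((b->live & (1ULL << i)) == 0)
			continue;
		/* ... forwarding lookup for b->frames[i] ... */
		/* if the frame was consumed or dropped here:
		 *	b->live &= ~(1ULL << i); */
	}
	/* drop the lock once */
}

static void
batch_pfil(struct frame_batch *b)
{
	int i;

	/* same idea for the pfil pass: enter the hooks/lock once per batch */
	for (i = 0; i < b->count; i++) {
		if ((b->live & (1ULL << i)) == 0)
			continue;
		/* ... run the hooks on b->frames[i], clear its bit if eaten ... */
	}
}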