From owner-freebsd-net@FreeBSD.ORG Tue Apr  2 06:16:42 2013
Date: Tue, 2 Apr 2013 14:16:34 +0800
Subject: Re: MPLS
From: Sepherosa Ziehau <sepherosa@gmail.com>
To: Andre Oppermann
Cc: Sami Halabi, "Alexander V. Chernikov", freebsd-net@freebsd.org
In-Reply-To: <51471974.3090300@freebsd.org>
References: <5146121B.5080608@FreeBSD.org> <514649A5.4090200@freebsd.org>
 <3659B942-7C37-431F-8945-C8A5BCD8DC67@ipfw.ru> <51471974.3090300@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On Mon, Mar 18, 2013 at 9:41 PM, Andre Oppermann wrote:
> On 18.03.2013 13:20, Alexander V. Chernikov wrote:
>>
>> On 17.03.2013, at 23:54, Andre Oppermann wrote:
>>
>>> On 17.03.2013 19:57, Alexander V. Chernikov wrote:
>>>>
>>>> On 17.03.2013 13:20, Sami Halabi wrote:
>>>>>>
>>>>>> ITOH OpenBSD has a complete implementation of MPLS out of the box,
>>>>>> maybe
>>>>
>>>> Their control plane code is mostly useless due to the design approach
>>>> (routing daemons talk via the kernel).
>>>
>>> What's your approach?
>>
>> It is actually not mine. We have discussed this a bit in the radix-related
>> thread. Generally quagga/bird (and other high-performance
>> hardware-accelerated and software routers) have a feature-rich RIB from
>> which the best routes (possibly multipath) are installed into the
>> kernel FIB. The kernel's main task should be to do efficient lookups,
>> while every other advanced feature should be implemented in userland.
>
> Yes, we have started discussing it but haven't reached a conclusion among
> the two philosophies. We have also agreed that the current radix code is
> horrible in terms of cache misses per lookup. That however doesn't
> preclude an agnostic FIB+RIB approach. It's mostly a matter of structure
> layout to keep it efficient.
>
>>>> Their data plane code, well.. Yes, we can use some defines from their
>>>> headers, but that's all :)
>>>>>>
>>>>>> porting it would be shorter and more straightforward than porting the
>>>>>> linux LDP implementation of BIRD.
>>>>
>>>> It is not a 'linux' implementation. LDP itself is cross-platform.
>>>> The most tricky place here is the control plane.
>>>> However, making _fast_ MPLS switching is tricky too, since it requires
>>>> changes in our netisr/ethernet handling code.
>>>
>>> Can you explain what changes you think are necessary and why?
>>
>> We definitely need the ability to dispatch a chain of mbufs - this was
>> already discussed in the intel rx ring lock thread in -net.
>
> Actually I'm not so convinced of that. Packet handling is a tradeoff
> between doing process-to-completion on each packet and doing context
> switches on batches of packets.
>
> Every few years the balance tilts back and forth between
> process-to-completion and batch processing. DragonFly went with a
> batch-lite token-passing approach throughout their kernel. It seems it
> didn't work out to the extent they expected. Now many parts are moving
> back to the more traditional locking approach.

At least the per-CPU netisr and the other related per-CPU network
structures (e.g. the routing table) work quite well, as we _expected_: the
measured bi-directional IPv4 forwarding performance is 5.6Mpps+ w/
fastforwarding and 4.6Mpps+ w/o fastforwarding, w/ 4 igb(4) on an i7-2600,
using 90% cpu time on each HT in Dfly's polling(4) mode.  Dfly is _not_
using the traditional locking approach on the major network paths at all,
and for IPv4 forwarding Dfly is _not_ doing "process-to-completion" either.

And as a side note: there was a paper comparing a message-based parallelism
TCP implementation, a connection-based thread serialization TCP
implementation (which Dfly is using) and a connection-based lock
serialization TCP implementation.  Its conclusion was that the
connection-based thread serialization implementation had too much
scheduling cost.  That conclusion _no longer_ holds for Dfly nowadays; we
have wiped out the major scheduling cost on the hot TCP paths.

So as far as I can see, it is sometimes _not_ a problem of the model
itself, but of how the model is implemented.

Best Regards,
sephe

--
Tomorrow Will Never Die
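[Editor's illustration] The RIB/FIB split argued for above reduces to a simple contract: userland keeps the feature-rich RIB and pushes only the best routes down, and the kernel FIB just answers longest-prefix-match lookups. A minimal userland sketch of that lookup contract in C follows; `struct fib_entry` and `fib_lookup` are invented illustrative names, not FreeBSD's actual API, and a real FIB would use a cache-friendly trie rather than a linear scan (the radix cache-miss complaint in the thread).

```c
#include <stdint.h>
#include <stddef.h>

/* One installed best route: prefix/plen -> opaque next-hop id. */
struct fib_entry {
	uint32_t prefix;	/* network address, host byte order */
	uint8_t  plen;		/* prefix length, 0..32 */
	uint32_t nexthop;	/* opaque next-hop id */
};

/* Mask out the host bits of an address for a given prefix length. */
static uint32_t
prefix_mask(uint32_t addr, uint8_t plen)
{
	return (plen == 0) ? 0 : addr & (0xffffffffu << (32 - plen));
}

/*
 * Linear longest-prefix-match over a small FIB.  Returns 0 and fills
 * *nexthop on a hit, -1 if no prefix covers dst.
 */
int
fib_lookup(const struct fib_entry *fib, size_t n, uint32_t dst,
    uint32_t *nexthop)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (prefix_mask(dst, fib[i].plen) == fib[i].prefix &&
		    (best < 0 || fib[i].plen > fib[best].plen))
			best = (int)i;
	}
	if (best < 0)
		return (-1);
	*nexthop = fib[best].nexthop;
	return (0);
}
```

With 10.0.0.0/8 and 10.1.0.0/16 installed, a lookup of 10.1.2.3 picks the /16 and a lookup of 10.2.3.4 falls back to the /8; the data structure can change underneath without touching this contract.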
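[Editor's illustration] The two ideas sephe describes, handing the stack a whole chain of mbufs per call and pinning each flow to one per-CPU netisr so no locks are needed on the hot path, can be sketched in a few lines of toy userland C. `struct pkt`, `flow_cpu` and `dispatch_chain` are invented stand-ins for the mbuf/netisr machinery, and the hash is a trivial XOR rather than the Toeplitz-style hashes real stacks use.

```c
#include <stdint.h>
#include <stddef.h>

#define NCPU 4

/* Toy packet: mbuf-style chain via the next pointer. */
struct pkt {
	uint32_t saddr, daddr;		/* IPv4 addresses */
	uint16_t sport, dport;		/* L4 ports */
	struct pkt *next;
};

/* Per-CPU input queues; each is only ever touched by "its" CPU. */
struct pkt *cpu_queue[NCPU];

/* Trivial flow hash: packets of one flow always map to one CPU. */
static unsigned
flow_cpu(const struct pkt *p)
{
	uint32_t h;

	h = (p->saddr ^ p->daddr) ^
	    ((uint32_t)p->sport << 16 | p->dport);
	h ^= h >> 16;
	return (h % NCPU);
}

/*
 * Dispatch a whole chain in one call: the rx path hands over a batch
 * instead of one packet per call, amortizing the per-packet queueing
 * cost, and each packet lands on its flow's CPU queue.
 */
void
dispatch_chain(struct pkt *chain)
{
	while (chain != NULL) {
		struct pkt *p = chain;
		unsigned cpu = flow_cpu(p);

		chain = p->next;
		p->next = cpu_queue[cpu];	/* prepend to that queue */
		cpu_queue[cpu] = p;
	}
}
```

Because the flow hash, not a lock, serializes access to each queue, two packets of the same flow always end up on the same CPU; that is the property the per-CPU forwarding numbers above depend on.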