From owner-freebsd-net@FreeBSD.ORG Tue Apr  2 06:16:42 2013
Date: Tue, 2 Apr 2013 14:16:34 +0800
Subject: Re: MPLS
From: Sepherosa Ziehau <sepherosa@gmail.com>
To: Andre Oppermann
Cc: Sami Halabi, "Alexander V. Chernikov", freebsd-net@freebsd.org
In-Reply-To: <51471974.3090300@freebsd.org>
References: <5146121B.5080608@FreeBSD.org> <514649A5.4090200@freebsd.org>
 <3659B942-7C37-431F-8945-C8A5BCD8DC67@ipfw.ru> <51471974.3090300@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On Mon, Mar 18, 2013 at 9:41 PM, Andre Oppermann wrote:
> On 18.03.2013 13:20, Alexander V. Chernikov wrote:
>>
>> On 17.03.2013, at 23:54, Andre Oppermann wrote:
>>
>>> On 17.03.2013 19:57, Alexander V. Chernikov wrote:
>>>>
>>>> On 17.03.2013 13:20, Sami Halabi wrote:
>>>>>>
>>>>>> ITOH OpenBSD has a complete implementation of MPLS out of the box,
>>>>>> maybe
>>>>
>>>> Their control plane code is mostly useless due to the design approach
>>>> (routing daemons talk via the kernel).
>>>
>>> What's your approach?
>>
>> It is actually not mine. We have discussed this a bit in the radix-related
>> thread. Generally quagga/bird (and other high-performance
>> hardware-accelerated and software routers) have a feature-rich RIB from
>> which the best routes (possibly multipath) are installed into the
>> kernel FIB. The kernel's main task should be to do efficient lookups,
>> while every other advanced feature should be implemented in userland.
>
> Yes, we have started discussing it but haven't reached a conclusion among
> the two philosophies. We have also agreed that the current radix code is
> horrible in terms of cache misses per lookup. That however doesn't
> preclude an agnostic FIB+RIB approach. It's mostly a matter of structure
> layout to keep it efficient.
>
>>>> Their data plane code, well.. Yes, we can use some defines from their
>>>> headers, but that's all :)
>>>>>>
>>>>>> porting it would be shorter and more straightforward than porting the
>>>>>> linux LDP implementation of BIRD.
>>>>
>>>> It is not a 'linux' implementation. LDP itself is cross-platform.
>>>> The most tricky place here is the control plane.
>>>> However, making _fast_ MPLS switching is tricky too, since it requires
>>>> changes in our netisr/ethernet handling code.
>>>
>>> Can you explain what changes you think are necessary and why?
>>
>> We definitely need the ability to dispatch a chain of mbufs - this was
>> already discussed in the intel rx ring lock thread in -net.
>
> Actually I'm not so convinced of that. Packet handling is a tradeoff
> between doing process-to-completion on each packet and doing context
> switches on batches of packets.
>
> Every few years the balance tilts back and forth between
> process-to-completion and batch processing. DragonFly went with a
> batch-lite token-passing approach throughout their kernel. It seems it
> didn't work out to the extent they expected. Now many parts are moving
> back to the more traditional locking approach.

At least the per-CPU netisr and the other related per-CPU network
structures (e.g. the routing table) work quite well, as we _expected_: the
measured bi-directional IPv4 forwarding performance is 5.6Mpps+ w/
fastforwarding and 4.6Mpps+ w/o fastforwarding, w/ 4 igb(4) on an i7-2600,
using 90% cpu time on each HT in Dfly's polling(4) mode.  Dfly is _not_
using the traditional locking approach on the major network paths at all,
and for IPv4 forwarding Dfly is _not_ doing "process-to-completion" either.

And as a side note: there was a paper comparing a message-based parallelism
TCP implementation, a connection-based thread serialization TCP
implementation (which Dfly is using) and a connection-based lock
serialization TCP implementation.  Its conclusion was that the
connection-based thread serialization implementation had too much
scheduling cost.  That conclusion _no longer_ holds for Dfly nowadays; we
have wiped out the major scheduling cost on the hot TCP paths.

So as far as I can see, it is sometimes _not_ a problem of the model
itself, but of how the model is implemented.

Best Regards,
sephe

--
Tomorrow Will Never Die
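[Editor's illustration] The RIB/FIB split argued for above reduces to a simple contract: userland keeps the feature-rich RIB and pushes only the best routes down, and the kernel FIB just answers longest-prefix-match lookups. A minimal userland sketch of that lookup contract in C follows; `struct fib_entry` and `fib_lookup` are invented illustrative names, not FreeBSD's actual API, and a real FIB would use a cache-friendly trie rather than a linear scan (the radix cache-miss complaint in the thread).

```c
#include <stdint.h>
#include <stddef.h>

/* One installed best route: prefix/plen -> opaque next-hop id. */
struct fib_entry {
	uint32_t prefix;	/* network address, host byte order */
	uint8_t  plen;		/* prefix length, 0..32 */
	uint32_t nexthop;	/* opaque next-hop id */
};

/* Mask out the host bits of an address for a given prefix length. */
static uint32_t
prefix_mask(uint32_t addr, uint8_t plen)
{
	return (plen == 0) ? 0 : addr & (0xffffffffu << (32 - plen));
}

/*
 * Linear longest-prefix-match over a small FIB.  Returns 0 and fills
 * *nexthop on a hit, -1 if no prefix covers dst.
 */
int
fib_lookup(const struct fib_entry *fib, size_t n, uint32_t dst,
    uint32_t *nexthop)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (prefix_mask(dst, fib[i].plen) == fib[i].prefix &&
		    (best < 0 || fib[i].plen > fib[best].plen))
			best = (int)i;
	}
	if (best < 0)
		return (-1);
	*nexthop = fib[best].nexthop;
	return (0);
}
```

With 10.0.0.0/8 and 10.1.0.0/16 installed, a lookup of 10.1.2.3 picks the /16 and a lookup of 10.2.3.4 falls back to the /8; the data structure can change underneath without touching this contract.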
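[Editor's illustration] The two ideas sephe describes, handing the stack a whole chain of mbufs per call and pinning each flow to one per-CPU netisr so no locks are needed on the hot path, can be sketched in a few lines of toy userland C. `struct pkt`, `flow_cpu` and `dispatch_chain` are invented stand-ins for the mbuf/netisr machinery, and the hash is a trivial XOR rather than the Toeplitz-style hashes real stacks use.

```c
#include <stdint.h>
#include <stddef.h>

#define NCPU 4

/* Toy packet: mbuf-style chain via the next pointer. */
struct pkt {
	uint32_t saddr, daddr;		/* IPv4 addresses */
	uint16_t sport, dport;		/* L4 ports */
	struct pkt *next;
};

/* Per-CPU input queues; each is only ever touched by "its" CPU. */
struct pkt *cpu_queue[NCPU];

/* Trivial flow hash: packets of one flow always map to one CPU. */
static unsigned
flow_cpu(const struct pkt *p)
{
	uint32_t h;

	h = (p->saddr ^ p->daddr) ^
	    ((uint32_t)p->sport << 16 | p->dport);
	h ^= h >> 16;
	return (h % NCPU);
}

/*
 * Dispatch a whole chain in one call: the rx path hands over a batch
 * instead of one packet per call, amortizing the per-packet queueing
 * cost, and each packet lands on its flow's CPU queue.
 */
void
dispatch_chain(struct pkt *chain)
{
	while (chain != NULL) {
		struct pkt *p = chain;
		unsigned cpu = flow_cpu(p);

		chain = p->next;
		p->next = cpu_queue[cpu];	/* prepend to that queue */
		cpu_queue[cpu] = p;
	}
}
```

Because the flow hash, not a lock, serializes access to each queue, two packets of the same flow always end up on the same CPU; that is the property the per-CPU forwarding numbers above depend on.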