From: Sami Halabi <sodynet1@gmail.com>
To: Alexander Motin
Cc: freebsd-net, David Hooton
Date: Mon, 14 Nov 2011 19:16:58 +0200
Subject: Re: MPD LAC Scaling

Hi,

I wonder why not put all of your suggestions into MPD itself instead of
making hacks; as I see it, that is necessary for performance. After all,
MPD is aimed at FreeBSD only, so it is elementary to use ke

Sami

2011/11/12 Alexander Motin

> Hi.
>
> > I'm currently evaluating MPD as a potential LAC solution for a
> > project I'm working on. I'm looking to try and handle at least 4Gbit
> > and 20,000 sessions worth of PPPoE -> L2TP LAC traffic per server.
> > The reading I've done from the archives so far seems to indicate
> > that this has not yet been done.
>
> I haven't heard of deployments that large either, but I also can't say
> it is theoretically impossible after some tuning and development work.
> At this point I have neither a production/test environment nor much
> time to actively work on this, but I want to share some experience and
> ideas in case somebody wants to take it on.
>
> First, as Julian said, it does not have to be a single server handling
> all of the load. A cluster of smaller machines is preferable in many
> respects. PPPoE allows you to run several servers and load-balance
> across them. At the moment MPD cannot balance load dynamically, but
> you can do it manually by limiting the number of sessions per server.
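For illustration, a rough mpd.conf-style sketch of such a manual per-server
cap using mpd5's link templates. The directive names, in particular
"set link max-children", are written from memory and should be checked
against the mpd5 manual; the interface name and the limit are placeholders.

    pppoe_lac:
            # Accept PPPoE connections on igb0 for any service name.
            create link template L pppoe
            set pppoe iface igb0
            set pppoe service "*"
            # Assumed per-template cap: once roughly 5000 links exist,
            # stop creating new ones, so additional clients end up being
            # answered by another server in the pool.
            set link max-children 5000
            set link enable incoming
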
> As a hardware reference point from personal experience: three years
> ago, mpd5 on 1U servers with a single Core2Duo CPU, 1GB of RAM and two
> 1Gb NICs (less than $1K at the time) handled in production about 2000
> PPPoE sessions and 600Mbps of traffic per server, including Netflow
> generation, per-customer typed traffic shaping and accounting. Modern,
> more powerful hardware can do more.
>
> Getting higher numbers mostly splits into two separate questions,
> getting more traffic and getting more sessions, because the limiting
> factors are different.
> - Getting more traffic mostly means scaling the kernel Netgraph and
> networking code to more CPU cores. Since Netgraph uses direct function
> calls whenever possible, this depends on the number of network
> interrupt threads in the system. Three years ago there was only one
> net SWI thread, and setting net.isr.direct=1 on a system with several
> NICs allowed the load to be spread across CPUs. Modern high-end NICs
> with several MSI-X interrupt vectors should give the same effect. It
> is now also possible to run several net SWI threads, but I haven't
> tested that (see the loader.conf sketch after this list).
> - Getting more sessions also means tuning and optimizing the
> user-level mpd daemon. Three years ago, on a Pentium4-class test
> machine, I reached about 5K PPPoE sessions with RADIUS auth/acct. The
> main limiting factor was user-level daemon performance. The more
> sessions were connected, the more overhead the daemon had: LCP echo
> requests and event timeouts to handle, netgraph kernel sockets to
> listen on, and so on. At some point the daemon simply cannot handle
> all new incoming events in time, and clients resending their requests
> causes a cumulative effect. So the real limiting factor is not just
> the number of users but the number of events. If users connect one by
> one, the number of sessions can be quite high; but if some accident
> drops all users and they reconnect at once, overload comes sooner. In
> that case even the LCP echo timeout configured on the server and
> clients, or how much logging is enabled, matters. My best tuning
> result at the time on that Pentium4-class machine was about 100
> connections per second, which allowed 5000 simultaneous sessions to be
> set up within 50 seconds; higher numbers were problematic. At the
> moment MPD's main user-level state machine is single-threaded, except
> for authorization and accounting (such as RADIUS), which run in
> separate threads but require synchronized completion to return their
> data. Splitting the main FSM across several threads is difficult,
> because it would require grouping links and bundles into different
> threads with different locks; that is hard both because of multilink
> support and because until a user is authorized it is impossible to say
> which bundle it should join. If you need to handle several PPPoE
> services with different names, or several LAN segments, it may
> theoretically be effective to run several MPD daemon instances, one
> per service/segment. Generally I've spent less time profiling and
> optimizing the MPD daemon itself than the kernel code, so there should
> still be a lot of room for improvement. Some possible optimization
> points I still remember are:
> - rework the pevent() engine used by the MPD state machine to use
> kqueue() instead of poll(), to reduce per-event overhead (see the C
> sketch after this list);
> - optimize the locking of the paction() functions used for thread
> creation and completion for MPD's specific case; the idea was that, at
> the cost of some generality, they could be simplified to reduce the
> number of context switches;
> - rewrite the RADIUS auth/acct support to run within the main mpd
> thread or a fixed number of worker threads; since the existing
> threaded approach was implemented, libradius has gained support for
> asynchronous operation, which should reduce the thread
> creation/destruction overhead;
> - optimize the ng_ksocket node for large numbers of hooks, using some
> smarter search, and/or make MPD create an additional socket for every
> so many links to balance kernel and user-level search overheads;
> initially MPD created a separate set of sockets for every link, but
> that proved too expensive for the user-level FSM and was rewritten
> into the present state, with an almost minimal number of sockets and
> most of the multiplexing done in the kernel.
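To make the net.isr knobs mentioned in the first item above concrete, here
is a minimal sketch of the relevant FreeBSD tunables; exact names vary
between releases (net.isr.direct on 7.x/8.x was later superseded by
net.isr.dispatch), so verify them on the target version:

    # /boot/loader.conf: allow more than one netisr (net SWI) thread
    net.isr.maxthreads=4        # up to one thread per CPU core
    net.isr.bindthreads=1       # pin each netisr thread to its own CPU

    # Older releases (7.x/8.x): process packets in the NIC interrupt
    # context instead of queueing everything to a single netisr thread
    sysctl net.isr.direct=1

    # Show how netisr work is currently dispatched and queued
    netstat -Q
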
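As an illustration of the first optimization point, here is the kind of
kqueue(2) loop that could stand in for poll(): the kernel keeps the
registrations, so each wakeup costs roughly the number of ready
descriptors rather than the number of watched ones. This is a
self-contained sketch, not MPD's actual pevent() code; it watches stdin
purely as an example.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct kevent ev, out[64];
	int kq, i, n;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");

	/* Register interest once; with poll() the whole descriptor array
	 * would be passed in and scanned on every call. */
	EV_SET(&ev, STDIN_FILENO, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");

	for (;;) {
		/* Block until something is ready; only ready events
		 * are returned. */
		if ((n = kevent(kq, NULL, 0, out, 64, NULL)) == -1)
			err(1, "kevent wait");
		for (i = 0; i < n; i++)
			printf("fd %d is readable\n", (int)out[i].ident);
	}
	/* NOTREACHED */
}

Timers fit into the same loop via EVFILT_TIMER, which is where the
event-timeout overhead mentioned above could also be reduced.
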
> I have no personal production experience with the PPPoE-to-L2TP LAC
> case. It is used much less often, and I have had only a few reports
> from people actively using it, without many numbers. I think the LAC
> case should have lower overhead and CPU load, and therefore better
> scalability, than usual traffic termination: there is no IPCP layer in
> PPP to negotiate, no interfaces to create and configure, no Netflow,
> no shaping, no periodic accounting, and so on. If you don't need to
> authenticate users but only to forward connections, so that the server
> doesn't need to handle the LCP protocol at all, the task becomes even
> simpler.
>
> If you can set up a test environment to stress-test the LAC side, it
> would be interesting to see the numbers. In my test lab I used several
> machines, each running mpd configured for thousands of PPPoE client
> sessions, to generate simultaneous connections. For testing a LAC you
> also need a fast enough L2TP terminator. If you have no such hardware
> available, you can try using several systems running mpd L2TP servers
> and spread the load between them in some way to avoid a bottleneck
> there, although the system load in that case may differ slightly.
>
> --
> Alexander Motin

-- 
Sami Halabi
Information Systems Engineer
NMS Projects Expert