Date:      Mon, 14 Nov 2011 19:24:31 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Sami Halabi <sodynet1@gmail.com>
Cc:        freebsd-net <freebsd-net@freebsd.org>, David Hooton <david.hooton@platformnetworks.net>
Subject:   Re: MPD LAC Scaling
Message-ID:  <4EC14ECF.6050502@FreeBSD.org>
In-Reply-To: <CAEW%2BogYfnjiQsA1rqJBRfmysnyfn-CLj3WsG68GWJ7et_wjigw@mail.gmail.com>
References:  <4EBDA4B2.6030602@FreeBSD.org> <CAEW%2BogYfnjiQsA1rqJBRfmysnyfn-CLj3WsG68GWJ7et_wjigw@mail.gmail.com>

On 11/14/11 19:16, Sami Halabi wrote:
> I wonder why not put all your suggestions into MPD instead of making
> hacks; as I see it, it's necessary for performance.
> After all, MPD is aimed at FreeBSD only, so it's elementary to use ke

I don't understand your question. What hacks are you talking about? Do
you have some patches? Suggestions are still suggestions because they
require some work. If somebody does it well, it should be added to MPD.

> 2011/11/12 Alexander Motin <mav@freebsd.org>
>     > I'm currently evaluating MPD as a potential LAC solution for a
>     > project I'm working on.  I'm looking to try and handle at least 4Gbit
>     > and 20,000 sessions worth of PPPoE -> L2TP LAC traffic per server.
>     > The reading I've done from the archives so far seems to indicate that
>     > this has not yet been done.
> 
>     I also haven't heard of deployments that large, but I can't say it
>     is theoretically impossible after some tuning and development work.
>     At this point I have neither a production/test environment nor much
>     time to actively work on it, but I want to share some experience
>     and ideas in case somebody wants to take this on.
> 
>     First, as Julian said, it does not have to be a single server
>     handling all the load. A cluster of smaller machines is preferable
>     in many respects. PPPoE allows you to have several servers and
>     load-balance between them. At the moment MPD can't balance load
>     dynamically, but you can do it manually by limiting the number of
>     sessions per server.
> 
>     As a hardware reference point from personal experience: three years
>     ago mpd5 on 1U servers with a single Core2 Duo CPU, 1GB of RAM and
>     two 1Gb NICs (less than $1K at the time) handled in production
>     about 2000 PPPoE sessions and 600Mbps of traffic per server,
>     including Netflow generation, per-customer traffic shaping by type,
>     and accounting. Modern, more powerful hardware is able to do more.
> 
>     Getting higher numbers can mostly be split into two questions:
>     getting more traffic and getting more sessions, as the limitations
>     are different.
>      - Getting more traffic mostly means scaling the kernel Netgraph
>     and networking code to more CPU cores. Since Netgraph uses direct
>     function calls whenever possible, this depends on the number of
>     network interrupt threads in the system. Three years ago there was
>     only one net SWI thread, and setting net.isr.direct=1 while having
>     several NICs in the system allowed the load to be distributed
>     between CPUs (see the sysctl sketch after this list). Modern
>     high-end NICs with several MSI-X interrupt vectors should give the
>     same effect. Now it is also possible to have several net SWI
>     threads, but I haven't tested that.
>      - Getting more sessions also means tuning and optimizing the
>     user-level mpd daemon. Three years ago, on a Pentium 4-level test
>     machine, I reached about 5K PPPoE sessions with RADIUS auth/acct.
>     The main limiting factor was user-level daemon performance. The
>     more sessions were connected, the more overhead the daemon had from
>     LCP echo requests and event timeouts to handle, netgraph kernel
>     sockets to listen on, etc. At some point the daemon is simply
>     unable to handle all new incoming events in time, and
>     retransmissions from clients cause a cumulative effect. So the main
>     limiting factor is not just the number of users, but also the
>     number of events. If users connect one by one, the number of
>     sessions can be quite high; but if due to some accident all users
>     drop and reconnect at once, overload may come sooner. In that case
>     even the LCP echo timeout configured on the server and clients, or
>     how much logging is enabled, matters. My best tuning result at the
>     time on a Pentium 4-level machine was about 100 connections per
>     second, which allowed 5000 simultaneous sessions to be set up
>     within 50 seconds. Higher numbers were problematic. At the moment
>     MPD's main user-level state machine is single-threaded, except for
>     authorization and accounting (like RADIUS), which are done in
>     separate threads but require synchronized completion to return
>     their data. Splitting the main FSM across several threads is
>     difficult, because it would require somehow grouping links and
>     bundles into different threads with different locks; that is hard
>     because of multilink support, and because until a user is
>     authorized it is impossible to say which bundle it should join. If
>     you need to handle several PPPoE services with different names, or
>     several LAN segments, it may theoretically be effective to run
>     several MPD daemon instances, one per service/segment. Generally
>     I've spent less time profiling and optimizing the MPD daemon itself
>     than the kernel code, so there should still be a lot of room for
>     improvement. Some possible optimization points I still remember
>     are:
>      - rework the pevent() engine used by the MPD state machine to use
>     kqueue() instead of poll() to reduce per-event overhead (a kqueue
>     sketch follows this list);
>      - optimize the locking in the paction() functions used for thread
>     creation and completion for the MPD-specific case; the idea was
>     that, at the cost of some functionality, they could be simplified
>     to reduce the number of context switches;
>      - rewrite the RADIUS auth/acct support to run within the main mpd
>     thread or a fixed number of worker threads; since the existing
>     threaded approach was implemented, libradius has gained support for
>     asynchronous operation, which should reduce the overhead of thread
>     creation/destruction;
>      - optimize the ng_ksocket node for work with a large number of
>     hooks, using some faster lookup, and/or make MPD create an
>     additional socket for every so many links to balance kernel and
>     user-level search overheads; initially MPD created a separate set
>     of sockets for every link, but that was found to be too expensive
>     for the user-level FSM and was rewritten into the present state
>     with an almost minimal number of sockets and most of the
>     multiplexing done in the kernel.
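
Regarding the net.isr.direct tuning mentioned in the first item above: as
a minimal, untested sketch, the sysctl can be inspected and flipped from C
with sysctlbyname(3). The name net.isr.direct is from the FreeBSD 7.x/8.x
era discussed here; newer releases replaced it with net.isr.dispatch, so
treat the exact name as an assumption for your system.

  /*
   * Sketch only: read net.isr.direct and enable it if it is off.
   * Changing the value requires root; the sysctl name is assumed to
   * exist (FreeBSD 7.x/8.x era, later replaced by net.isr.dispatch).
   */
  #include <sys/types.h>
  #include <sys/sysctl.h>
  #include <stdio.h>

  int
  main(void)
  {
      int direct, one = 1;
      size_t len = sizeof(direct);

      if (sysctlbyname("net.isr.direct", &direct, &len, NULL, 0) == -1) {
          perror("sysctlbyname(net.isr.direct)");
          return (1);
      }
      printf("net.isr.direct = %d\n", direct);

      if (direct == 0 &&
          sysctlbyname("net.isr.direct", NULL, NULL, &one,
          sizeof(one)) == -1)
          perror("enabling net.isr.direct (needs root)");
      return (0);
  }

In practice this is normally just a line in /etc/sysctl.conf or a
sysctl(8) command; the C form is only to show the knob programmatically.
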
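
Regarding the pevent()/kqueue item in the list above: a minimal sketch of
what a kqueue(2)-based read-event loop looks like, purely to illustrate
the idea. This is not mpd's actual pevent API; fds[] is a placeholder for
whatever descriptors the daemon would watch (netgraph sockets, timers,
etc.).

  #include <sys/types.h>
  #include <sys/event.h>
  #include <sys/time.h>
  #include <err.h>
  #include <unistd.h>

  /* Wait forever for read events on a fixed set of descriptors. */
  static void
  event_loop(const int *fds, int nfds)
  {
      struct kevent kev, ev[64];
      int kq, i, n;

      if ((kq = kqueue()) == -1)
          err(1, "kqueue");

      /* Register each descriptor once: kqueue keeps the interest set
       * in the kernel, while poll(2) rescans the whole fd array on
       * every call, which is where the per-event overhead comes from. */
      for (i = 0; i < nfds; i++) {
          EV_SET(&kev, fds[i], EVFILT_READ, EV_ADD, 0, 0, NULL);
          if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
              err(1, "kevent register");
      }

      for (;;) {
          n = kevent(kq, NULL, 0, ev, 64, NULL);
          if (n == -1)
              err(1, "kevent wait");
          for (i = 0; i < n; i++) {
              /* ev[i].ident is the ready descriptor; this is where the
               * FSM would dispatch to the matching link/bundle handler. */
              (void)ev[i].ident;
          }
      }
  }

  int
  main(void)
  {
      int fds[1] = { STDIN_FILENO };

      event_loop(fds, 1);
      return (0);
  }
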
> 
>     I have no personal production experience with the PPPoE-to-L2TP LAC
>     case. It is used much less often, and I have had only a few reports
>     from people actively using it, with not many numbers. I think the
>     LAC case should have lower overhead and CPU load, and so better
>     scalability, than usual traffic termination: there is no IPCP layer
>     in PPP to negotiate, there are no interfaces to create and
>     configure, no Netflow, no shapers, no periodic accounting, etc. If
>     you don't need to authenticate users, but only to forward
>     connections, so that the server doesn't need to handle the LCP
>     protocol, the task simplifies even further.
> 
>     If you can set up a test environment to stress-test the LAC side,
>     it would be interesting to see the numbers. In my test lab I used
>     several machines, each with mpd configured for thousands of PPPoE
>     client sessions, to generate simultaneous connections. For testing
>     LAC you also need a fast enough L2TP terminator. If you have no
>     such hardware for the test, you may try using several systems
>     running mpd L2TP servers and spread the load between them in some
>     way to avoid a bottleneck there, though the system load in that
>     case may differ slightly from the real one.

--
Alexander Motin


