Date: Mon, 14 Nov 2011 19:16:58 +0200
From: Sami Halabi <sodynet1@gmail.com>
To: Alexander Motin <mav@freebsd.org>
Cc: freebsd-net <freebsd-net@freebsd.org>, David Hooton <david.hooton@platformnetworks.net>
Subject: Re: MPD LAC Scaling
Message-ID: <CAEW%2BogYfnjiQsA1rqJBRfmysnyfn-CLj3WsG68GWJ7et_wjigw@mail.gmail.com>
In-Reply-To: <4EBDA4B2.6030602@FreeBSD.org>
References: <4EBDA4B2.6030602@FreeBSD.org>
Hi,

I wonder why you don't put all your suggestions into MPD itself instead of making hacks. As I see it, they are necessary for performance. After all, MPD is aimed at FreeBSD only, so it's elementary to use ke

Sami

2011/11/12 Alexander Motin <mav@freebsd.org>

> Hi.
>
> > I'm currently evaluating MPD as a potential LAC solution for a project I'm working on. I'm looking to try and handle at least 4Gbit and 20,000 sessions worth of PPPoE -> L2TP LAC traffic per server. The reading I've done from the archives so far seems to indicate that this has not yet been done.
>
> I also haven't heard of such big cases, but I can't say it is theoretically impossible after some tuning and development work. At this point I have neither a production/test environment nor much time to actively work on it, but I want to share some experience and ideas in case somebody wants to take that on.
>
> First, as Julian said, it does not necessarily have to be one server handling all the load. A cluster of smaller machines should be preferable from many points of view. PPPoE allows you to have several servers and load-balance them. At this moment MPD can't balance load dynamically, but you can do it manually by limiting the number of sessions per server.
>
> As an example data point from personal experience: three years ago mpd5 on 1U servers with a single Core2Duo CPU, 1GB of RAM and two 1Gb NICs (less than $1K at that time) handled in production about 2000 PPPoE sessions and 600Mbps of traffic per server, including Netflow generation, per-customer typed traffic shaping and accounting. Modern, more powerful hardware is able to do more.
>
> Getting higher numbers can mostly be split into two questions, getting more traffic and getting more sessions, as the limitations are different.
> - Getting more traffic mostly means scaling the kernel Netgraph and networking code to more CPU cores.
>   Since Netgraph uses direct function calls when possible, how it scales across CPUs depends on the number of network interrupt threads in the system. Three years ago there was only one net SWI thread, and setting net.isr.direct=1 while having several NICs in the system allowed the load to be distributed between CPUs. Modern high-end NICs with several MSI-X interrupts should give the same effect. It is now also possible to have several net SWI threads, but I haven't tested it.
> - Getting more sessions also means tuning and optimizing the user-level mpd daemon. Three years ago, on a Pentium4-level test machine, I reached about 5K PPPoE sessions with RADIUS auth/acct. The main limiting factor was user-level daemon performance. The more sessions connected, the more overhead the daemon had from LCP echo requests and event timeouts to handle, the number of netgraph kernel sockets to listen on, etc. At some point the daemon is simply unable to handle all new incoming events in time, and clients resending requests causes a cumulative effect. So the main limiting factor is not just the number of users, but also the number of events. If users connect one by one, the number of sessions can be quite high. But if due to some accident all users are dropped and reconnect at once, that may cause overload sooner. In that case even the LCP echo timeout set on the server and clients, or how much logging is enabled, matters. My best tuning result at that time on a Pentium4-level machine was about 100 connections per second, which allowed 5000 simultaneous sessions to be set up within 50 seconds. Higher numbers were problematic. At this moment user-level MPD's main state machine is single-threaded, except for authorization and accounting (like RADIUS), which are done in separate threads but require synchronized completion to return the data.
>   Splitting the main FSM across several threads is difficult, because it would require somehow grouping links and bundles into different threads with different locks. That is hard because of multilink support, and because until a user is authorized it is impossible to say which bundle it should join. If there is a need to handle several PPPoE services with different names, or several LAN segments, it may theoretically be effective to run several MPD daemon instances, one for each service/segment. Generally I've spent less time profiling and optimizing the MPD daemon itself than the kernel code, so there should still be a lot of room for improvement. Some possible optimization points I still remember are:
>   - rework the pevent() engine used by the MPD state machine to use kqueue() instead of poll() to reduce event overhead;
>   - optimize locking of the paction() functions used for thread creation and completion for the MPD-specific case; the idea was that, at the cost of some functionality, it could be simplified to reduce the number of context switches;
>   - rewrite RADIUS auth/acct support to run within the main mpd thread or a fixed number of external threads; since the existing threaded approach was implemented, libradius has gained support for asynchronous operation, which should reduce the overhead of thread creation/destruction;
>   - optimize the ng_ksocket node when working with a large number of hooks, using some optimized search, and/or make MPD create additional sockets for each next group of links to balance kernel and user-level search overheads; initially MPD created a separate set of sockets for every link, but that was found too expensive for the user-level FSM and was rewritten into the present state, with an almost minimal number of sockets and most multiplexing done in the kernel.
>
> I have no personal production experience with the PPPoE-L2TP LAC case. It is used much less often, and I have had only a few reports from people actively using it, without many numbers.
> I think the LAC case should have smaller overhead and CPU load, and so better scalability, than usual traffic termination: there is no IPCP layer in PPP to negotiate, no interfaces to create and configure, no Netflow, no shaping, no periodic accounting, etc. If you don't need to authenticate users but only to forward connections, so that the server doesn't need to handle the LCP protocol, the task simplifies even more.
>
> If you can set up a test environment to stress-test the LAC side, it would be interesting to see the numbers. In my test lab I used several machines with mpd configured for thousands of PPPoE client sessions each to generate simultaneous connections. For testing LAC you should also have a fast enough L2TP terminator. If you have no such hardware for testing, you may try using several systems with mpd L2TP servers, spreading the load between them in some way to avoid a bottleneck there, though system load in that case may differ slightly.
>
> --
> Alexander Motin
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

--
Sami Halabi
Information Systems Engineer
NMS Projects Expert
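A rough sizing sketch for David's target (20,000 sessions and 4 Gbit/s), using the per-server figures Alexander quoted for 2008-era 1U Core2Duo boxes (about 2000 sessions and 600 Mbps each, with Netflow, shaping and accounting enabled). Those figures are for full termination, not LAC, and modern hardware should do more, so treat the result as a pessimistic bound. The function names are illustrative only; the per-server session cap corresponds to the manual "limit sessions per server" balancing he describes, but how that limit is configured in mpd is not shown here.

```python
import math

# Per-server figures quoted in the thread (2008-era 1U Core2Duo, mpd5,
# full termination with Netflow, shaping and accounting).
SESSIONS_PER_SERVER = 2000
MBPS_PER_SERVER = 600


def servers_needed(total_sessions: int, total_mbps: int,
                   sessions_per_server: int = SESSIONS_PER_SERVER,
                   mbps_per_server: int = MBPS_PER_SERVER) -> int:
    """Servers required so that neither the session nor the traffic limit is exceeded."""
    by_sessions = math.ceil(total_sessions / sessions_per_server)
    by_traffic = math.ceil(total_mbps / mbps_per_server)
    return max(by_sessions, by_traffic)


def per_server_session_cap(total_sessions: int, servers: int) -> int:
    """Session limit to set on each server when spreading load evenly by hand."""
    return math.ceil(total_sessions / servers)
```

For the target above, `servers_needed(20000, 4000)` gives 10: traffic alone would need only 7 such boxes, so the session count is the binding limit, which matches Alexander's point that sessions and traffic scale differently.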
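The "cumulative effect" Alexander describes during a mass reconnect can be illustrated with a toy model. This is an assumption-laden sketch, not a measurement of mpd: the daemon gets a fixed per-second work budget expressed in connection setups (100/s, the Pentium4-level figure from the thread), every client still waiting costs `retry_cost` setups' worth of work per second for its retransmissions, and every established session costs `echo_cost` per second for LCP echoes and timers. The function and parameter names and the cost values are hypothetical.

```python
from typing import Optional


def reconnect_time(clients: int, setups_per_sec: int,
                   retry_cost: float = 0.0, echo_cost: float = 0.0,
                   max_seconds: int = 3600) -> Optional[int]:
    """Toy model of a mass reconnect against a single-threaded daemon.

    Each simulated second the daemon has a budget of `setups_per_sec`
    connection setups. Work spent on retransmissions from clients still
    waiting (`retry_cost` per client) and on keepalives for sessions that
    are already up (`echo_cost` per session) is subtracted first; whatever
    is left completes new setups.

    Returns the number of seconds until every client is connected, or None
    if that does not happen within `max_seconds`.
    """
    pending, connected = clients, 0
    for second in range(1, max_seconds + 1):
        overhead = pending * retry_cost + connected * echo_cost
        budget = setups_per_sec - overhead
        done = min(pending, max(0, int(budget)))  # count whole setups only
        pending -= done
        connected += done
        if pending == 0:
            return second
    return None
```

With no overhead, `reconnect_time(5000, 100)` returns 50, matching the quoted "5000 sessions within 50 seconds". With `retry_cost=0.05` the retransmissions from 5000 waiting clients already exceed the whole budget, so nobody gets through; with `echo_cost=0.05` the keepalive load of the sessions already up stalls progress just under 2000 sessions. In this model the cliff depends on the event load, not only on the session count.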
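The ng_ksocket point concerns lookup cost once one socket multiplexes thousands of links. The toy comparison below assumes that finding a link means either scanning every hook by name or probing a hash map, and counts the work needed to find one link among 20,000. It is written in Python for brevity and does not use or reproduce the netgraph API; the class names and `linkN` hook names are hypothetical.

```python
class LinearHookTable:
    """Finds a hook by comparing against every entry in turn."""

    def __init__(self):
        self.hooks = []        # (hook_name, link_id) pairs
        self.comparisons = 0   # total name comparisons performed

    def add(self, name, link_id):
        self.hooks.append((name, link_id))

    def find(self, name):
        for hook_name, link_id in self.hooks:
            self.comparisons += 1
            if hook_name == name:
                return link_id
        return None


class HashedHookTable:
    """Same interface backed by a dict: one expected hash probe per lookup."""

    def __init__(self):
        self.hooks = {}        # hook_name -> link_id
        self.lookups = 0       # number of lookups (one probe each, ignoring collisions)

    def add(self, name, link_id):
        self.hooks[name] = link_id

    def find(self, name):
        self.lookups += 1
        return self.hooks.get(name)
```

Finding `link19999` among 20,000 hooks costs 20,000 comparisons in the linear table and a single probe in the hashed one. The same arithmetic explains the other option Alexander mentions: spreading links over k sockets cuts a linear scan to roughly n/k per socket, at the cost of more sockets for the user-level FSM to watch.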
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAEW%2BogYfnjiQsA1rqJBRfmysnyfn-CLj3WsG68GWJ7et_wjigw>