From: Sami Halabi <sodynet1@gmail.com>
To: Alexander Motin
Cc: freebsd-net, David Hooton
Date: Mon, 14 Nov 2011 19:16:58 +0200
Subject: Re: MPD LAC Scaling

Hi,

I wonder why not put all of your suggestions into MPD itself instead of
making hacks; as I see it, that is necessary for performance. After all,
MPD is aimed at FreeBSD only, so it is elementary to use ke

Sami

2011/11/12 Alexander Motin

> Hi.
>
> > I'm currently evaluating MPD as a potential LAC solution for a
> > project I'm working on. I'm looking to try and handle at least 4Gbit
> > and 20,000 sessions worth of PPPoE -> L2TP LAC traffic per server.
> > The reading I've done from the archives so far seems to indicate
> > that this has not yet been done.
>
> I haven't heard of deployments that large either, but I also can't say
> it is theoretically impossible after some tuning and development work.
> At this point I have neither a production/test environment nor much
> time to actively work on this, but I want to share some experience and
> ideas in case somebody wants to take it on.
>
> First, as Julian said, it does not have to be a single server handling
> all of the load. A cluster of smaller machines is preferable in many
> respects. PPPoE allows you to run several servers and load-balance
> across them. At the moment MPD cannot balance load dynamically, but
> you can do it manually by limiting the number of sessions per server.
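For illustration, a rough mpd.conf-style sketch of such a manual per-server
cap using mpd5's link templates. The directive names, in particular
"set link max-children", are written from memory and should be checked
against the mpd5 manual; the interface name and the limit are placeholders.

    pppoe_lac:
            # Accept PPPoE connections on igb0 for any service name.
            create link template L pppoe
            set pppoe iface igb0
            set pppoe service "*"
            # Assumed per-template cap: once roughly 5000 links exist,
            # stop creating new ones, so additional clients end up being
            # answered by another server in the pool.
            set link max-children 5000
            set link enable incoming
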
> As a hardware reference point from personal experience: three years
> ago, mpd5 on 1U servers with a single Core2Duo CPU, 1GB of RAM and two
> 1Gb NICs (less than $1K at the time) handled in production about 2000
> PPPoE sessions and 600Mbps of traffic per server, including Netflow
> generation, per-customer typed traffic shaping and accounting. Modern,
> more powerful hardware can do more.
>
> Getting higher numbers mostly splits into two separate questions,
> getting more traffic and getting more sessions, because the limiting
> factors are different.
> - Getting more traffic mostly means scaling the kernel Netgraph and
> networking code to more CPU cores. Since Netgraph uses direct function
> calls whenever possible, this depends on the number of network
> interrupt threads in the system. Three years ago there was only one
> net SWI thread, and setting net.isr.direct=1 on a system with several
> NICs allowed the load to be spread across CPUs. Modern high-end NICs
> with several MSI-X interrupt vectors should give the same effect. It
> is now also possible to run several net SWI threads, but I haven't
> tested that (see the loader.conf sketch after this list).
> - Getting more sessions also means tuning and optimizing the
> user-level mpd daemon. Three years ago, on a Pentium4-class test
> machine, I reached about 5K PPPoE sessions with RADIUS auth/acct. The
> main limiting factor was user-level daemon performance. The more
> sessions were connected, the more overhead the daemon had: LCP echo
> requests and event timeouts to handle, netgraph kernel sockets to
> listen on, and so on. At some point the daemon simply cannot handle
> all new incoming events in time, and clients resending their requests
> causes a cumulative effect. So the real limiting factor is not just
> the number of users but the number of events. If users connect one by
> one, the number of sessions can be quite high; but if some accident
> drops all users and they reconnect at once, overload comes sooner. In
> that case even the LCP echo timeout configured on the server and
> clients, or how much logging is enabled, matters. My best tuning
> result at the time on that Pentium4-class machine was about 100
> connections per second, which allowed 5000 simultaneous sessions to be
> set up within 50 seconds; higher numbers were problematic. At the
> moment MPD's main user-level state machine is single-threaded, except
> for authorization and accounting (such as RADIUS), which run in
> separate threads but require synchronized completion to return their
> data. Splitting the main FSM across several threads is difficult,
> because it would require grouping links and bundles into different
> threads with different locks; that is hard both because of multilink
> support and because until a user is authorized it is impossible to say
> which bundle it should join. If you need to handle several PPPoE
> services with different names, or several LAN segments, it may
> theoretically be effective to run several MPD daemon instances, one
> per service/segment. Generally I've spent less time profiling and
> optimizing the MPD daemon itself than the kernel code, so there should
> still be a lot of room for improvement. Some possible optimization
> points I still remember are:
> - rework the pevent() engine used by the MPD state machine to use
> kqueue() instead of poll(), to reduce per-event overhead (see the C
> sketch after this list);
> - optimize the locking of the paction() functions used for thread
> creation and completion for MPD's specific case; the idea was that, at
> the cost of some generality, they could be simplified to reduce the
> number of context switches;
> - rewrite the RADIUS auth/acct support to run within the main mpd
> thread or a fixed number of worker threads; since the existing
> threaded approach was implemented, libradius has gained support for
> asynchronous operation, which should reduce the thread
> creation/destruction overhead;
> - optimize the ng_ksocket node for large numbers of hooks, using some
> smarter search, and/or make MPD create an additional socket for every
> so many links to balance kernel and user-level search overheads;
> initially MPD created a separate set of sockets for every link, but
> that proved too expensive for the user-level FSM and was rewritten
> into the present state, with an almost minimal number of sockets and
> most of the multiplexing done in the kernel.
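To make the net.isr knobs mentioned in the first item above concrete, here
is a minimal sketch of the relevant FreeBSD tunables; exact names vary
between releases (net.isr.direct on 7.x/8.x was later superseded by
net.isr.dispatch), so verify them on the target version:

    # /boot/loader.conf: allow more than one netisr (net SWI) thread
    net.isr.maxthreads=4        # up to one thread per CPU core
    net.isr.bindthreads=1       # pin each netisr thread to its own CPU

    # Older releases (7.x/8.x): process packets in the NIC interrupt
    # context instead of queueing everything to a single netisr thread
    sysctl net.isr.direct=1

    # Show how netisr work is currently dispatched and queued
    netstat -Q
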
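As an illustration of the first optimization point, here is the kind of
kqueue(2) loop that could stand in for poll(): the kernel keeps the
registrations, so each wakeup costs roughly the number of ready
descriptors rather than the number of watched ones. This is a
self-contained sketch, not MPD's actual pevent() code; it watches stdin
purely as an example.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct kevent ev, out[64];
	int kq, i, n;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");

	/* Register interest once; with poll() the whole descriptor array
	 * would be passed in and scanned on every call. */
	EV_SET(&ev, STDIN_FILENO, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");

	for (;;) {
		/* Block until something is ready; only ready events
		 * are returned. */
		if ((n = kevent(kq, NULL, 0, out, 64, NULL)) == -1)
			err(1, "kevent wait");
		for (i = 0; i < n; i++)
			printf("fd %d is readable\n", (int)out[i].ident);
	}
	/* NOTREACHED */
}

Timers fit into the same loop via EVFILT_TIMER, which is where the
event-timeout overhead mentioned above could also be reduced.
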
> I have no personal production experience with the PPPoE-to-L2TP LAC
> case. It is used much less often, and I have had only a few reports
> from people actively using it, without many numbers. I think the LAC
> case should have lower overhead and CPU load, and therefore better
> scalability, than usual traffic termination: there is no IPCP layer in
> PPP to negotiate, no interfaces to create and configure, no Netflow,
> no shaping, no periodic accounting, and so on. If you don't need to
> authenticate users but only to forward connections, so that the server
> doesn't need to handle the LCP protocol at all, the task becomes even
> simpler.
>
> If you can set up a test environment to stress-test the LAC side, it
> would be interesting to see the numbers. In my test lab I used several
> machines, each running mpd configured for thousands of PPPoE client
> sessions, to generate simultaneous connections. For testing a LAC you
> also need a fast enough L2TP terminator. If you have no such hardware
> available, you can try using several systems running mpd L2TP servers
> and spread the load between them in some way to avoid a bottleneck
> there, although the system load in that case may differ slightly.
>
> --
> Alexander Motin

-- 
Sami Halabi
Information Systems Engineer
NMS Projects Expert