From: Guy Helmer <ghelmer@palisadesys.com>
To: Ruslan Ermilov
Cc: freebsd-net@FreeBSD.org
Subject: Re: Netgraph performance question
Date: Fri, 04 Feb 2005 15:43:35 -0600
Message-ID: <4203EC87.3070504@palisadesys.com>
In-Reply-To: <20050204204804.GC71363@ip.net.ua>
List-Id: Networking and TCP/IP with FreeBSD

Ruslan Ermilov wrote:

> Hi Guy,
>
> On Fri, Feb 04, 2005 at 11:03:31AM -0600, Guy Helmer wrote:
>
>> A while back, Maxim Konovalov made a commit to usr.sbin/ngctl/main.c
>> to increase its socket receive buffer size to help 'ngctl list' deal
>> with a large number of nodes, and Ruslan Ermilov responded that
>> setting the sysctls net.graph.recvspace=200000 and
>> net.graph.maxdgram=200000 was a good idea on a system with a large
>> number of nodes.
>>
>> I'm getting what I consider to be sub-par performance under FreeBSD
>> 5.3 from a userland program that uses an ng_socket node connected
>> into an ng_tee to play with packets traversing an ng_bridge, and I
>> finally have an opportunity to look into this. I say "sub-par"
>> because when we tested this configuration using three 2.8GHz Xeon
>> machines with Gigabit Ethernet interfaces at 1000Mbps full-duplex,
>> we measured a peak of about 12MB/sec for a single TCP stream through
>> the bridging machine, per NetPIPE and netperf.
>
> The bottleneck must be in ng_tee(4) -- the latter uses m_dup(9) when
> a duplicate is needed, which is very expensive as it has to create a
> writable copy of the entire mbuf chain (the original chain is DMA'ed
> into host memory by the network card).

I'm sorry, I mis-wrote. My ng_tee is actually modified to pass packets
to the r2l/l2r hooks only if they are connected; otherwise, packets go
directly to the left/right hooks (so it's an optional divert). As a
result, there is no m_dup() left in my modified ng_tee.
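For the curious, the change amounts to something like the following in
the node's receive-data method. This is a rough sketch rather than my
actual patch; the hookinfo layout follows the stock ng_tee(4) source
from memory, and the stats bookkeeping and some error handling are
trimmed:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <netgraph/ng_message.h>
#include <netgraph/netgraph.h>

/*
 * Per-hook state, simplified from the stock ng_tee(4) node (field
 * names may differ from the real source): "dest" is the pass-through
 * peer (left <-> right) and "dup" is the divert peer (left2right or
 * right2left).
 */
struct hookinfo {
	hook_p		hook;
	struct hookinfo	*dest;
	struct hookinfo	*dup;
};
typedef struct hookinfo *hi_p;

/*
 * Receive data on a hook.  Unlike the stock node, this never calls
 * m_dup(9): if the divert hook is connected, the original packet is
 * forwarded there (userland reinjects it on the far hook); otherwise
 * it is passed straight through.
 */
static int
ng_tee_rcvdata(hook_p hook, item_p item)
{
	const hi_p hinfo = NG_HOOK_PRIVATE(hook);
	int error = 0;

	if (hinfo->dup != NULL && hinfo->dup->hook != NULL) {
		/* Divert to r2l/l2r: no copy, so no m_dup() cost. */
		NG_FWD_ITEM_HOOK(error, item, hinfo->dup->hook);
	} else if (hinfo->dest != NULL && hinfo->dest->hook != NULL) {
		/* Divert hook not connected: plain pass-through. */
		NG_FWD_ITEM_HOOK(error, item, hinfo->dest->hook);
	} else {
		NG_FREE_ITEM(item);
	}
	return (error);
}

The point is that the original mbuf chain is forwarded as-is in every
case, so the expensive writable copy never happens.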
>> I'm wondering whether bumping the recvspace would help, whether
>> changing the ng_socket hook to queue incoming data would help,
>> whether it would be best to replace ng_socket with a memory-mapped
>> interface, or whether anyone has other ideas that would improve
>> performance.
>
> If you absolutely need to see *all* GigE traffic in userland, then
> it's going to be troublesome. If not, filter it with ng_bpf(4).

Thanks, Ruslan. Yes, I do need to pass all the traffic down through my
userland daemon. Since I'm just beginning to work with Netgraph, I was
wondering whether there was something simple or obvious that I was
missing, or whether there was a known performance issue with one of
the nodes I'm using (as you pointed out with ng_tee). I had assumed
that the bridging and the round trip through userland would only add
latency to the connection, but the performance test results indicate
that either there is a bottleneck I need to find or my testing
methodology was flawed.

Thanks again,
Guy
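P.S. For anyone else poking at this: besides setting the sysctls
mentioned above, the daemon can enlarge its own data socket's receive
buffer, which is essentially what Maxim's ngctl commit did for 'ngctl
list'. A minimal sketch, untested as written; the node name and the
200000 figure are just placeholders mirroring the sysctl values, and
SO_RCVBUF is still capped by kern.ipc.maxsockbuf:

#include <sys/types.h>
#include <sys/socket.h>

#include <err.h>
#include <netgraph.h>	/* NgMkSockNode(), from -lnetgraph */

int
main(void)
{
	int cs, ds;		/* control and data sockets */
	int rcvbuf = 200000;	/* mirrors net.graph.recvspace above */

	/* Create an ng_socket(4) node; the name is arbitrary. */
	if (NgMkSockNode("mydaemon", &cs, &ds) < 0)
		err(1, "NgMkSockNode");

	/*
	 * Enlarge the data socket's receive buffer so bursts coming
	 * off the tee's divert hook are less likely to be dropped.
	 */
	if (setsockopt(ds, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
	    sizeof(rcvbuf)) == -1)
		err(1, "setsockopt(SO_RCVBUF)");

	/* ... connect hooks and read packets from "ds" as usual ... */
	return (0);
}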