From owner-freebsd-arm@FreeBSD.ORG Tue Sep 9 15:16:35 2008
Date: Tue, 9 Sep 2008 17:16:31 +0200
From: "Jacques Fourie"
To: "Sam Leffler"
Cc: freebsd-arm@freebsd.org
Subject: Re: Routing benchmarks
In-Reply-To: <48C6900C.8070708@freebsd.org>
References: <20080909175556.07bac5f0.stas@FreeBSD.org> <48C6900C.8070708@freebsd.org>
List-Id: Porting FreeBSD to the StrongARM Processor

On Tue, Sep 9, 2008 at 5:02 PM, Sam Leffler wrote:
> Jacques Fourie wrote:
>>
>> On Tue, Sep 9, 2008 at 3:55 PM, Stanislav Sedov wrote:
>>
>>> On Tue, 9 Sep 2008 15:33:30 +0200
>>> "Jacques Fourie" mentioned:
>>>
>>>> Hi,
>>>>
>>>> I've performed some benchmark tests on my Gumstix Connex 400 (Intel
>>>> Xscale PXA 255 CPU clocked at 400MHz) with a netDuo expansion board.
>>>> This board has two smc network interfaces. I configure the gumstix as
>>>> a router and measure network throughput with netperf running on
>>>> separate boxes on either side of the gumstix. My initial tests showed
>>>> a TCP throughput of 2Mbit/s. After adapting the smc driver to use DMA
>>>> this figure went up to 7Mbit/s. Although this is a significant
>>>> improvement, it still seems to be a bit slow. Does anyone have any
>>>> tips on how I can go about figuring out where the bottleneck
>>>> lies? Initial profiling showed that a significant amount of time was
>>>> spent doing memory to memory copies of data, but after the DMA change
>>>> profiling does not show any obvious culprits.
>>>>
>>>
>>> Have you tried checking the speed of the interface itself? Without
>>> routing involved? Might it be the interfaces themselves being so slow?
>>>
>>> --
>>> Stanislav Sedov
>>> ST4096-RIPE
>>>
>>
>> Running netserver on the gumstix shows a throughput of 2.4Mbit/s. At
>> the moment I can't get if_bridge to work - will try to figure out what
>> is going on. A bridging benchmark may be more informative.
>>
>
> You said you did profiling but you didn't provide the data to inspect. It's
> possible kernel profiling has never been tried on your platform; did you
> sanity check the results? (e.g. run a known test load and check results;
> verify all routines that should execute appear in the profile). Also, if
> copy overhead shows up as significant, look to see why those copies are being
> done; it's often possible to avoid a copy.
>
> My experience in working with architectures like this is that cache handling
> can be a significant cost that doesn't always show up on a profile.
>
> Also you may find useful information by tracking mbufs using the h/w clock
> at important places along the "fast path", then looking at whether the overhead
> for each step is reasonable. I did this for bridged traffic by forcing the
> rx dma to go to an mbuf+cluster, then used the free storage in the mbuf
> header to store timestamps. At the end of the processing path I sorted the
> data into buckets by the sample points and added a sysctl to dump the
> histogram to see min/max/avg.
>
> Sam
>

Thanks for the nice idea - will try something similar. At the moment I
also suspect that cache handling has a lot to do with the performance
figures that I'm seeing. The PXA255 has a 32KB data cache and a 32KB
instruction cache.

Jacques
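
For reference, here is a rough userland sketch of the timestamp bucketing Sam describes, just to make the bookkeeping concrete. It is not his code and not kernel code: the sample point names, `struct pkt_stamps` (standing in for the spare storage in the mbuf header), `stamp()`, `account()` and `dump_stats()` are all made up for illustration, the clock value is passed in by the caller instead of being read from a h/w timer, and `printf` stands in for the sysctl handler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sample points along the forwarding path (names are assumptions). */
enum sample_point { SP_RX_DMA, SP_IP_INPUT, SP_IP_FORWARD, SP_TX_START, SP_COUNT };

/* Stand-in for the free storage in an mbuf header: one timestamp per sample point. */
struct pkt_stamps {
	uint32_t ts[SP_COUNT];
};

/* Bucket i accumulates the time spent between sample point i and i + 1. */
struct stage_stats {
	uint32_t min;
	uint32_t max;
	uint64_t sum;
	uint32_t count;
};

static struct stage_stats stats[SP_COUNT - 1];

/* Record the current clock value at a sample point. */
static void
stamp(struct pkt_stamps *p, enum sample_point sp, uint32_t now)
{
	assert(sp < SP_COUNT);
	p->ts[sp] = now;
}

/* At the end of the path, fold one packet's timestamps into the buckets. */
static void
account(const struct pkt_stamps *p)
{
	for (int i = 0; i < SP_COUNT - 1; i++) {
		/* Unsigned subtraction stays correct across one counter wrap. */
		uint32_t delta = p->ts[i + 1] - p->ts[i];
		struct stage_stats *s = &stats[i];

		if (s->count == 0 || delta < s->min)
			s->min = delta;
		if (delta > s->max)
			s->max = delta;
		s->sum += delta;
		s->count++;
	}
}

/* Average ticks for one bucket; 0 if no packets were accounted. */
static uint32_t
stage_avg(const struct stage_stats *s)
{
	return (s->count ? (uint32_t)(s->sum / s->count) : 0);
}

/* Userland substitute for the sysctl dump: one line per bucket. */
static void
dump_stats(void)
{
	for (int i = 0; i < SP_COUNT - 1; i++)
		printf("stage %d: n=%u min=%u max=%u avg=%u\n", i,
		    (unsigned)stats[i].count, (unsigned)stats[i].min,
		    (unsigned)stats[i].max, (unsigned)stage_avg(&stats[i]));
}
```

In a driver the `stamp()` calls would sit at the points of interest on the fast path, `account()` would run just before the packet is handed to the transmit side, and the values would be in ticks of whatever h/w clock the platform offers. A stage whose max sits far above its avg is a reasonable first place to look for cache effects.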