From: Stephan Uphoff
To: Robert Watson
Cc: "freebsd-arch@freebsd.org"
Subject: Re: macro benchmark for mutex locks needed.
Date: Tue, 23 Nov 2004 14:56:20 -0500
Message-Id: <1101239780.26313.10158.camel@palm.tree.com>

On Tue, 2004-11-23 at 11:49, Robert Watson wrote:
> On Tue, 23 Nov 2004, Stephan Uphoff wrote:
>
> > I have a bunch of ideas to speed up spin and mutex locks somewhat. For
> > this I need benchmarks to test different modifications.
> >
> > While the micro-benchmark from rwatson@ is a good way to quickly test
> > modifications to weed out unlikely candidates - jhb@ tests have shown
> > that micro and macro-benchmarks do not always show the same result.
> >
> > Running benchmarks and booting takes a lot of time. Since this is NOT
> > one of my favorite tasks I want to run generally accepted benchmarks so I
> > can test (boot) each modification exactly once for each test machine.
> >
> > If you think I should run certain benchmarks with certain parameters
> > please tell me BEFORE I start testing!
>
> I like to use netblast from src/tools/tools/netrate/netblast. It attempts
> to send packets as quickly as possible on a network interface, which is a
> CPU-intensive operation that is very sensitive to the cost of
> synchronization. On an SMP system, it also generates a moderate ithread
> load as the gig-e interface transmits, and that ithread will often contend
> on the network interface driver lock with the running netblast thread. As
> such, changes that affect the cost and handling of contention are also
> visible in this benchmark. With the synchronization micro-benchmark, I
> see spin locks on SMP being faster with the atomic release removed, but in
> the netblast test, I see those spinlocks as slower on SMP, since they
> behave less well under contention.
>
> (The above with 64-bit if_em cards on a dual-Xeon). Note that you'll want
> to make sure netreceive is running on a second box, or that you're sending
> to the broadcast address, or the icmp errors will substantially quench
> your send ability due to the asynchronous report of the port closed.

Mhh... My initial SMP test machine will be a Dell 1600SC dual-Xeon
(P4 - 2.8 GHz/400MHz bus). It has a built-in em Ethernet interface.
Unfortunately it is only an 82540EM / 32-bit chip, and it shares the PCI
bus with a few 33MHz PCI cards :-(. The machine has an unused PCI bus with
a free PCI-X slot, but I would need to order a server card.
What is your normal data rate with this test - any chance that the
82540EM will be sufficient?

The data sink will be a 32-bit em card with an ancient slow P4 processor,
connected with a cross-over cable. Since this combination is probably not
able to sink enough data, I plan to add a dummy static ARP entry for a
dummy remote IP address on the SMP machine. This should keep the data
sink's em card from actually filling its receive buffers. Since this takes
the PCI bus and the slow processor out of the equation, this should be a
perfect data sink - right?
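
For the question of whether a 32-bit 82540EM on a shared bus could be sufficient, a back-of-envelope comparison may help frame it. The Python sketch below is an illustration, not something from the thread: the 33 MHz bus clock is an assumption drawn from the 33MHz cards mentioned above, the function names are hypothetical, and every figure is a theoretical peak that ignores bus arbitration, descriptor traffic and the other cards on the bus.

```python
# Back-of-envelope figures only; real throughput will be lower.

PCI_WIDTH_BITS = 32            # 82540EM is a 32-bit part (from the message)
PCI_CLOCK_HZ = 33_000_000      # ASSUMPTION: shared bus is clocked at 33 MHz
GIGE_BITS_PER_SEC = 1_000_000_000

WIRE_OVERHEAD_BYTES = 8 + 12             # preamble/SFD + inter-frame gap
FRAME_OVERHEAD_BYTES = 14 + 4 + 20 + 8   # Ethernet hdr + FCS + IPv4 + UDP
MIN_FRAME_BYTES = 64                     # minimum Ethernet frame size


def pci_peak_bits_per_sec(width_bits=PCI_WIDTH_BITS, clock_hz=PCI_CLOCK_HZ):
    """Theoretical peak of a parallel PCI bus: one transfer per clock."""
    return width_bits * clock_hz


def gige_frames_per_sec(udp_payload_bytes):
    """Line-rate frame count for UDP/IPv4 payloads of the given size."""
    frame = max(MIN_FRAME_BYTES, udp_payload_bytes + FRAME_OVERHEAD_BYTES)
    wire_bits = (frame + WIRE_OVERHEAD_BYTES) * 8
    return GIGE_BITS_PER_SEC // wire_bits


if __name__ == "__main__":
    print("PCI peak (bit/s):", pci_peak_bits_per_sec())
    for payload in (18, 512, 1472):
        print(payload, "byte payload ->", gige_frames_per_sec(payload), "frames/s")
```

Under these assumptions the bus peak is only slightly above gigabit line rate, so the shared bus rather than the wire could plausibly become the limit, especially with small packets.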