To: Luigi Rizzo
Cc: cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/pci if_sis.c
In-Reply-To: <20011128141510.A13586@iguana.aciri.org>
Date: Wed, 28 Nov 2001 22:48:05 -0800
From: Peter Wemm
Message-Id: <20011129064805.C37793808@overcee.netplex.com.au>

Luigi Rizzo wrote:
> > While this helps things like packet forwarding, it hurts things like
>
> and generic servers (web, proxies) where things are done in
> userland and the content is opaque and the only unaligned
> accesses are for the IP/TCP headers (but those are touched
> already in the packet forwarding case).
>
> > NFS which now have to do lots and lots of unaligned accesses.
>
> I would actually like to see some numbers showing that this is the
> case. Where else these unaligned accesses could be other than in
> creating the NFS/RPC headers ? Do a bunch of unaligned accesses
> really cost more than a memory-to-memory copy of 1500 bytes ?

Even just the IP, TCP and UDP header processing is affected.

> > Have you benchmarked anything else besides packet forwarding?
>
> no, how would you benchmark this (that is without hitting a
> bottleneck elsewhere in the system) ?

You don't need to hit the wall; supply a constant stream of requests and
measure the CPU used in interrupt or system mode.

To show that unaligned accesses do have a measurable effect:

#include <stdlib.h>

char buf[100000*4];

int
main(void)
{
	int i;
	int j;
	int n;
	int *p;

	j = 0;
	for (n = 0; n < 10000; n++) {
		/* OFF (0..3, set with -DOFF=n) is the byte offset of the first int */
		p = (int *)&buf[OFF];
		for (i = 0; i < 99999; i++)
			j += *p++;
	}
	/* use j so the compiler cannot discard the loops */
	exit(j);
}

On an AthlonMP (smp kernel, smp is running, my X11 desktop):

peter@daintree[10:19pm]~-192> cc -O2 -DOFF=0 -o b0 b.c
peter@daintree[10:19pm]~-193> cc -O2 -DOFF=1 -o b1 b.c
peter@daintree[10:19pm]~-194> cc -O2 -DOFF=2 -o b2 b.c
peter@daintree[10:19pm]~-195> cc -O2 -DOFF=3 -o b3 b.c
peter@daintree[10:19pm]~-196> set time
peter@daintree[10:20pm]~-198> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
8.876u 0.023s 0:08.97 99.1% 5+671k 0+0io 0pf+0w
9.154u 0.007s 0:09.23 99.1% 5+671k 0+0io 0pf+0w
8.901u 0.000s 0:08.97 99.2% 5+671k 0+0io 0pf+0w
9.157u 0.007s 0:09.23 99.1% 5+671k 0+0io 0pf+0w
8.883u 0.015s 0:08.96 99.2% 5+670k 0+0io 0pf+0w
9.147u 0.015s 0:09.22 99.2% 5+671k 0+0io 0pf+0w

On a Pentium4:

peter@pentium4[10:25pm]/home/tmp-11> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
3.229u 0.000s 0:03.23 100.0% 5+673k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0% 5+672k 0+0io 0pf+0w
3.236u 0.000s 0:03.23 100.0% 5+671k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0% 5+672k 0+0io 0pf+0w
3.235u 0.000s 0:03.23 100.0% 5+671k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0% 5+670k 0+0io 0pf+0w

On a Pentium3 (coppermine):

> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
14.710u 0.000s 0:14.71 100.0% 5+671k 0+0io 0pf+0w
14.728u 0.000s 0:14.73 99.9% 5+671k 0+0io 0pf+0w
14.718u 0.000s 0:14.71 100.0% 5+671k 0+0io 0pf+0w
14.720u 0.007s 0:14.73 99.9% 5+671k 0+0io 0pf+0w
14.718u 0.000s 0:14.71 100.0% 5+671k 0+0io 0pf+0w
14.735u 0.000s 0:14.73 100.0% 5+670k 0+0io 0pf+0w

On a Pentium Pro (200MHz, I reduced the outer loop from 10000 to 1000):

> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
3.624u 0.007s 0:03.65 99.1% 5+677k 0+0io 0pf+0w
3.673u 0.007s 0:03.68 99.7% 5+673k 0+0io 0pf+0w
3.623u 0.015s 0:03.65 99.4% 5+674k 0+0io 0pf+0w
3.663u 0.007s 0:03.68 99.4% 5+671k 0+0io 0pf+0w
3.639u 0.007s 0:03.65 99.4% 5+674k 0+0io 0pf+0w
3.684u 0.000s 0:03.69 99.7% 5+673k 0+0io 0pf+0w

The most spectacular sufferer of unaligned accesses is the Pentium-4,
which takes ~38% longer (4.46s versus 3.23s) when the accesses are
unaligned... I suspect the effect on writes is going to be more
pronounced, especially on systems with ECC that have to do a
read/merge/write for every unaligned write.

> > > Right now the new behaviour is controlled by a sysctl variable,
> > > hw.sis_quick which defaults to 1 (on), you can set it to 0 to
> ...
> >
> > Please do not remove this yet.
>
> no problem. It will actually be useful to tell people who have
> a reasonable testbed to toggle this and see if it makes a difference.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5