From owner-cvs-all  Wed Nov 28 22:48:21 2001
Delivered-To: cvs-all@freebsd.org
Received: from peter3.wemm.org (c1315225-a.plstn1.sfba.home.com [24.14.150.180])
	by hub.freebsd.org (Postfix) with ESMTP
	id 38BE837B429; Wed, 28 Nov 2001 22:48:06 -0800 (PST)
Received: from overcee.netplex.com.au (overcee.wemm.org [10.0.0.3])
	by peter3.wemm.org (8.11.0/8.11.0) with ESMTP id fAT6m5M84083;
	Wed, 28 Nov 2001 22:48:06 -0800 (PST)
	(envelope-from peter@wemm.org)
Received: from wemm.org (localhost [127.0.0.1])
	by overcee.netplex.com.au (Postfix) with ESMTP
	id C37793808; Wed, 28 Nov 2001 22:48:05 -0800 (PST)
	(envelope-from peter@wemm.org)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: Luigi Rizzo <luigi@FreeBSD.org>
Cc: cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/pci if_sis.c 
In-Reply-To: <20011128141510.A13586@iguana.aciri.org> 
Date: Wed, 28 Nov 2001 22:48:05 -0800
From: Peter Wemm <peter@wemm.org>
Message-Id: <20011129064805.C37793808@overcee.netplex.com.au>
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk
List-ID: <cvs-all.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20cvs-all>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20cvs-all>
X-Loop: FreeBSD.ORG

Luigi Rizzo wrote:
> > While this helps things like packet forwarding, it hurts things like
> 
> and generic servers (web, proxies) where things are done in
> userland and the content is opaque and the only unaligned
> accesses are for the IP/TCP headers (but those are touched
> already in the packet forwarding case).
> 
> > NFS which now have to do lots and lots of unaligned accesses.
> 
> I would actually like to see some numbers showing that this is the
> case.  Where else these unaligned accesses could be other than in
> creating the NFS/RPC headers ? Do a bunch of unaligned accesses
> really cost more than a memory-to-memory copy of 1500 bytes ?

Even just the IP, TCP and UDP header processing is affected.

> > Have you benchmarked anything else besides packet forwarding?
> 
> no, how would you benchmark this (that is without hitting a
> bottleneck elsewhere in the system) ?

You don't need to hit the wall.  Supply a constant stream of requests and
measure the CPU time spent in interrupt or system mode.

To show that unaligned accesses do have a measurable effect:
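#include <stdlib.h>

/* OFF (0..3) is set at compile time with -DOFF=n; nonzero misaligns the reads. */
char buf[100000*4];

int
main(void)
{
        int i;
        int j;
        int n;
        int *p;

        j = 0;
        for (n = 0; n < 10000; n++) {
                /* Deliberately misaligned when OFF != 0. */
                p = (int *)&buf[OFF];
                for (i = 0; i < 99999; i++)
                        j += *p++;
        }
        /* Use j so the compiler cannot discard the loads. */
        exit(j);
}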

On an AthlonMP (SMP kernel, SMP is running, my X11 desktop):
peter@daintree[10:19pm]~-192> cc -O2 -DOFF=0 -o b0 b.c
peter@daintree[10:19pm]~-193> cc -O2 -DOFF=1 -o b1 b.c
peter@daintree[10:19pm]~-194> cc -O2 -DOFF=2 -o b2 b.c
peter@daintree[10:19pm]~-195> cc -O2 -DOFF=3 -o b3 b.c
peter@daintree[10:19pm]~-196> set time
peter@daintree[10:20pm]~-198> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
8.876u 0.023s 0:08.97 99.1%     5+671k 0+0io 0pf+0w
9.154u 0.007s 0:09.23 99.1%     5+671k 0+0io 0pf+0w
8.901u 0.000s 0:08.97 99.2%     5+671k 0+0io 0pf+0w
9.157u 0.007s 0:09.23 99.1%     5+671k 0+0io 0pf+0w
8.883u 0.015s 0:08.96 99.2%     5+670k 0+0io 0pf+0w
9.147u 0.015s 0:09.22 99.2%     5+671k 0+0io 0pf+0w

On a Pentium4:
peter@pentium4[10:25pm]/home/tmp-11> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
3.229u 0.000s 0:03.23 100.0%    5+673k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0%    5+672k 0+0io 0pf+0w
3.236u 0.000s 0:03.23 100.0%    5+671k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0%    5+672k 0+0io 0pf+0w
3.235u 0.000s 0:03.23 100.0%    5+671k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0%    5+670k 0+0io 0pf+0w

On a Pentium3 (coppermine):
> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
14.710u 0.000s 0:14.71 100.0%   5+671k 0+0io 0pf+0w
14.728u 0.000s 0:14.73 99.9%    5+671k 0+0io 0pf+0w
14.718u 0.000s 0:14.71 100.0%   5+671k 0+0io 0pf+0w
14.720u 0.007s 0:14.73 99.9%    5+671k 0+0io 0pf+0w
14.718u 0.000s 0:14.71 100.0%   5+671k 0+0io 0pf+0w
14.735u 0.000s 0:14.73 100.0%   5+670k 0+0io 0pf+0w

On a Pentium Pro (200MHz, I reduced the outer loop from 10000 to 1000):
> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
3.624u 0.007s 0:03.65 99.1%     5+677k 0+0io 0pf+0w
3.673u 0.007s 0:03.68 99.7%     5+673k 0+0io 0pf+0w
3.623u 0.015s 0:03.65 99.4%     5+674k 0+0io 0pf+0w
3.663u 0.007s 0:03.68 99.4%     5+671k 0+0io 0pf+0w
3.639u 0.007s 0:03.65 99.4%     5+674k 0+0io 0pf+0w
3.684u 0.000s 0:03.69 99.7%     5+673k 0+0io 0pf+0w

The most spectacular sufferer of unaligned accesses is the Pentium-4, which
takes ~38% longer to do unaligned accesses...  I suspect the effect on writes
is going to be more pronounced, especially on systems with ECC that have to
do read/merge/write for every unaligned write.

> > >   Right now the new behaviour is controlled by a sysctl variable,
> > >   hw.sis_quick which defaults to 1 (on), you can set it to 0 to
> ...
> > 
> > Please do not remove this yet.
> 
> no problem. It will actually be useful to tell people who have
> a reasonable testbed to toggle this and see if it makes a difference.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

