Date:      Wed, 4 Sep 2002 16:59:09 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-alpha@freebsd.org
Subject:   alpha performance on -current
Message-ID:  <15734.29725.515274.183629@grasshopper.cs.duke.edu>
In-Reply-To: <XFMail.20020904090455.jhb@FreeBSD.org>
References:  <XFMail.20020904090455.jhb@FreeBSD.org>


John Baldwin writes:
 > On the DS20 I have here, Peter's fix to basically enable the buffer cache
 > gave me about a 38% performance increase for a buildworld -j 2 on current:
 > 
 > Before:
 > --------------------------------------
 > build started at 15:11:08 on 08/28/02
 > build finished at 18:43:05 on 08/28/02
 > --------------------------------------
 > Which is a total time of 3:31:57
 > 
 > After:
 > --------------------------------------
 > build started at 22:41:31 on 09/03/02
 > build finished at 00:51:46 on 09/04/02
 > --------------------------------------
 > Which is a total time of 2:10:15

A buildworld from July:

     9611.42 real      6149.87 user      2613.20 sys

A buildworld today:

     8699.25 real      6985.64 user      1379.72 sys


For all I know, the speedup is just from disabling WITNESS and
INVARIANTS.  
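
As a rough check on the two timings above, the arithmetic can be sketched in C. The helper names `pct_change` and `print_buildworld_deltas` are made up for illustration; the inputs are just the figures quoted above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: percent change going from `before` to `after`.
 * A negative result means the value went down.
 */
static double
pct_change(double before, double after)
{
	assert(before != 0.0);
	return ((after - before) / before * 100.0);
}

/* Print the change in each component of the two buildworld timings. */
static void
print_buildworld_deltas(void)
{
	printf("real: %+.1f%%\n", pct_change(9611.42, 8699.25));
	printf("user: %+.1f%%\n", pct_change(6149.87, 6985.64));
	printf("sys:  %+.1f%%\n", pct_change(2613.20, 1379.72));
}
```

Called from a main(), this works out to roughly -9.5% real, +13.6% user and -47.2% sys: system time is nearly halved, but user time went up, so the wall-clock gain is under 10%.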

Speaking of performance, I ran lmbench on my xp1000 under -stable and
-current.  I built the binary with compaq cc and ran the same binaries
on -current and -stable.  Both kernels were built without WITNESS,
INVARIANTS and DIAGNOSTIC.  On -current, I manually created an
/etc/malloc.conf symlink to remove malloc debugging.  I included
results from Tru64 5.1A for comparison.

The disk that -current is on is a little faster than the disks
-stable and Tru64 are on; that's the only area where it's not an
apples-to-apples test.

The first thing that stands out is that syscalls are *much* more
expensive on -current.  Nearly a factor of 4 for a null syscall
(0.57us -> 2.06 us).  

I suppose it equates to a latency of ~0.35us for each mutex
taken/released.  Can this be right?  The lmbench null syscall
is getppid:

# ./lat_syscall null
Simple syscall: 2.0178 microseconds
# ./lat_syscall null
Simple syscall: 2.0333 microseconds
# sysctl -w kern.giant.proc=0
kern.giant.proc: 1 -> 0
# ./lat_syscall null
Simple syscall: 1.6360 microseconds
# ./lat_syscall null
Simple syscall: 1.6333 microseconds
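
For anyone who wants to repeat the measurement without lmbench, here is a minimal sketch of the same idea: time a tight loop of getppid() calls and divide by the iteration count. This is not lmbench's actual code; the function name `null_syscall_usec` and the use of clock_gettime() are assumptions for illustration, and the result includes loop overhead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* clock_gettime() under strict -std= modes */
#endif

#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Rough null-syscall latency: call getppid() `iters` times and return
 * the mean cost per call in microseconds.
 */
static double
null_syscall_usec(long iters)
{
	struct timespec start, end;
	double nsec;
	long i;

	assert(iters > 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iters; i++)
		(void)getppid();
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Elapsed time in nanoseconds, then converted to usec per call. */
	nsec = (double)(end.tv_sec - start.tv_sec) * 1e9 +
	    (double)(end.tv_nsec - start.tv_nsec);
	return (nsec / 1e3 / (double)iters);
}
```

Something like null_syscall_usec(1000000) gives a number comparable in spirit to the "Simple syscall" line. In the runs above, turning off kern.giant.proc takes the figure from about 2.03us to about 1.63us, a difference of about 0.4us.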


Is the locking overhead this bad on x86?  It looks downright
embarrassing on alpha.  Can anything be done about it?  Are the
memory barriers in atomic_cmpset_acq_* really needed?  They have the
look of belt & suspenders code..
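
To make the question concrete, here is a sketch of what acquire and release ordering are for, written with portable C11 <stdatomic.h> rather than the kernel's own atomic primitives or alpha's mb instruction. The `struct spin` type and the `spin_lock`/`spin_unlock` names are hypothetical. The acquire on a successful compare-and-set keeps accesses inside the critical section from being reordered before the lock is taken; the release on unlock keeps them from drifting past it. On a weakly ordered machine with more than one CPU, dropping either can let another CPU see protected data in an inconsistent state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical minimal spinlock: 0 = unlocked, 1 = locked. */
struct spin {
	_Atomic uint32_t word;
};

static void
spin_lock(struct spin *s)
{
	uint32_t expected;

	do {
		/* A failed compare-exchange overwrites `expected`; reset it. */
		expected = 0;
		/*
		 * Acquire on success: accesses after this point cannot be
		 * reordered before the lock word is seen to change.
		 */
	} while (!atomic_compare_exchange_weak_explicit(&s->word, &expected, 1,
	    memory_order_acquire, memory_order_relaxed));
}

static void
spin_unlock(struct spin *s)
{
	assert(atomic_load_explicit(&s->word, memory_order_relaxed) == 1);
	/*
	 * Release: accesses before this point cannot be reordered after
	 * the store that drops the lock.
	 */
	atomic_store_explicit(&s->word, 0, memory_order_release);
}
```

A single CPU always observes its own accesses in program order, so a uniprocessor test would not be expected to show any breakage from missing barriers even if they are required for SMP.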

FWIW, the appended diff to remove them reduces null system call
latency to 1.6us with kern.giant.proc=1, and 1.4us with
kern.giant.proc=0.  I'm about to start a buildworld with it, but I
don't have any SMP boxes.

On the other hand, the pipe results are tremendous.  Pipe now goes
like a bat out of hell.  Congrats to whoever did that.

Drew


                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------


Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
                                                    
--------- ------------- ----------------------- ----
monet     FreeBSD 4.5-S        alpha-freebsd4.5  497
monet     FreeBSD 5.0-C        alpha-freebsd5.0  497
monet         OSF1 V5.1     alphaev6-dec-osf5.1  499

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
monet     FreeBSD 4.5-S  497 0.57 3.78 16.2 22.5  33.7 1.29 6.25 1434 4692 12.K
monet     FreeBSD 5.0-C  497 2.06 7.52 28.1 44.1  18.7 2.35 8.91 1440 4716 8895
monet         OSF1 V5.1  499 0.41 1.33 144. 157.  19.8 1.01 4.56 1039 2745 7611

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
monet     FreeBSD 4.5-S 1.850   20.4   98.1   36.9  139.2    39.7   139.7
monet     FreeBSD 5.0-C 4.250   13.6   47.1   22.9   62.2    27.2    68.3
monet         OSF1 V5.1 5.430 9.8900   45.2   18.1   49.5    20.2    53.6

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
monet     FreeBSD 4.5-S 1.850  12.0 14.4  38.5  79.1  43.0 100.0 231.
monet     FreeBSD 5.0-C 4.250  28.4 33.5              87.5       300.
monet         OSF1 V5.1 5.430  24.8 48.7  90.3 152.8  84.8 197.4     

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page	
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
monet     FreeBSD 4.5-S   80.4   57.7 2457.0 1243.8   2980.0 0.527        
monet     FreeBSD 5.0-C  120.2   99.4  330.3 4854.4   4688.0 0.145        
monet         OSF1 V5.1  488.3  855.4 1773.0 1083.4   1462.0 2.931  4470.0

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
monet     FreeBSD 4.5-S 82.3 201. 70.2  159.1  953.5  452.6  342.3 953. 351.2
monet     FreeBSD 5.0-C 429. 127. 102.  250.9  946.6  439.3  339.3 947. 350.7
monet         OSF1 V5.1 301. 237.       318.1  979.5  451.7  350.1 978. 356.7

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------  ---- ----- ------    --------    -------
monet     FreeBSD 4.5-S   497 6.016   30.1  196.6
monet     FreeBSD 5.0-C   497 6.039   30.3  197.7
monet         OSF1 V5.1   499 5.859   29.3  194.3


Index: atomic.h
===================================================================
RCS file: /home/ncvs/src/sys/alpha/include/atomic.h,v
retrieving revision 1.14
diff -u -r1.14 atomic.h
--- atomic.h	17 May 2002 05:45:39 -0000	1.14
+++ atomic.h	4 Sep 2002 20:37:43 -0000
@@ -419,14 +419,14 @@
 	int retval;
 
 	retval = atomic_cmpset_32(p, cmpval, newval);
-	alpha_mb();
+/*	alpha_mb();*/
 	return (retval);
 }
 
 static __inline u_int32_t
 atomic_cmpset_rel_32(volatile u_int32_t *p, u_int32_t cmpval, u_int32_t newval)
 {
-	alpha_mb();
+/*	alpha_mb();*/
 	return (atomic_cmpset_32(p, cmpval, newval));
 }
 
@@ -436,14 +436,14 @@
 	int retval;
 
 	retval = atomic_cmpset_64(p, cmpval, newval);
-	alpha_mb();
+/*	alpha_mb();*/
 	return (retval);
 }
 
 static __inline u_int64_t
 atomic_cmpset_rel_64(volatile u_int64_t *p, u_int64_t cmpval, u_int64_t newval)
 {
-	alpha_mb();
+/*	alpha_mb();*/
 	return (atomic_cmpset_64(p, cmpval, newval));
 }
 
