Date: Wed, 4 Sep 2002 16:59:09 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-alpha@freebsd.org
Subject: alpha performance on -current
Message-ID: <15734.29725.515274.183629@grasshopper.cs.duke.edu>
In-Reply-To: <XFMail.20020904090455.jhb@FreeBSD.org>
References: <XFMail.20020904090455.jhb@FreeBSD.org>
John Baldwin writes:
> On the DS20 I have here, Peter's fix to basically enable the buffer cache
> gave me about a 38% performance increase for a buildworld -j 2 on current:
>
> Before:
> --------------------------------------
> build started at 15:11:08 on 08/28/02
> build finished at 18:43:05 on 08/28/02
> --------------------------------------
> Which is a total time of 3:31:57
>
> After:
> --------------------------------------
> build started at 22:41:31 on 09/03/02
> build finished at 00:51:46 on 09/04/02
> --------------------------------------
> Which is a total time of 2:10:15
A buildworld from July:
9611.42 real 6149.87 user 2613.20 sys
A buildworld today:
8699.25 real 6985.64 user 1379.72 sys
For all I know, the speedup is just from disabling WITNESS and
INVARIANTS.
Speaking of performance, I ran lmbench on my xp1000 under -stable and
-current. I built the binary with compaq cc and ran the same binaries
on -current and -stable. Both kernels were built without WITNESS,
INVARIANTS and DIAGNOSTIC. On -current, I manually created an
/etc/malloc.conf symlink to remove malloc debugging. I included
results from Tru64 5.1A for comparison.
The disk that -current is on is a little faster than the disks
-stable and Tru64 are on; that's the only area where it's not an
apples-to-apples test.
The first thing that stands out is that syscalls are *much* more
expensive on -current: nearly a factor of 4 for a null syscall
(0.57us -> 2.06us).
I suppose it equates to a latency of ~0.35us for each mutex
taken/released. Can this be right? The lmbench null syscall
is getppid:
# ./lat_syscall null
Simple syscall: 2.0178 microseconds
# ./lat_syscall null
Simple syscall: 2.0333 microseconds
# sysctl -w kern.giant.proc=0
kern.giant.proc: 1 -> 0
# ./lat_syscall null
Simple syscall: 1.6360 microseconds
# ./lat_syscall null
Simple syscall: 1.6333 microseconds
Is the locking overhead this bad on x86? It looks downright
embarrassing on alpha. Can anything be done about it? Are the
memory barriers in atomic_cmpset_acq_* really needed? They have the
look of belt-and-suspenders code.
FWIW, the appended diff, which removes them, reduces null system call
latency to 1.6us with kern.giant.proc=1, and 1.4us with
kern.giant.proc=0. I'm about to start a buildworld with it, but I
don't have any SMP boxes.
On the other hand, the pipe results are tremendous. Pipe now goes
like a bat out of hell. Congrats to whoever did that.
Drew

                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------

Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
--------- ------------- ----------------------- ----
monet     FreeBSD 4.5-S        alpha-freebsd4.5  497
monet     FreeBSD 5.0-C        alpha-freebsd5.0  497
monet         OSF1 V5.1     alphaev6-dec-osf5.1  499

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct  sig  sig fork exec   sh
                             call  I/O stat clos   TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
monet     FreeBSD 4.5-S  497 0.57 3.78 16.2 22.5  33.7 1.29 6.25 1434 4692 12.K
monet     FreeBSD 5.0-C  497 2.06 7.52 28.1 44.1  18.7 2.35 8.91 1440 4716 8895
monet         OSF1 V5.1  499 0.41 1.33 144. 157.  19.8 1.01 4.56 1039 2745 7611

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
monet     FreeBSD 4.5-S 1.850   20.4   98.1   36.9  139.2    39.7   139.7
monet     FreeBSD 5.0-C 4.250   13.6   47.1   22.9   62.2    27.2    68.3
monet         OSF1 V5.1 5.430 9.8900   45.2   18.1   49.5    20.2    53.6

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
monet FreeBSD 4.5-S 1.850 12.0 14.4 38.5 79.1 43.0 100.0 231.
monet FreeBSD 5.0-C 4.250 28.4 33.5 87.5 300.
monet OSF1 V5.1 5.430 24.8 48.7 90.3 152.8 84.8 197.4
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page
Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
monet FreeBSD 4.5-S 80.4 57.7 2457.0 1243.8 2980.0 0.527
monet FreeBSD 5.0-C 120.2 99.4 330.3 4854.4 4688.0 0.145
monet OSF1 V5.1 488.3 855.4 1773.0 1083.4 1462.0 2.931 4470.0
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
monet FreeBSD 4.5-S 82.3 201. 70.2 159.1 953.5 452.6 342.3 953. 351.2
monet FreeBSD 5.0-C 429. 127. 102. 250.9 946.6 439.3 339.3 947. 350.7
monet OSF1 V5.1 301. 237. 318.1 979.5 451.7 350.1 978. 356.7

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS  Mhz  L1 $   L2 $ Main mem Guesses
--------- ------------- ---- ----- ------ -------- -------
monet     FreeBSD 4.5-S  497 6.016   30.1    196.6
monet     FreeBSD 5.0-C  497 6.039   30.3    197.7
monet         OSF1 V5.1  499 5.859   29.3    194.3

Index: atomic.h
===================================================================
RCS file: /home/ncvs/src/sys/alpha/include/atomic.h,v
retrieving revision 1.14
diff -u -r1.14 atomic.h
--- atomic.h 17 May 2002 05:45:39 -0000 1.14
+++ atomic.h 4 Sep 2002 20:37:43 -0000
@@ -419,14 +419,14 @@
 	int retval;
 
 	retval = atomic_cmpset_32(p, cmpval, newval);
-	alpha_mb();
+/*	alpha_mb();*/
 	return (retval);
 }
 
 static __inline u_int32_t
 atomic_cmpset_rel_32(volatile u_int32_t *p, u_int32_t cmpval, u_int32_t newval)
 {
-	alpha_mb();
+/*	alpha_mb();*/
 	return (atomic_cmpset_32(p, cmpval, newval));
 }
 
@@ -436,14 +436,14 @@
 	int retval;
 
 	retval = atomic_cmpset_64(p, cmpval, newval);
-	alpha_mb();
+/*	alpha_mb();*/
 	return (retval);
 }
 
 static __inline u_int64_t
 atomic_cmpset_rel_64(volatile u_int64_t *p, u_int64_t cmpval, u_int64_t newval)
 {
-	alpha_mb();
+/*	alpha_mb();*/
 	return (atomic_cmpset_64(p, cmpval, newval));
 }
 
