Date: Wed, 4 Sep 2002 16:59:09 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-alpha@freebsd.org
Subject: alpha performance on -current
Message-ID: <15734.29725.515274.183629@grasshopper.cs.duke.edu>
In-Reply-To: <XFMail.20020904090455.jhb@FreeBSD.org>
References: <XFMail.20020904090455.jhb@FreeBSD.org>

John Baldwin writes:
> On the DS20 I have here, Peter's fix to basically enable the buffer cache
> gave me about a 38% performance increase for a buildworld -j 2 on current:
>
> Before:
> --------------------------------------
> build started at 15:11:08 on 08/28/02
> build finished at 18:43:05 on 08/28/02
> --------------------------------------
> Which is a total time of 3:31:57
>
> After:
> --------------------------------------
> build started at 22:41:31 on 09/03/02
> build finished at 00:51:46 on 09/04/02
> --------------------------------------
> Which is a total time of 2:10:15

A buildworld from July:
9611.42 real 6149.87 user 2613.20 sys

A buildworld today:
8699.25 real 6985.64 user 1379.72 sys

For all I know, the speedup is just from disabling WITNESS and
INVARIANTS.

Speaking of performance, I ran lmbench on my xp1000 under -stable and
-current. I built the binary with compaq cc and ran the same binaries
on -current and -stable. Both kernels were built without WITNESS,
INVARIANTS and DIAGNOSTIC. On -current, I manually created an
/etc/malloc.conf symlink to remove malloc debugging. I included
results from Tru64 5.1A for comparison. The disk that -current is on
is a little faster than the disks -stable and Tru64 are on; that's the
only area where it's not an apples-to-apples test.

The first thing that stands out is that syscalls are *much* more
expensive on -current. Nearly a factor of 4 for a null syscall
(0.57us -> 2.06us). I suppose it equates to a latency of ~0.35us
for each mutex taken/released. Can this be right?

The lmbench null syscall is getppid:

# ./lat_syscall null
Simple syscall: 2.0178 microseconds
# ./lat_syscall null
Simple syscall: 2.0333 microseconds
# sysctl -w kern.giant.proc=0
kern.giant.proc: 1 -> 0
# ./lat_syscall null
Simple syscall: 1.6360 microseconds
# ./lat_syscall null
Simple syscall: 1.6333 microseconds

Is the locking overhead this bad on x86? It looks downright
embarrassing on alpha. Can anything be done about it?

Are the memory barriers in atomic_cmpset_acq_* really needed? They
have the look of belt & suspenders code. FWIW, the appended diff to
remove them reduces null system call latency to 1.6us with
kern.giant.proc=1, and 1.4us with kern.giant.proc=0. I'm about to
start a buildworld with it, but I don't have any SMP boxes.

On the other hand, the pipe results are tremendous. Pipe now goes
like a bat out of hell. Congrats to whoever did that.

Drew


                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------

[Blank fields in the original tables were collapsed in this archived
copy. Rows with fewer values than columns are listed in their original
order, without column alignment.]

Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
--------- ------------- ----------------------- ----
monet     FreeBSD 4.5-S        alpha-freebsd4.5  497
monet     FreeBSD 5.0-C        alpha-freebsd5.0  497
monet         OSF1 V5.1     alphaev6-dec-osf5.1  499

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
monet     FreeBSD 4.5-S  497 0.57 3.78 16.2 22.5  33.7 1.29 6.25 1434 4692 12.K
monet     FreeBSD 5.0-C  497 2.06 7.52 28.1 44.1  18.7 2.35 8.91 1440 4716 8895
monet         OSF1 V5.1  499 0.41 1.33 144. 157.  19.8 1.01 4.56 1039 2745 7611

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
monet     FreeBSD 4.5-S 1.850   20.4   98.1   36.9  139.2    39.7   139.7
monet     FreeBSD 5.0-C 4.250   13.6   47.1   22.9   62.2    27.2    68.3
monet         OSF1 V5.1 5.430 9.8900   45.2   18.1   49.5    20.2    53.6

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
monet     FreeBSD 4.5-S 1.850  12.0 14.4  38.5  79.1  43.0 100.0 231.
monet     FreeBSD 5.0-C 4.250 28.4 33.5 87.5 300.
monet         OSF1 V5.1 5.430 24.8 48.7 90.3 152.8 84.8 197.4

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot   Page
                        Create Delete Create Delete Latency Fault  Fault
--------- ------------- ------ ------ ------ ------ ------- ----- ------
monet     FreeBSD 4.5-S 80.4 57.7 2457.0 1243.8 2980.0 0.527
monet     FreeBSD 5.0-C 120.2 99.4 330.3 4854.4 4688.0 0.145
monet         OSF1 V5.1  488.3  855.4 1773.0 1083.4  1462.0 2.931 4470.0

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                 OS Pipe AF    TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
monet     FreeBSD 4.5-S 82.3 201. 70.2  159.1  953.5  452.6  342.3 953. 351.2
monet     FreeBSD 5.0-C 429. 127. 102.  250.9  946.6  439.3  339.3 947. 350.7
monet         OSF1 V5.1 301. 237. 318.1 979.5 451.7 350.1 978. 356.7

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS  Mhz  L1 $   L2 $ Main mem Guesses
--------- ------------- ---- ----- ------ -------- -------
monet     FreeBSD 4.5-S  497 6.016   30.1    196.6
monet     FreeBSD 5.0-C  497 6.039   30.3    197.7
monet         OSF1 V5.1  499 5.859   29.3    194.3


Index: atomic.h
===================================================================
RCS file: /home/ncvs/src/sys/alpha/include/atomic.h,v
retrieving revision 1.14
diff -u -r1.14 atomic.h
--- atomic.h	17 May 2002 05:45:39 -0000	1.14
+++ atomic.h	4 Sep 2002 20:37:43 -0000
@@ -419,14 +419,14 @@
 	int retval;
 
 	retval = atomic_cmpset_32(p, cmpval, newval);
-	alpha_mb();
+/*	alpha_mb();*/
 	return (retval);
 }
 
 static __inline u_int32_t
 atomic_cmpset_rel_32(volatile u_int32_t *p, u_int32_t cmpval, u_int32_t newval)
 {
-	alpha_mb();
+/*	alpha_mb();*/
 	return (atomic_cmpset_32(p, cmpval, newval));
 }
 
@@ -436,14 +436,14 @@
 	int retval;
 
 	retval = atomic_cmpset_64(p, cmpval, newval);
-	alpha_mb();
+/*	alpha_mb();*/
 	return (retval);
 }
 
 static __inline u_int64_t
 atomic_cmpset_rel_64(volatile u_int64_t *p, u_int64_t cmpval, u_int64_t newval)
 {
-	alpha_mb();
+/*	alpha_mb();*/
 	return (atomic_cmpset_64(p, cmpval, newval));
 }
 

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message