Date: Thu, 25 Nov 2004 15:06:45 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: ia64@FreeBSD.org
Subject: Re: Patch to optimize "bare" critical sections
Message-ID: <Pine.NEB.3.96L.1041125140630.39883B-100000@fledge.watson.org>
In-Reply-To: <200411231631.00945.jhb@FreeBSD.org>
On Tue, 23 Nov 2004, John Baldwin wrote:

> On Tuesday 23 November 2004 03:00 pm, John Baldwin wrote:
> > Basically, I have a patch to divorce the interrupt disabling/deferring
> > to only happen inside of spinlocks, using a new spinlock_enter/exit()
> > API (where a spinlock_enter/exit includes a critical section as well),
> > so that plain critical sections won't have to do such a thing.  I've
> > tested it on i386, alpha, and sparc64 already, and it has also been
> > tested on arm.  I'm unable to get a cross-built powerpc kernel to link
> > (the linker dies with a signal 6), but the compile did finish.  I have
> > cross-compiled ia64 and amd64 successfully, but have not run-tested
> > them due to ENOHARDWARE.  So, I would appreciate it if a few folks
> > could try the patch out on ppc, ia64, and amd64 to make sure it works
> > ok.  Thanks.
> >
> > http://www.FreeBSD.org/~jhb/spinlock.patch
>
> *cough* Ahem, http://www.FreeBSD.org/~jhb/patches/spinlock.patch

FYI, I'm seeing a fairly solid wedge occurring under stress with the i386
patch in place on a dual Xeon test box in the Netperf cluster.  I thought
at first it was a property of the UMA optimizations I have that use the
critical sections, but it also happens with just the critical section
changes, so... :-)

The reproduction mode I'm using is to run the syscall_timing tool on the
box over a serial console repeatedly:

  http://www.watson.org/~robert/freebsd/syscall_timing.c

In particular, I'm running 10,000 iterations of the socket create/free
test.  Under normal circumstances it looks like this:

  tiger-2# while (1)
  while? ./syscall_timing 10000 socket | grep per
  while? end
  0.000006708 per/iteration
  0.000006642 per/iteration
  0.000006658 per/iteration
  0.000006660 per/iteration
  ...
  ^C

When I get the wedge it does this:

  tiger-2# while (1)
  while? ./syscall_timing 10000 socket | grep per
  while? end
  0.000006735 per/iteration
  0.000006772 per/iteration
  0.000006721 per/iteration
  0.000006744 per/iteration
  ...
  0.000006716 per/iteration
  0.000006710 per/iteration
  0.000006745 per/                                          <-- hung

It could well be associated with poor timing involving a clock or serial
interrupt.  I haven't made much headway at investigating it yet, and it
looks like serial break is of no help, but I will attempt to see what I
can do this afternoon.  I suspect that without NMI on the box in question
it will be difficult.  I haven't yet tried with a UP kernel, however, only
SMP.

That said, with the critical section optimization in place, and with UMA
moved to using critical sections rather than mutexes for the per-CPU
cache on SMP, I see a small but healthy performance improvement in the
socket create/destroy micro-benchmark:

x netperf-socket-smp
+ percpu-socket-smp
+--------------------------------------------------------------------------+
| +                                                                   x    |
| ++                                                                  x xxx|
|+ + ++++ +                                                           xxxxx|
|     |____A____|                                                          |
|                                                                      |AM||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10      6.64e-06     6.676e-06     6.666e-06     6.6601e-06 1.2359881e-08
+  10     6.078e-06     6.236e-06     6.172e-06      6.165e-06 4.0734915e-08
Difference at 95.0% confidence
        -4.951e-07 +/- 2.82825e-08
        -7.43382% +/- 0.424655%
        (Student's t, pooled s = 3.01007e-08)

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research