Date: Wed, 3 Dec 2008 00:09:55 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Kostik Belousov <kostikbel@gmail.com> Cc: Scott Long <scottl@samsco.org>, src-committers@freebsd.org, Kip Macy <kmacy@freebsd.org>, John Baldwin <jhb@freebsd.org>, svn-src-all@freebsd.org, jfv@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r185162 - in head: . sys/amd64/include sys/arm/include sys/conf sys/dev/bce sys/dev/cxgb sys/dev/cxgb/sys sys/dev/cxgb/ulp/iw_cxgb sys/dev/mxge sys/dev/nxge sys/i386/include sys/i386/in... Message-ID: <20081202232222.N1514@besplex.bde.org> In-Reply-To: <20081201214217.GS3045@deviant.kiev.zoral.com.ua> References: <200811220555.mAM5tuIJ007781@svn.freebsd.org> <3c1674c90811221651u338294frcdbd99b386733851@mail.gmail.com> <20081123154138.GS6408@deviant.kiev.zoral.com.ua> <200812011407.06563.jhb@freebsd.org> <20081201214217.GS3045@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 1 Dec 2008, Kostik Belousov wrote: > On Mon, Dec 01, 2008 at 02:07:06PM -0500, John Baldwin wrote: >> On Sunday 23 November 2008 10:41:38 am Kostik Belousov wrote: >>> Below is the updated patch. It includes changes made after private comments >>> by bde@ and uses symbolic definitions for the bits in the features words. >>> I thought about accessing a per-CPU word for serialized instruction in the >>> slow path, but decided that it does not beneficial.\ >> >> Is the branch really better than just doing what the atomic operations for >> mutexes, etc. do and just use 'lock addl $0,%esp' for a barrier in all cases >> on i386 and only bother with using the fancier instructions on amd64? Even >> amd64 doesn't use *fence yet for the atomic ops actually. I have had a patch >> to use it for years, but during testing there was no discernable difference >> between the existing 'lock addl' approach vs '*fence'. I'd much rather just >> use 486 code for all i386 machines than add a branch, esp. if >> the "optimization" the branch is doing isn't an actual optimization. Hmm. > I do not insist. The branch is done only for arches that has no fence > instructions. The section for the slow code is put after the main text, > that should, according to Intel documentation, be predicted as not taken > for the first execution. > > For reference, below is the latest version of the patch. I postponed > the commit, in particular, because it touches ixgb, and Jack did > not responded. > ... > diff --git a/sys/i386/include/atomic.h b/sys/i386/include/atomic.h > index f6bcf0c..dbdc945 100644 > --- a/sys/i386/include/atomic.h > +++ b/sys/i386/include/atomic.h > @@ -32,21 +32,47 @@ > #error this file needs sys/cdefs.h as a prerequisite > #endif > > - > -#if defined(I686_CPU) > -#define mb() __asm__ __volatile__ ("mfence;": : :"memory") > -#define wmb() __asm__ __volatile__ ("sfence;": : :"memory") > -#define rmb() __asm__ __volatile__ ("lfence;": : :"memory") > -#else > -/* > - * do we need a serializing instruction? > - */ > -#define mb() > -#define wmb() > -#define rmb() > +#if defined(_KERNEL) Still have this style bug (if defined() instead of ifdef). > +#include <machine/cpufunc.h> Namespace pollution. This should be unnecessary because it is an error to include <machine/atomic.h> (or <machine/cpufunc.h>) directly, at least in the kernel. The correct include is <sys/systm.h> which supplies pollution by <machine/atomic.h> and <machine/cpufunc.h> and lots more as standard. There <machine/atomic.h> is included before <machine/cpufunc.h>. This order would now be wrong if you have added a dependency of <machine/atomic.h> on <machine/cpufunc.h>, but I can't see such a dependency... > +#include <machine/specialreg.h> ... now I see it -- it is because <machine/specialreg.h> is broken and has a dependency on <machine/cpufunc.h>. <machine/specialreg.h> is supposed to contain only #define's for special registers, but it has 2 inline functions that use functions from <machine/cpufunc.h>, without even using #defines for the special registers that they use. Normally <machine/specialreg.h> can be included without worrying about this dependency, thanks to the standard pollution, but now the standard pollution is broken. Namespace pollution. This is almost necessary for CPUID_* (it would be necessary if these macros were inlines, but with macros only files that use the macros would need the include and there might be few enough of them for this to be done for each, or <sys/systm.h> could do it (ugh) and hthe order doesn't matter. Another advantage of 'lock addl' is that CPUID_* are not needed. Atomic ops would need the same treatment if they used *fence. > +#define mb() __asm __volatile( \ > + "testl %0,cpu_feature \n\ > + je 2f \n\ > + mfence \n\ > +1: \n\ Still have this style bug (no empty line before label that is not fallen into). I wasn't going to care about this, but then decided that the non-fall-through here is especially non-obvious. > + .section .text.offpath \n\ > +2: lock \n\ Still have this style bug (label on same line as instruction). > + addl $0,cpu_feature \n\ I wonder if adding to (%esp) is faster. It is at least smaller. > + jmp 1b \n\ Missing empty line as above. > + .text \n\ > +" \ > + : : "i"(CPUID_SSE2) : "memory") Missing space after "i" here and elsewhere. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20081202232222.N1514>