Date: Tue, 5 Feb 2002 18:59:10 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc: Michal Mertl <mime@traveller.cz>, <audit@FreeBSD.ORG>, <jhb@FreeBSD.ORG>
Subject: Re: request for review: i386 64bit atomic patch
Message-ID: <20020205183716.D25833-100000@gamplex.bde.org>
In-Reply-To: <20020205071853.H72285@gsmx07.alcatel.com.au>
On Tue, 5 Feb 2002, Peter Jeremy wrote:

> On 2002-Feb-04 12:24:44 +0100, Michal Mertl <mime@traveller.cz> wrote:
> >+		"	setz	%%al ;		"
> >+		"	movzbl	%%al,%0 ;	"
>
> "%%al" could alternatively be written as "%b0" (From i386.md: 'b' =>
> print the QImode name of the register for the indicated operand).

It's good not to use hard register names all over, but even better not
to use them at all.  I changed the corresponding code in
atomic_cmpset_int to:

	"	setz	%0 ;		"	/* no movzbl here */

My %0 has type u_char, so 'b' and the movzbl pessimization are not
needed, and it has the constraint "=r", so that gcc can pick a better
register for it if possible.  Most of the 'b's elsewhere in atomic.h
are also unnecessary.  A couple are necessary to convert expressions
involving u_char's back to u_char, and the others are for superstitious
reasons.

> >+		"#atomic_cmpset_64"
> >+		: "=a" (res),			/* 0 */
>
> The above change means that the "=a" could safely become "=q", which
> would allow gcc the freedom to pick something other than %eax.  As a
> straight function, this doesn't help, but it may help for an inline.

Right.  In practice, I found that gcc -O usually made a mess of this
optimization.  gcc -O2 does better without the optimization hints in my
version.

> >+		: "a" ((u_long)exp),
> >+		  "d" ((u_long)((u_long *)&exp)[1]),
> >+		  "b" ((u_long)src),
> >+		  "c" ((u_long)((u_long *)&src)[1])
>
> I'd change these "u_long"s to "u_int32_t" or "u_register_t".  This
> will stop the code breaking on bde's "long has 64 bits" test system.
> You might like to compare the code for "(u_long)((u_long *)&exp)[1]"
> with "(u_long)(exp >> 32)".

Don't worry about this.  I haven't fixed the atomic operations on longs
on i386's with 64-bit longs, since this would be nontrivial (it
basically requires Michal's 64-bit support) and I don't really want
i386's with 64-bit longs to use longs for counters :-).

If we use register_t then we really shouldn't make assumptions about
its size.  Writing asms is difficult without such assumptions.

> >Thank Peter and John for your advices.  What do you think now?  Can someone
> >commit it?
>
> I don't see anything obviously wrong.  Have you tested it with both
> gcc 2.95.1 and 3.x at various different optimisations?  Constraint
> incompatibilities cause enough problems that we don't want to add any
> more.

gcc-3 is reported to not like the conversion from "+a" to "=r" (output)
and "a" (input) that I made.
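For comparison, here is a rough, untested sketch of how Peter's
suggestions might look when folded into the 64-bit primitive: setz into
a u_char result through a "q" constraint, u_int32_t halves instead of
the u_long casts, and gcc's "A" constraint for the edx:eax pair (which
cmpxchg8b rewrites on failure, so it has to be an input/output).  The
u_int64_t argument types and the MPLOCKED_STR spelling of the lock
prefix are assumptions here, not part of the patch under review.

	static __inline int
	atomic_cmpset_64(volatile u_int64_t *dst, u_int64_t exp, u_int64_t src)
	{
		u_char res;

		__asm __volatile (
		"	" MPLOCKED_STR " ;	"
		"	cmpxchg8b %1 ;		"
		"	setz	%0 ;		"
		"# atomic_cmpset_64"
		: "=q" (res),			/* 0 (result) */
		  "+m" (*dst),			/* 1 */
		  "+A" (exp)			/* 2 (edx:eax; rewritten on failure) */
		: "b" ((u_int32_t)src),		/* 3 (low word of src) */
		  "c" ((u_int32_t)(src >> 32))	/* 4 (high word of src) */
		: "memory");

		return (res);
	}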
The patch only fixes the !i386 case.  I have written but not tested the
corresponding change for the i386 case.

%%%
Index: atomic.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/atomic.h,v
retrieving revision 1.24
diff -u -1 -r1.24 atomic.h
--- atomic.h	18 Dec 2001 08:51:34 -0000	1.24
+++ atomic.h	3 Feb 2002 03:26:30 -0000
@@ -141,17 +133,48 @@
 {
-	int	res = exp;
+	u_int	exp_copy;
+	u_char	result;
+
+	/*
+	 * The optimization hints prevent gcc-2.95 -O from preferring to
+	 * keep exp in a register that would be better used for `result'.
+	 * This function is often used to build spinloops, and gcc does
+	 * not know that spinloops are usually iterated only once so that
+	 * keeping loop variables in registers is usually a pessimization.
+	 * However, gcc-2.95 -O2 somehow avoids the pessimizations without
+	 * these hints.  I hope gcc-3 -O doesn't need them.
+	 */
+	exp_copy = exp;			/* Optimization hint. */
+#if 0
+	/*
+	 * Here is a version without the bogus memory clobber.  We don't
+	 * use it (yet?) since it causes "invalid constraint in asm" errors
+	 * for i386's with 64-bit longs.  It also causes larger code for
+	 * all i386's, so it is not clear that it is better.
+	 */
+	__asm __volatile (
+	"	" MPLOCKED_STR " ;	"
+	"	cmpxchgl %2,%1 ;	"
+	"	setz	%0 ;		"
+	"1: "
+	"# atomic_cmpset_int"
+	: "=r" (result),		/* 0 */
+	  "+m" (*dst)			/* 1 */
+	: "r" (src),			/* 2 */
+	  "a" (exp_copy));		/* 3 */
+#else
 	__asm __volatile (
-	"	" MPLOCKED " "
+	"	" MPLOCKED_STR " ;	"
 	"	cmpxchgl %1,%2 ;	"
-	"	setz	%%al ;		"
-	"	movzbl	%%al,%0 ;	"
+	"	setz	%0 ;		"
 	"1: "
 	"# atomic_cmpset_int"
-	: "+a" (res)			/* 0 (result) */
+	: "=r" (result)			/* 0 */
 	: "r" (src),			/* 1 */
-	  "m" (*(dst))			/* 2 */
+	  "m" (*dst),			/* 2 */
+	  "a" (exp_copy)		/* 3 */
 	: "memory");
+#endif
+	exp_copy = 0xDEAD;		/* Another optimization hint. */
-	return (res);
+	return (result);
 }
%%%

Bruce
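The spinloop that the comment in the patch refers to typically looks
something like the following (a hypothetical caller, shown only to
illustrate the shape; atomic_cmpset_int returns nonzero on success):

	/* Hypothetical caller; not code from the tree. */
	static __inline void
	example_spin_acquire(volatile u_int *lockp)
	{

		/* The first attempt usually succeeds, so the loop rarely iterates. */
		while (atomic_cmpset_int(lockp, 0, 1) == 0)
			;		/* spin until the lock is released */
	}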