Date: Tue, 5 Feb 2002 18:59:10 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc: Michal Mertl <mime@traveller.cz>, <audit@FreeBSD.ORG>, <jhb@FreeBSD.ORG>
Subject: Re: request for review: i386 64bit atomic patch
Message-ID: <20020205183716.D25833-100000@gamplex.bde.org>
In-Reply-To: <20020205071853.H72285@gsmx07.alcatel.com.au>
On Tue, 5 Feb 2002, Peter Jeremy wrote:
> On 2002-Feb-04 12:24:44 +0100, Michal Mertl <mime@traveller.cz> wrote:
> >+ " setz %%al ; "
> >+ " movzbl %%al,%0 ; "
>
> "%%al" could alternatively be written as "%b0" (From i386.md: 'b' =>
> print the QImode name of the register for the indicated operand).
It's good not to use hard register names all over, but even better not
to use them at all. I changed the corresponding code in atomic_cmpset_int
to:
" setz %0 ; "
" /* no movzbl here */
My %0 has type u_char, so 'b' and the movzbl pessimization are not
needed, and it has the constraint "=r", so that gcc can pick a better
register for it if possible.  Most of the 'b's elsewhere in atomic.h are
also unnecessary.  A couple are necessary to convert expressions
involving u_char's back to u_char, and the others are there for
superstitious reasons.
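For concreteness, here is a minimal sketch of the whole thing as I
describe it above (illustrative only, not the committed atomic.h code:
the function name is made up and the lock prefix is spelled literally
instead of via MPLOCKED):

#include <sys/types.h>		/* u_int, u_char */

static __inline int
cmpset_int_sketch(volatile u_int *dst, u_int exp, u_int src)
{
	u_char result;		/* setz writes a byte, so use a byte type */

	__asm __volatile(
	"	lock ; cmpxchgl %3,%1 ;	"
	"	setz	%0		"
	: "=r" (result),	/* 0: QImode, so gcc picks a byte register */
	  "+m" (*dst),		/* 1 */
	  "+a" (exp)		/* 2: cmpxchg implicitly uses %eax */
	: "r" (src)		/* 3 */
	: "memory", "cc");
	return (result);
}

No hard register name appears except through the "a" constraint that the
instruction itself requires, and there is no explicit movzbl: the u_char
result is widened only where the caller actually needs an int.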
> >+ "#atomic_cmpset_64"
> >+ : "=a" (res), /* 0 */
>
> The above change means that the "=a" could safely become "=q", which
> would allow gcc the freedom to pick something other than %eax. As a
> straight function, this doesn't help, but it may help for an inline.
Right. In practice, I found that gcc -O usually made a mess of this
optimization. gcc -O2 does better without the optimization hints in
my version.
> >+ : "a" ((u_long)exp),
> >+ "d" ((u_long)((u_long *)&exp)[1]),
> >+ "b" ((u_long)src),
> >+ "c" ((u_long)((u_long *)&src)[1])
>
> I'd change these "u_long"s to "u_int32_t" or "u_register_t". This
> will stop the code breaking on bde's "long has 64 bits" test system.
> You might like to compare the code for "(u_long)((u_long *)&exp)[1]"
> with "(u_long)(exp >> 32)".
Don't worry about this. I haven't fixed the atomic operations on longs
on i386's with 64-bit longs, since this would be nontrivial (it basically
requires Michal's 64-bit support) and I don't really want i386's with
64-bit longs to use longs for counters :-).
If we use register_t then we really shouldn't make assumptions about its
size. Writing asms is difficult without such assumptions.
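To make the constraint discussion concrete, here is a rough sketch (not
Michal's actual patch; the name and layout are mine) of a cmpxchg8b-based
compare-and-set that spells the 64-bit halves the way Peter suggests,
with u_int32_t and shifts instead of pointer punning, and that uses a "q"
output instead of pinning %eax for the result:

#include <sys/types.h>		/* u_int32_t, u_int64_t, u_char */

static __inline int
cmpset_64_sketch(volatile u_int64_t *dst, u_int64_t exp, u_int64_t src)
{
	u_int32_t exp_lo, exp_hi;
	u_char result;

	exp_lo = (u_int32_t)exp;
	exp_hi = (u_int32_t)(exp >> 32);
	__asm __volatile(
	"	lock ; cmpxchg8b %1 ;	"
	"	setz	%0		"
	: "=q" (result),		/* 0 */
	  "+m" (*dst),			/* 1 */
	  "+a" (exp_lo),		/* 2: cmpxchg8b reads and writes */
	  "+d" (exp_hi)			/* 3: %edx:%eax */
	: "b" ((u_int32_t)src),		/* 4: new value in %ecx:%ebx */
	  "c" ((u_int32_t)(src >> 32))	/* 5 */
	: "memory", "cc");
	return (result);
}

The shifts mean the same thing whether long is 32 or 64 bits, which is
the point of Peter's u_int32_t suggestion.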
> >Thanks to Peter and John for your advice. What do you think now? Can someone
> >commit it?
>
> I don't see anything obviously wrong.  Have you tested it with both
> gcc 2.95.1 and 3.x at various optimisation levels?  Constraint
> incompatibilities cause enough problems that we don't want to add any
> more.
gcc-3 is reported not to like the conversion that I made from "+a" to an
"=r" output plus an "a" input.
The patch only fixes the !i386 case. I have written but not tested the
corresponding change for the i386 case.
%%%
Index: atomic.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/atomic.h,v
retrieving revision 1.24
diff -u -1 -r1.24 atomic.h
--- atomic.h 18 Dec 2001 08:51:34 -0000 1.24
+++ atomic.h 3 Feb 2002 03:26:30 -0000
@@ -141,17 +133,48 @@
 {
-	int res = exp;
+	u_int exp_copy;
+	u_char result;
 
+	/*
+	 * The optimization hints prevent gcc-2.95 -O from preferring to
+	 * keep exp in a register that would be better used for `result'.
+	 * This function is often used to build spinloops, and gcc does
+	 * not know that spinloops are usually iterated only once so that
+	 * keeping loop variables in registers is usually a pessimization.
+	 * However, gcc-2.95 -O2 somehow avoids the pessimizations without
+	 * these hints.  I hope gcc-3 -O doesn't need them.
+	 */
+	exp_copy = exp;			/* Optimization hint. */
+#if 0
+	/*
+	 * Here is a version without the bogus memory clobber.  We don't
+	 * use it (yet?) since it causes "invalid constraint in asm" errors
+	 * for i386's with 64-bit longs.  It also causes larger code for
+	 * all i386's, so it is not clear that it is better.
+	 */
+	__asm __volatile (
+	"	" MPLOCKED_STR " ;	"
+	"	cmpxchgl %2,%1 ;	"
+	"	setz	%0 ;		"
+	"1:	"
+	"# atomic_cmpset_int"
+	: "=r" (result),		/* 0 */
+	  "+m" (*dst)			/* 1 */
+	: "r" (src),			/* 2 */
+	  "a" (exp_copy));		/* 3 */
+#else
 	__asm __volatile (
-	"	" MPLOCKED " "
+	"	" MPLOCKED_STR " ;	"
 	"	cmpxchgl %1,%2 ;	"
-	"	setz	%%al ;		"
-	"	movzbl	%%al,%0 ;	"
+	"	setz	%0 ;		"
 	"1:	"
 	"# atomic_cmpset_int"
-	: "+a" (res)			/* 0 (result) */
+	: "=r" (result)			/* 0 */
 	: "r" (src),			/* 1 */
-	  "m" (*(dst))			/* 2 */
+	  "m" (*dst),			/* 2 */
+	  "a" (exp_copy)		/* 3 */
 	: "memory");
+#endif
+	exp_copy = 0xDEAD;		/* Another optimization hint. */
 
-	return (res);
+	return (result);
 }
%%%
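The spinloop claim in the comment is about callers that look roughly like
this (a sketch of the usual acquire idiom with made-up names, not a quote
of any real kernel code):

#include <sys/types.h>
#include <machine/atomic.h>

static volatile u_int lockword;		/* 0 = free, 1 = held */

static __inline void
spin_acquire_sketch(void)
{
	/*
	 * The cmpset usually succeeds on the first try, so the loop body
	 * is cold; keeping its variables in registers buys nothing.
	 */
	while (atomic_cmpset_int(&lockword, 0, 1) == 0)
		;			/* spin until the word is free */
}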
Bruce
