Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 5 Feb 2002 18:59:10 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc:        Michal Mertl <mime@traveller.cz>, <audit@FreeBSD.ORG>, <jhb@FreeBSD.ORG>
Subject:   Re: request for review: i386 64bit atomic patch
Message-ID:  <20020205183716.D25833-100000@gamplex.bde.org>
In-Reply-To: <20020205071853.H72285@gsmx07.alcatel.com.au>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 5 Feb 2002, Peter Jeremy wrote:

> On 2002-Feb-04 12:24:44 +0100, Michal Mertl <mime@traveller.cz> wrote:
> >+	"	setz %%al ;		"
> >+	"	movzbl	%%al,%0 ;	"
>
> "%%al" could alternatively be written as "%b0" (From i386.md: 'b' =>
> print the QImode name of the register for the indicated operand).

It's good not to use hard register names all over, but even better not
to use them at all.  I changed the corresponding code in atomic_cmpset_int
to:

	"	setz %0 ;		"
	"	/* no movzbl here */

My %0 has type u_char, so 'b' and the movzbl pessimization are not
needed, and it has the constraint "=r", so that gcc can pick a better
register for it if possible.  Most of 'b's elsewhere in atomic.h are
also unnecessary.  A couple are necessary to convert expressions
involving u_char's back to u_char and the otheres are for superstitious
reasons.

> >+	"#atomic_cmpset_64"
> >+	: "=a" (res),				/* 0 */
>
> The above change means that the "=a" could safely become "=q", which
> would allow gcc the freedom to pick something other than %eax.  As a
> straight function, this doesn't help, but it may help for an inline.

Right.  In practice, I found that gcc -O usually made a mess of this
optimization.  gcc -O2 does better without the optimization hints in
my version.

> >+	: "a" ((u_long)exp),
> >+	  "d" ((u_long)((u_long *)&exp)[1]),
> >+	  "b" ((u_long)src),
> >+	  "c" ((u_long)((u_long *)&src)[1])
>
> I'd change these "u_long"s to "u_int32_t" or "u_register_t".  This
> will stop the code breaking on bde's "long has 64 bits" test system.
> You might like to compare the code for "(u_long)((u_long *)&exp)[1]"
> with "(u_long)(exp >> 32)".

Don't worry about this.  I haven't fixed the atomic operations on longs
on i386's with 64-bit longs, since this would be nontrivial (it basically
requires  Michal's 64-bit support) and I don't really want i386's with
64-bit longs to use longs for counters :-).

If we use register_t then we really shouldn't make assumptions about its
size.  Writing asms is difficult without such assumptions.

> >Thank Peter and John for your advices. What do you think now? Can someone
> >commit it?
>
> I don't see anything obviously wrong.  Have you tested it with both
> gcc 2.95.1 and 3.x at various different optimisations?  Constraint
> incompatibilities cause enough problems that we don't want to add any
> more.

gcc-3 is reported to not like the "+a" to "=r" (output) and "a" (input)
conversion that I made.

The patch only fixes the !i386 case.  I have written but not tested the
corresponding change for the i386 case.

%%%
Index: atomic.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/atomic.h,v
retrieving revision 1.24
diff -u -1 -r1.24 atomic.h
--- atomic.h	18 Dec 2001 08:51:34 -0000	1.24
+++ atomic.h	3 Feb 2002 03:26:30 -0000
@@ -141,17 +133,48 @@
 {
-	int res = exp;
+	u_int exp_copy;
+	u_char result;

+	/*
+	 * The optimization hints prevent gcc-2.95 -O from preferring to
+	 * keep exp in a register that would be better used for `result'.
+	 * This function is often used to build spinloops, and gcc does
+	 * not know that spinloops are usually iterated only once so that
+	 * keeping loop variables in registers is usually a pessimization.
+	 * However, gcc-2.95 -O2 somehow avoids the pessimizations without
+	 * these hints.  I hope gcc-3 -O doesn't need them.
+	 */
+	exp_copy = exp;			/* Optimization hint. */
+#if 0
+	/*
+	 * Here is a version without the bogus memory clobber.  We don't
+	 * use it (yet?) since it causes "invalid constraint in asm" errors
+	 * for i386's with 64-bit longs.  It also causes larger code for
+	 * all i386's, so it is not clear that it is better.
+	 */
+	__asm __volatile (
+	"	" MPLOCKED_STR " ;	"
+	"	cmpxchgl %2,%1 ;	"
+	"       setz	%0 ;		"
+	"1:				"
+	"# atomic_cmpset_int"
+	: "=r" (result),		/* 0 */
+	  "+m" (*dst)			/* 1 */
+	: "r" (src),			/* 2 */
+	  "a" (exp_copy));		/* 3 */
+#else
 	__asm __volatile (
-	"	" MPLOCKED "		"
+	"	" MPLOCKED_STR " ;	"
 	"	cmpxchgl %1,%2 ;	"
-	"       setz	%%al ;		"
-	"	movzbl	%%al,%0 ;	"
+	"       setz	%0 ;		"
 	"1:				"
 	"# atomic_cmpset_int"
-	: "+a" (res)			/* 0 (result) */
+	: "=r" (result)			/* 0 */
 	: "r" (src),			/* 1 */
-	  "m" (*(dst))			/* 2 */
+	  "m" (*dst),			/* 2 */
+	  "a" (exp_copy)		/* 3 */
 	: "memory");
+#endif
+	exp_copy = 0xDEAD;		/* Another optimization hint. */

-	return (res);
+	return (result);
 }
%%%

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-audit" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020205183716.D25833-100000>