Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 15 May 2011 14:35:37 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Marius Strobl <marius@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r221913 - head/sys/dev/mii
Message-ID:  <20110515135737.B825@besplex.bde.org>
In-Reply-To: <201105142031.p4EKV5el014531@svn.freebsd.org>
References:  <201105142031.p4EKV5el014531@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 14 May 2011, Marius Strobl wrote:

> Log:
>  - There's no need for nibbletab to be static, it's const however.
>  - Fix whitespace.

There is every reason for it to be static.

> Modified: head/sys/dev/mii/mii.c
> ==============================================================================
> --- head/sys/dev/mii/mii.c	Sat May 14 19:36:12 2011	(r221912)
> +++ head/sys/dev/mii/mii.c	Sat May 14 20:31:04 2011	(r221913)
> @@ -552,7 +552,7 @@ mii_down(struct mii_data *mii)
> static unsigned char
> mii_bitreverse(unsigned char x)
> {
> -	static unsigned char nibbletab[16] = {
> +	unsigned const char const nibbletab[16] = {
> 		0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15
> 	};

Making it non-static asks the compiler to pessimize it by initializing
it on every entry to the function.  Some versions of gcc do a very good
job with this pessimization.  The normal pessimzation is to keep a
static copy of the array anyway, and copy this to the stack on every
entry to the function.  However, the pessimization seems to be broken
by gcc-4.3.1 in -current.  gcc-3.3.3 on i386 gives:

%%%
$ cat z.c
int
foo(void)
{
 	const unsigned char foo[16] = { 1, 2, 3, 4, };

 	test(foo[0], foo[1]);
}
$ gcc-3.3.3 -O2 -S z.c
$ cat z.S
 	.file	"z.c"
 	.text
 	.p2align 2,,3
.globl foo
 	.type	foo, @function
foo:
 	pushl	%ebp
 	movl	%esp, %ebp
 	pushl	%edi
 	subl	$28, %esp
 	leal	-24(%ebp), %edi
 	cld
 	xorl	%eax, %eax
 	movl	$4, %ecx
 	rep
 	stosl
 	pushl	$2
 	pushl	$1
 	movb	$1, -24(%ebp)
 	movb	$2, -23(%ebp)
 	movb	$3, -22(%ebp)
 	movb	$4, -21(%ebp)
 	call	test
 	movl	-4(%ebp), %edi
 	leave
 	ret
 	.size	foo, .-foo
 	.ident	"GCC: (GNU) 3.3.3 [FreeBSD] 20031106"
%%%

For this small array, gcc does an even better pessimization for space
than copying from a static copy.  It constructs the array entirely on
the stack.  First it bzeros 16 bytes on the stack using an inefficient
string instruction (so that the unused top 12 bytes of the array are
initialized).  Then uses individual stores to initialize the low 4
bytes of the array.  Now it knows that individual stores are faster
than string instructions for small data.  The low 4 bytes are unused
too.  The top 2 of these bytes are entirely unused, and the low 2 bytes
are optimized to constants at compile time.

Declaring the array as static breaks the pessimization for gcc-3.3.3
fairly well.  The code is then "pushl $2; pushl $1; call test".  The
breakage is not quite complete -- data for the unused static array is
still generated.

Using gcc-4.2.1 breaks the pessimization completely.  The code is then
"pushl $2; pushl $1; call test", and data for the unused array is not
generated, irrespective of whether the array is declared static or const.

Using -O instead of -O2 makes little difference.  Using -O0 fixes the
pessimization for gcc-4.2.1 and gives the minor additional pessimization
of loading the function parameters from the array.

Changing the array size from 16 to 1024 gives the normal pessimization
for gcc-3.3.3 of memcpy()'ing the array from a static copy.  The code is
then much more readable than the above, since it is not fully pessimized
for space.

This pessimization is not common in the kernel.  K&R compilers have
the feature that initialization of auto arrays is a syntax error.  Thus
old code doesn't even have nighmares about this pessimization.

I noticed this pessimization mainly in the old ata driver.  That has
lots of smallish arrays of data, and sqillions of function calls hidden
in access macros.  Stepping through the generated code using ddb was
very painful since what should be single memory accesses or constant
loads was a function call or three for the memcpy()'s and access
functions.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110515135737.B825>