Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 14 Sep 2016 22:09:49 +0200
From:      Emmanuel Vadot <manu@bidouilliste.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r305814 - head/sys/boot/common
Message-ID:  <20160914220949.bfbd8562ec180ab690b84b60@bidouilliste.com>
In-Reply-To: <20160915042715.N2557@besplex.bde.org>
References:  <201609141743.u8EHhX7R038840@repo.freebsd.org> <20160915042715.N2557@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help

 Hi Bruce,

On Thu, 15 Sep 2016 05:37:19 +1000 (EST)
Bruce Evans <brde@optusnet.com.au> wrote:

> On Wed, 14 Sep 2016, Emmanuel Vadot wrote:
> 
> > Log:
> >  ufsread: Do not cast struct direct from void *
> >  This cause alignment problem on ARM (and possibly other archs), instead copy it.
> >
> >  MFC after:	1 week
> >
> > Modified:
> >  head/sys/boot/common/ufsread.c
> >
> > Modified: head/sys/boot/common/ufsread.c
> > ==============================================================================
> > --- head/sys/boot/common/ufsread.c	Wed Sep 14 16:47:17 2016	(r305813)
> > +++ head/sys/boot/common/ufsread.c	Wed Sep 14 17:43:32 2016	(r305814)
> > @@ -97,21 +97,21 @@ static __inline uint8_t
> > fsfind(const char *name, ufs_ino_t * ino)
> > {
> > 	static char buf[DEV_BSIZE];
> > -	struct direct *d;
> > +	static struct direct d;
> > 	char *s;
> > 	ssize_t n;
> 
> This looks like a good pessimization for space.  boot2 on i386 has to
> fit in 8192 bytes and has a negative number to spare (features are
> already left out).

 Do you have any suggestion on making the code better ?
 This was the last patch for having EFI working on ARMv6 and this is
something that I want to be enabled by default at some point.

> 
> With auto buffers, and also builtin memcpy, the compiler can optimize
> the memcpy to nothing on arches with no alignment, and should do this
> with -Os.  I think -ffreestanding in boot programs kills the builtin,
> and this is almost intentional since compilers have poor support for
> -Os so the inline memcpy is sometimes larger unless it is null.
> 
> >
> > 	fs_off = 0;
> > 	while ((n = fsread(*ino, buf, DEV_BSIZE)) > 0)
> > 		for (s = buf; s < buf + DEV_BSIZE;) {
> > -			d = (void *)s;
> > +			memcpy(&d, s, sizeof(struct direct));
> > 			if (ls)
> > -				printf("%s ", d->d_name);
> > -			else if (!strcmp(name, d->d_name)) {
> > -				*ino = d->d_ino;
> > -				return d->d_type;
> > +				printf("%s ", d.d_name);
> > +			else if (!strcmp(name, d.d_name)) {
> > +				*ino = d.d_ino;
> > +				return d.d_type;
> > 			}
> > -			s += d->d_reclen;
> > +			s += d.d_reclen;
> > 		}
> > 	if (n != -1 && ls)
> > 		printf("\n");
> 
> The static buffer in the old code also looks like a pessimization for
> space.  It doesn't seem to be needed to preserver or return results.
> Compilers don't seem to be smart enough to see this and optimize it
> to auto or vice versa.
> 
> Testing shows that the static buffer is space optimization for gcc
> and a space pessimization for clang:
> 
> X #include <string.h>
> X #include <unistd.h>
> X 
> X int
> X foo(void)
> X {
> X 	static char buf[101];
> X 	static int buf1[3];
> X 
> X 	read(0, buf, sizeof(buf));
> X 	memcpy(buf1, &buf[2], sizeof(buf1));
> X 	return (buf[0] + buf1[0]);
> X }
> 
> 2 static buffers give the best pessimizations for clang -Os -m32.
> Even with -Os, clang prefers to use large instructions to reduce the
> number of instructions.  The static buffers give largest instructions,
> and also prevent optimizing away the memcpy().  With auto buf1, it
> takes -ffreestanding to kill the optimization of memcpy() and with
> that optimization the size is reduced by almost a factor of 2.  With
> auto buf too, other pessimizations are applied to give a size reduction
> of just 1 byte.  gcc-4.2.1 never inlines the memcpy(), but it does
> the other pessimizations less well to give an in-between size.
> 
> Aproximate sizes on i386: 65 bytes for clang with statics, 49 for gcc,
> 33 for clang with autos and builtins, 28 for tweaked output for clang
> with static buf, auto buf1 and builtins (null memcpy).
> 
> Probably the static buf is an intentional optimization for gcc.
> Compilers do too good a job of generating large code for large stack
> offsets.  In my test code, the offsets are about 101 which is less than
> 128 so it takes only 2 bytes.  DEV_BSIZE is 512 so it needs 5-byte
> offsets from the usual pessimizations for the base pointer (these
> are to point %ebp at the top of the variables and put all the scalar
> variables below the buffer so that the offsets are largest).  But
> clang also does good pessimizations for offsets from the static buffer:
> 
>  	pushl	$foo.buf	# 5 bytes
>  	movsbl	foo.buf, %eax	# 7 bytes
>  	addl	foo.buf+2, %eax	# 6 bytes
> 
> Actually optimizing for space:
> 
>  	movl	$foo.buf,%ebx	# 5 bytes
>  	pushl	%ebx		# 1 byte
>  	movsbl	(%ebx), %eax	# 3 bytes
>  	addl	2(%ebx), %eax	# 3 bytes
> 
> Bruce

 Cheers,

-- 
Emmanuel Vadot



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160914220949.bfbd8562ec180ab690b84b60>