Date: Wed, 14 Sep 2016 22:09:49 +0200 From: Emmanuel Vadot <manu@bidouilliste.com> To: Bruce Evans <brde@optusnet.com.au> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r305814 - head/sys/boot/common Message-ID: <20160914220949.bfbd8562ec180ab690b84b60@bidouilliste.com> In-Reply-To: <20160915042715.N2557@besplex.bde.org> References: <201609141743.u8EHhX7R038840@repo.freebsd.org> <20160915042715.N2557@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi Bruce, On Thu, 15 Sep 2016 05:37:19 +1000 (EST) Bruce Evans <brde@optusnet.com.au> wrote: > On Wed, 14 Sep 2016, Emmanuel Vadot wrote: > > > Log: > > ufsread: Do not cast struct direct from void * > > This cause alignment problem on ARM (and possibly other archs), instead copy it. > > > > MFC after: 1 week > > > > Modified: > > head/sys/boot/common/ufsread.c > > > > Modified: head/sys/boot/common/ufsread.c > > ============================================================================== > > --- head/sys/boot/common/ufsread.c Wed Sep 14 16:47:17 2016 (r305813) > > +++ head/sys/boot/common/ufsread.c Wed Sep 14 17:43:32 2016 (r305814) > > @@ -97,21 +97,21 @@ static __inline uint8_t > > fsfind(const char *name, ufs_ino_t * ino) > > { > > static char buf[DEV_BSIZE]; > > - struct direct *d; > > + static struct direct d; > > char *s; > > ssize_t n; > > This looks like a good pessimization for space. boot2 on i386 has to > fit in 8192 bytes and has a negative number to spare (features are > already left out). Do you have any suggestion on making the code better ? This was the last patch for having EFI working on ARMv6 and this is something that I want to be enabled by default at some point. > > With auto buffers, and also builtin memcpy, the compiler can optimize > the memcpy to nothing on arches with no alignment, and should do this > with -Os. I think -ffreestanding in boot programs kills the builtin, > and this is almost intentional since compilers have poor support for > -Os so the inline memcpy is sometimes larger unless it is null. > > > > > fs_off = 0; > > while ((n = fsread(*ino, buf, DEV_BSIZE)) > 0) > > for (s = buf; s < buf + DEV_BSIZE;) { > > - d = (void *)s; > > + memcpy(&d, s, sizeof(struct direct)); > > if (ls) > > - printf("%s ", d->d_name); > > - else if (!strcmp(name, d->d_name)) { > > - *ino = d->d_ino; > > - return d->d_type; > > + printf("%s ", d.d_name); > > + else if (!strcmp(name, d.d_name)) { > > + *ino = d.d_ino; > > + return d.d_type; > > } > > - s += d->d_reclen; > > + s += d.d_reclen; > > } > > if (n != -1 && ls) > > printf("\n"); > > The static buffer in the old code also looks like a pessimization for > space. It doesn't seem to be needed to preserver or return results. > Compilers don't seem to be smart enough to see this and optimize it > to auto or vice versa. > > Testing shows that the static buffer is space optimization for gcc > and a space pessimization for clang: > > X #include <string.h> > X #include <unistd.h> > X > X int > X foo(void) > X { > X static char buf[101]; > X static int buf1[3]; > X > X read(0, buf, sizeof(buf)); > X memcpy(buf1, &buf[2], sizeof(buf1)); > X return (buf[0] + buf1[0]); > X } > > 2 static buffers give the best pessimizations for clang -Os -m32. > Even with -Os, clang prefers to use large instructions to reduce the > number of instructions. The static buffers give largest instructions, > and also prevent optimizing away the memcpy(). With auto buf1, it > takes -ffreestanding to kill the optimization of memcpy() and with > that optimization the size is reduced by almost a factor of 2. With > auto buf too, other pessimizations are applied to give a size reduction > of just 1 byte. gcc-4.2.1 never inlines the memcpy(), but it does > the other pessimizations less well to give an in-between size. > > Aproximate sizes on i386: 65 bytes for clang with statics, 49 for gcc, > 33 for clang with autos and builtins, 28 for tweaked output for clang > with static buf, auto buf1 and builtins (null memcpy). > > Probably the static buf is an intentional optimization for gcc. > Compilers do too good a job of generating large code for large stack > offsets. In my test code, the offsets are about 101 which is less than > 128 so it takes only 2 bytes. DEV_BSIZE is 512 so it needs 5-byte > offsets from the usual pessimizations for the base pointer (these > are to point %ebp at the top of the variables and put all the scalar > variables below the buffer so that the offsets are largest). But > clang also does good pessimizations for offsets from the static buffer: > > pushl $foo.buf # 5 bytes > movsbl foo.buf, %eax # 7 bytes > addl foo.buf+2, %eax # 6 bytes > > Actually optimizing for space: > > movl $foo.buf,%ebx # 5 bytes > pushl %ebx # 1 byte > movsbl (%ebx), %eax # 3 bytes > addl 2(%ebx), %eax # 3 bytes > > Bruce Cheers, -- Emmanuel Vadot
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160914220949.bfbd8562ec180ab690b84b60>