Date: Thu, 15 Sep 2016 05:37:19 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Emmanuel Vadot <manu@freebsd.org> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r305814 - head/sys/boot/common Message-ID: <20160915042715.N2557@besplex.bde.org> In-Reply-To: <201609141743.u8EHhX7R038840@repo.freebsd.org> References: <201609141743.u8EHhX7R038840@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 14 Sep 2016, Emmanuel Vadot wrote: > Log: > ufsread: Do not cast struct direct from void * > This cause alignment problem on ARM (and possibly other archs), instead copy it. > > MFC after: 1 week > > Modified: > head/sys/boot/common/ufsread.c > > Modified: head/sys/boot/common/ufsread.c > ============================================================================== > --- head/sys/boot/common/ufsread.c Wed Sep 14 16:47:17 2016 (r305813) > +++ head/sys/boot/common/ufsread.c Wed Sep 14 17:43:32 2016 (r305814) > @@ -97,21 +97,21 @@ static __inline uint8_t > fsfind(const char *name, ufs_ino_t * ino) > { > static char buf[DEV_BSIZE]; > - struct direct *d; > + static struct direct d; > char *s; > ssize_t n; This looks like a good pessimization for space. boot2 on i386 has to fit in 8192 bytes and has a negative number to spare (features are already left out). With auto buffers, and also builtin memcpy, the compiler can optimize the memcpy to nothing on arches with no alignment, and should do this with -Os. I think -ffreestanding in boot programs kills the builtin, and this is almost intentional since compilers have poor support for -Os so the inline memcpy is sometimes larger unless it is null. > > fs_off = 0; > while ((n = fsread(*ino, buf, DEV_BSIZE)) > 0) > for (s = buf; s < buf + DEV_BSIZE;) { > - d = (void *)s; > + memcpy(&d, s, sizeof(struct direct)); > if (ls) > - printf("%s ", d->d_name); > - else if (!strcmp(name, d->d_name)) { > - *ino = d->d_ino; > - return d->d_type; > + printf("%s ", d.d_name); > + else if (!strcmp(name, d.d_name)) { > + *ino = d.d_ino; > + return d.d_type; > } > - s += d->d_reclen; > + s += d.d_reclen; > } > if (n != -1 && ls) > printf("\n"); The static buffer in the old code also looks like a pessimization for space. It doesn't seem to be needed to preserver or return results. Compilers don't seem to be smart enough to see this and optimize it to auto or vice versa. Testing shows that the static buffer is space optimization for gcc and a space pessimization for clang: X #include <string.h> X #include <unistd.h> X X int X foo(void) X { X static char buf[101]; X static int buf1[3]; X X read(0, buf, sizeof(buf)); X memcpy(buf1, &buf[2], sizeof(buf1)); X return (buf[0] + buf1[0]); X } 2 static buffers give the best pessimizations for clang -Os -m32. Even with -Os, clang prefers to use large instructions to reduce the number of instructions. The static buffers give largest instructions, and also prevent optimizing away the memcpy(). With auto buf1, it takes -ffreestanding to kill the optimization of memcpy() and with that optimization the size is reduced by almost a factor of 2. With auto buf too, other pessimizations are applied to give a size reduction of just 1 byte. gcc-4.2.1 never inlines the memcpy(), but it does the other pessimizations less well to give an in-between size. Aproximate sizes on i386: 65 bytes for clang with statics, 49 for gcc, 33 for clang with autos and builtins, 28 for tweaked output for clang with static buf, auto buf1 and builtins (null memcpy). Probably the static buf is an intentional optimization for gcc. Compilers do too good a job of generating large code for large stack offsets. In my test code, the offsets are about 101 which is less than 128 so it takes only 2 bytes. DEV_BSIZE is 512 so it needs 5-byte offsets from the usual pessimizations for the base pointer (these are to point %ebp at the top of the variables and put all the scalar variables below the buffer so that the offsets are largest). But clang also does good pessimizations for offsets from the static buffer: pushl $foo.buf # 5 bytes movsbl foo.buf, %eax # 7 bytes addl foo.buf+2, %eax # 6 bytes Actually optimizing for space: movl $foo.buf,%ebx # 5 bytes pushl %ebx # 1 byte movsbl (%ebx), %eax # 3 bytes addl 2(%ebx), %eax # 3 bytes Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160915042715.N2557>