Date: Wed, 25 Mar 2015 21:15:24 +0100 From: Mateusz Guzik <mjguzik@gmail.com> To: Bruce Evans <brde@optusnet.com.au> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r280407 - head/sys/kern Message-ID: <20150325201524.GB14280@dft-labs.eu> In-Reply-To: <20150324135350.N1665@besplex.bde.org> References: <201503240010.t2O0ACZb089906@svn.freebsd.org> <20150324135350.N1665@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Mar 24, 2015 at 03:58:14PM +1100, Bruce Evans wrote: > On Tue, 24 Mar 2015, Mateusz Guzik wrote: > > >Log: > > filedesc: microoptimize fget_unlocked by getting rid of fd < 0 branch > > This has no effect. Compilers optimize to the equivalent of the the > unsigned cast hack if this is good. On x86, it is good since no > instructions are needed for the conversion, and the only difference > between the code generated by > > if (fd >= fdt->fdt_nfiles) > > and > > if (fd < 0 || fd >= fdt->fdt_nfiles) > > is to change from jl (jump if less than) to jb (jump if below) (both > jumps over the if clause). Negative values satisfy jl but not jb. > I would not commit the change if it did not affect generated assembly at least on amd64. if (fd < 0 || fd >= fdt->fdt_nfiles): 0xffffffff807d147d <fget_unlocked+13>: sub $0x38,%rsp 0xffffffff807d1481 <fget_unlocked+17>: test %esi,%esi 0xffffffff807d1483 <fget_unlocked+19>: js 0xffffffff807d15b8 <fget_unlocked+328> 0xffffffff807d1489 <fget_unlocked+25>: mov (%rdi),%rbx 0xffffffff807d148c <fget_unlocked+28>: cmp %esi,(%rbx) 0xffffffff807d148e <fget_unlocked+30>: jle 0xffffffff807d15bf <fget_unlocked+335> if ((u_int)fd >= fdt->fdt_nfiles): 0xffffffff807d147d <fget_unlocked+13>: sub $0x38,%rsp 0xffffffff807d1481 <fget_unlocked+17>: mov (%rdi),%rbx 0xffffffff807d1484 <fget_unlocked+20>: cmp %esi,(%rbx) 0xffffffff807d1486 <fget_unlocked+22>: jbe 0xffffffff807d15a8 <fget_unlocked+312> I did not check other archs prior to the commit. This is clang 3.6 as present in head. Sources compiled with -O2. Also see below for other compiler test. > > Casting fd to an unsigned type simplifies fd range coparison to mere checking > > if the result is bigger than the table. > > No, it obfuscates the range comparison. > It is a standard hack which is hard to misread and which seems to add a slight benefit (see below). > On some arches, conversion to unsigned is slow. Then compilers should > optimize in the opposite direction by first undoing the bogus cast to > get back to the range check and then optimizing the range check using > the best strategy. Compilers should probably first undo the bogus > cast even on x86, so as to reduce to the range check case. Range checks > are more important and more uniform than bogus casts, so they are more > likely to be optimized. Similarly if fd has type u_int to begin with. It affects assembly on all arm, powerpc64 and mips64 as well. Both arm and powerpc just get rid of zero-test and use the same instructions to perform the other comparison. I only found a difference on mips64 which used sltu instead of slt (but still got rid of the zero-check). Granted I don't know mips instruction costs and I don't have the real hadrware to benchark on. Seems like a win for most architectures anyway. > > >Modified: head/sys/kern/kern_descrip.c > >============================================================================== > >--- head/sys/kern/kern_descrip.c Tue Mar 24 00:01:30 2015 (r280406) > >+++ head/sys/kern/kern_descrip.c Tue Mar 24 00:10:11 2015 (r280407) > >@@ -2342,7 +2342,7 @@ fget_unlocked(struct filedesc *fdp, int > >#endif > > > > fdt = fdp->fd_files; > >- if (fd < 0 || fd >= fdt->fdt_nfiles) > >+ if ((u_int)fd >= fdt->fdt_nfiles) > > return (EBADF); > > /* > > * Fetch the descriptor locklessly. We avoid fdrop() races by > [..] > - fget_locked() seems to be unchanged recently, so it doesn't have the > obfuscation. fget_unlocked() is newer than the unobfuscated version > of fget_locked(). It is in a different file, but may have copied the > unobfuscated range check from fget_locked(). This commit restores the > obfuscation to it alone. > This definitely should be synced one way or the other, thanks for pointing this out. > There are now some other range checks in kern_desc.c that are missing > the obfuscation: grepping for "fd <" gives: > - a magic treatment for negative fd's in closefrom() Well I'll start with a short note that I don't know what's up with uap->lowfd = 0; instead of retuning with EINVAL. Anyay, the cast there would not have any use. > - the range check in an unobfuscated by confusing form in fdisused(). > The check has to be inverted since it is in a KASSERT(), and the > inversion is done by reversing the inequalities. Correct, this likely should also be synced (one way or the other). > - similarly in fdalloc(). > Same. > I used to use this hack a lot 30 years ago, but stopped when compilers > got better 20-25 years ago. > I wrote a toy program and checked e.g. gcc5 and it still did not optimise zero-test away. In fact I would argue the optimisation in question is impossible unless upper limit check is against a constant in (0, INT_MAX) range or against a var whose range is known at compile time. In particular this is problematic for negative values. Consider: int fd, limit; ............ if (fd < 0 || fd >= limit) Let's have fd = -5 and limit = -4. Should the fd < 0 check be dropped by the compiler and the expression turned into (u_int)fd >= limit, the coparison would be false thus changing the outcome. As such, I prefer to keep the cast. -- Mateusz Guzik <mjguzik gmail.com>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150325201524.GB14280>