Date: Thu, 1 Nov 2007 21:34:17 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Christoph Mallon <christoph.mallon@gmx.de> Cc: Andrey Chernov <ache@nagual.pp.ru>, src-committers@FreeBSD.org, Juli Mallett <juli@clockworksquid.com>, cvs-all@FreeBSD.org, cvs-src@FreeBSD.org Subject: Re: cvs commit: src/include _ctype.h Message-ID: <20071101204931.C94960@besplex.bde.org> In-Reply-To: <47292F79.9030102@gmx.de> References: <200710272232.l9RMWSbK072082@repoman.freebsd.org> <20071030200331.GA29309@toxic.magnesium.net> <20071031215526.GC89932@nagual.pp.ru> <47292F79.9030102@gmx.de>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 1 Nov 2007, Christoph Mallon wrote: > Andrey Chernov wrote: >> On Tue, Oct 30, 2007 at 10:03:31AM -1000, Juli Mallett wrote: >>> * "Andrey A. Chernov" <ache@FreeBSD.org> [ 2007-10-27 ] >>> [ cvs commit: src/include _ctype.h ] >>>> ache 2007-10-27 22:32:28 UTC >>>> >>>> FreeBSD src repository >>>> >>>> Modified files: >>>> include _ctype.h Log: >>>> Micro-optimization of prev. commit, change >>>> (_c < 0 || _c >= 128) to (_c & ~0x7F) >>> Isn't that a non-optimization in code and a minor pessimization of >>> readability? >> ... >> For ones who doubts there two tests compiled with -O2. As you may see the >> result is almost identical (andl vs cmpl): We never doubted that it was a small negative or non-optimization :-). Look closer and you will see that the andl version takes 2 extra instructions, since both versions are smart enough to avoid a branch, and for this they need the result of the condition code generated by the andl or cmpl, and the cmpl generates the desired condition code directly while 2 more instructions are needed after the andl. >> -------------------- a.c -------------------- >> main () { >> >> int c; >> >> return (c & ~0x7f) ? 0 : c * 2; >> } This example has many flaws as pointed out by Cristoph: - c is uninitialized - the result depends on c in a way that is quite different than the table lookup for ctype. The above expression happens to be more optimizable. >> -------------------- a.s -------------------- >> .file "a.c" >> .text >> .p2align 4,,15 >> .globl main >> .type main, @function >> main: >> leal 4(%esp), %ecx >> andl $-16, %esp >> pushl -4(%ecx) >> movl %eax, %edx <--- extra instruction since andl clobbers a register. Normally, testl should be used to avoid this clobber. >> andl $-128, %edx <--- this sets %edx to something and also sets the condition codes, but not like we want >> addl %eax, %eax <--- c * 2 >> cmpl $1, %edx <--- this sets the condition codes like we want >> sbbl %edx, %edx <--- turn condition codes into a mask in %edx: mask = 0xffffffff if the result should be c *2 and mask = 0 if the result should be 0 >> pushl %ebp >> andl %edx, %eax <--- result = (c * 2) & mask >> movl %esp, %ebp <--- why is it bothering to set up a frame this late? >> pushl %ecx >> popl %ecx >> popl %ebp >> leal -4(%ecx), %esp >> ret >> .size main, .-main >> .ident "GCC: (GNU) 4.2.1 20070719 [FreeBSD]" >> -------------------- a1.c -------------------- >> main () { >> >> int c; >> >> return (c < 0 || c >= 128) ? 0 : c * 2; >> >> >> } >> -------------------- a1.s -------------------- >> .file "a1.c" >> .text >> .p2align 4,,15 >> .globl main >> .type main, @function >> main: >> leal 4(%esp), %ecx >> andl $-16, %esp >> pushl -4(%ecx) >> addl %eax, %eax >> cmpl $128, %eax <--- cmpl puts result in condition codes directly where we want it >> sbbl %edx, %edx <--- same masking stuff ... >> andl %edx, %eax >> pushl %ebp >> movl %esp, %ebp >> pushl %ecx >> popl %ecx >> popl %ebp >> leal -4(%ecx), %esp >> ret >> .size main, .-main >> .ident "GCC: (GNU) 4.2.1 20070719 [FreeBSD]" > > Your example is invalid. The value of c is undefined in this function and you > see random garbage as result (for example in the code snippet you see the c * > 2 (addl %eax, %eax) and after that is the cmpl, which uses %eax, too). In > fact it would be perfectly legal for the compiler to always return 0, call > abort(), or let demons fly out of your nose. However, the uninitialized c = %eax seems to be transformed correctly in both cases. The first case even preserves %eax from the andl. > > Also the example is still unrealistic: You usually don't multiply chars by > two. Lets try something more realistic: an ASCII filter Indeed. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071101204931.C94960>