Date: Thu, 1 Nov 2007 03:23:56 +1100 (EST)
From: Bruce Evans
To: Christoph Mallon
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, "Andrey A. Chernov",
    cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/include _ctype.h
In-Reply-To: <47264710.2000500@gmx.de>
Message-ID: <20071101024451.T4289@delplex.bde.org>
References: <200710272232.l9RMWSbK072082@repoman.freebsd.org>
    <47264710.2000500@gmx.de>

On Mon, 29 Oct 2007, Christoph Mallon wrote:

> Andrey A. Chernov wrote:
>> ache        2007-10-27 22:32:28 UTC
>>
>>   FreeBSD src repository
>>
>>   Modified files:
>>     include              _ctype.h
>>   Log:
>>   Micro-optimization of prev. commit, change
>>   (_c < 0 || _c >= 128) to (_c & ~0x7F)
>>
>>   Revision  Changes    Path
>>   1.33      +1 -1      src/include/_ctype.h
>
> Actually this is rather a micro-pessimisation. Every compiler worth its money
> transforms the range check into single unsigned comparison. The latter test
> on the other hand on x86 gets probably transformed into a test instruction.
> This instruction has no form with sign extended 8bit immediate, but only with
> 32bit immediate. This results in a significantly longer opcode (three bytes
> more) than a single (unsigned)_c > 127, which a sane compiler produces. I
> suspect some RISC machines need one more instruction for the
> "micro-optimised" code, too.
> In theory GCC could transform the _c & ~0x7F back into a (unsigned)_c > 127,
> but it does not do this (the only compiler I found, which does this
> transformation, is LLVM).
> Further IMO it is hard to decipher what _c & ~0x7F is supposed to do.

Indeed. In fact, one of the cleanups/optimizations in rev.1.5 and 1.6 by
ache and me was to get rid of the mask. There was already a check for
_c < 0, so the mask cost even more. The top limit was 256 instead of 128,
so the point about 8bit immediates didn't apply, but I don't know of any
machines where the mask is faster (didn't look hard :-).

OTOH, _c is often a char or a u_char (it is declared as mumble_rune_t, but
the functions are inline so the compiler can see the original type). If _c
is u_char and u_char is uint8_t, then (_c < 0 || _c >= 256) is always false,
so the compiler should generate no code for it. The top limit of 256 was
preferred so that this optimization is possible. A top limit of 128 doesn't
work so well.

I would have worried about the 1's complement case.
I think a mask without a check for _c < 0 is plain broken in the 1's
complement case, but this case is too hard to think about. Just do a range
comparison, which will always work, and let the compiler reduce it using
2's complement or 1's complement tricks if possible. Since 1's complement
machines are rare, write the code so that it is easier for the compiler to
optimize in the 2's complement case.

Pipelining might make the old optimizations in ctype uninteresting. Maybe
everything is almost free except for the table lookup (although that is
cached, it will sometimes miss). I haven't timed this lately.

Bruce
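
The three forms of the test discussed in this thread can be compared side by
side. What follows is only a minimal sketch, not the code from _ctype.h: all
function names are hypothetical, int is assumed to be 2's complement, and
unsigned char is assumed to be 8 bits wide. It shows the range check, the
single unsigned comparison, and the mask giving the same answer, and why a
top limit of 256 lets the test vanish for a u_char argument.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these appear in _ctype.h. */

/* The plain range check, as written before the commit. */
int
out_of_range_cmp(int c)
{
	return (c < 0 || c >= 128);
}

/* The single unsigned comparison that the range check can be reduced to. */
int
out_of_range_unsigned(int c)
{
	return ((unsigned)c > 127);
}

/* The mask from the commit; written assuming a 2's complement int. */
int
out_of_range_mask(int c)
{
	return ((c & ~0x7F) != 0);
}

/* The older form of the test, with a top limit of 256. */
int
out_of_range_256(int c)
{
	return (c < 0 || c >= 256);
}

/*
 * With an 8-bit unsigned char argument the value is always in 0..255, so
 * the test above can never be true and an inlining compiler may drop it.
 */
int
uchar_out_of_range(unsigned char uc)
{
	return (out_of_range_256(uc));
}

/* Check that the three 128-limit forms agree on a few boundary values. */
void
check_forms_agree(void)
{
	static const int samples[] = {
		INT_MIN, -129, -128, -1, 0, 1, 127, 128, 255, 256, INT_MAX
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		int c = samples[i];

		assert(out_of_range_cmp(c) == out_of_range_unsigned(c));
		assert(out_of_range_cmp(c) == out_of_range_mask(c));
	}
}
```

Calling check_forms_agree() should pass silently under the stated
assumptions. The sketch says nothing about which form produces the shorter
or faster code; that depends on the compiler and target, as discussed above.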