From owner-freebsd-security  Tue Jan 25 10:56: 1 2000
Delivered-To: freebsd-security@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id DA4F315117
	for <security@FreeBSD.ORG>; Tue, 25 Jan 2000 10:55:40 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA05770;
	Tue, 25 Jan 2000 10:55:38 -0800 (PST)
	(envelope-from dillon)
Date: Tue, 25 Jan 2000 10:55:38 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200001251855.KAA05770@apollo.backplane.com>
To: Brett Glass <brett@lariat.org>
Cc: Warner Losh <imp@village.org>, security@FreeBSD.ORG
Subject: Re: Merged patches 
References: <4.2.2.20000125095042.01a5aba0@localhost>
 <200001251722.KAA04527@harmony.village.org> <4.2.2.20000125113518.01a59100@localhost>
Sender: owner-freebsd-security@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:>    So we do multiple tests, so what?  Not only will GCC potentially
:>     optimize the code, 
:
:I have never seen GCC optimize tests of the individual bits of a word
:into a switch.

    Because a 'switch' table lookup is often more expensive then a sequence
    of conditionals.

:>but doing multiple tests means the memory references 
:>     are already in the L1 cache so, frankly, I doubt you would save more
:>     then a few nanoseconds glomming it all together into a switch.  
:
:Caching isn't the issue. Conditional jumps trigger pipeline interlocks
:and stalls. A bunch of them in a row is a worst case. It locks up even the 
:best superscalar CPUs because the pipelines are tied in knots and you can only
:do so much speculative execution. Doing a switch eliminates the pipeline
:"train wreck" and at the same time parallelizes the tests in a completely 
:portable way. As an ASM programmer, I see MASSIVE speedups when I do this --
:usually an order of magnitude at least. 

    This is not true if the conditionals are ordered for the critical path.
    This is especially not true on the i386 architecture which implements 
    nearly 0-cost branches in the branch-cache case.

:If the compiler generates a jump table (which you can force via an option in
:many cases but which a good compiler will do on its own), all of the
:paths become short. The cost is fixed: one indexed jump. Because there's
:only one jump, branch prediction, speculative execution, etc. work on newer 
:CPUs. The penalty is smaller on the older ones, too.
:...
:--Brett

    This is not necessarily true.  A jump table usually destroys the 
    branch-cache for the main branch-to-jump-table and can be slower 
    then a sequence of conditionals that have been ordered for the critical
    path.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-security" in the body of the message