From owner-freebsd-security Tue Jan 25 10:56: 1 2000
Delivered-To: freebsd-security@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id DA4F315117
	for ; Tue, 25 Jan 2000 10:55:40 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA05770;
	Tue, 25 Jan 2000 10:55:38 -0800 (PST)
	(envelope-from dillon)
Date: Tue, 25 Jan 2000 10:55:38 -0800 (PST)
From: Matthew Dillon
Message-Id: <200001251855.KAA05770@apollo.backplane.com>
To: Brett Glass
Cc: Warner Losh , security@FreeBSD.ORG
Subject: Re: Merged patches
References: <4.2.2.20000125095042.01a5aba0@localhost> <200001251722.KAA04527@harmony.village.org> <4.2.2.20000125113518.01a59100@localhost>
Sender: owner-freebsd-security@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:> So we do multiple tests, so what? Not only will GCC potentially
:> optimize the code,
:
:I have never seen GCC optimize tests of the individual bits of a word
:into a switch.

Because a 'switch' table lookup is often more expensive than a sequence
of conditionals.

:>but doing multiple tests means the memory references
:> are already in the L1 cache so, frankly, I doubt you would save more
:> than a few nanoseconds glomming it all together into a switch.
:
:Caching isn't the issue. Conditional jumps trigger pipeline interlocks
:and stalls. A bunch of them in a row is a worst case. It locks up even the
:best superscalar CPUs because the pipelines are tied in knots and you can only
:do so much speculative execution. Doing a switch eliminates the pipeline
:"train wreck" and at the same time parallelizes the tests in a completely
:portable way. As an ASM programmer, I see MASSIVE speedups when I do this --
:usually an order of magnitude at least.

This is not true if the conditionals are ordered for the critical path.
This is especially not true on the i386 architecture, which implements
nearly 0-cost branches in the branch-cache case.

:If the compiler generates a jump table (which you can force via an option in
:many cases but which a good compiler will do on its own), all of the
:paths become short. The cost is fixed: one indexed jump. Because there's
:only one jump, branch prediction, speculative execution, etc. work on newer
:CPUs. The penalty is smaller on the older ones, too.
:...
:--Brett

This is not necessarily true. A jump table usually destroys the
branch-cache for the main branch-to-jump-table and can be slower than
a sequence of conditionals that have been ordered for the critical path.

-Matt
Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-security" in the body of the message
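
For readers following the argument, here is a minimal sketch of the two
code shapes being compared. It is not code from the patches under
discussion: the flag names, the return codes, and the assumption that
FLAG_B is the common case are all invented for illustration. The first
function tests the bits one at a time with the assumed hot case first;
the second switches on the combined bits, which a compiler may or may
not lower to a jump table.

```c
/* Hypothetical flag bits, invented for this illustration only. */
enum { FLAG_A = 0x1, FLAG_B = 0x2, FLAG_C = 0x4 };

/*
 * Sequence of conditionals ordered for the critical path: the case
 * assumed to be most frequent (FLAG_B) is tested first, so the hot
 * path executes a single test and branch.
 */
static int classify_chain(unsigned flags)
{
    if (flags & FLAG_B)
        return 1;
    if (flags & FLAG_A)
        return 2;
    if (flags & FLAG_C)
        return 3;
    return 0;
}

/*
 * The same decision written as one switch over the masked bits.
 * Whether this becomes a jump table (one indexed jump) or a compare
 * chain is up to the compiler.
 */
static int classify_switch(unsigned flags)
{
    switch (flags & (FLAG_A | FLAG_B | FLAG_C)) {
    case 0:
        return 0;
    case FLAG_A:
    case FLAG_A | FLAG_C:
        return 2;
    case FLAG_C:
        return 3;
    default:
        /* Every remaining bit pattern has FLAG_B set. */
        return 1;
    }
}

/* Count the bit patterns on which the two variants disagree. */
static int count_mismatches(void)
{
    int bad = 0;
    unsigned flags;

    for (flags = 0; flags < 8; flags++)
        if (classify_chain(flags) != classify_switch(flags))
            bad++;
    return bad;
}
```

Both functions return the same value for every bit pattern, so the
choice between them is purely about generated code. Which one is faster
depends on the CPU, the compiler, and how skewed the inputs are; the
sketch shows the shapes only and makes no timing claim. Calling
count_mismatches() from a small main() confirms that the two variants
agree.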