Date: Tue, 25 Jan 2000 13:31:20 -0700 From: Brett Glass <brett@lariat.org> To: Matthew Dillon <dillon@apollo.backplane.com> Cc: Warner Losh <imp@village.org>, security@FreeBSD.ORG Subject: Re: Merged patches Message-ID: <4.2.2.20000125132053.01a5edb0@localhost> In-Reply-To: <200001251855.KAA05770@apollo.backplane.com> References: <4.2.2.20000125095042.01a5aba0@localhost> <200001251722.KAA04527@harmony.village.org> <4.2.2.20000125113518.01a59100@localhost>
next in thread | previous in thread | raw e-mail | index | archive | help
At 11:55 AM 1/25/2000 , Matthew Dillon wrote: >:> So we do multiple tests, so what? Not only will GCC potentially >:> optimize the code, >: >:I have never seen GCC optimize tests of the individual bits of a word >:into a switch. > > Because a 'switch' table lookup is often more expensive then a sequence > of conditionals. No -- because it doesn't know how. If you cast the series of tests as a switch, the compiler is capable of rendering the code either as a sequence of conditionals or as a jump table. (In most cases, it will recognize the efficiency of a jump table and use one, but if there is some reason why a jump table would NOT be more efficient, it will also recognize that.) If you use individual tests, the compiler won't be able to pick the best method for you. :Caching isn't the issue. Conditional jumps trigger pipeline interlocks >:and stalls. A bunch of them in a row is a worst case. It locks up even the >:best superscalar CPUs because the pipelines are tied in knots and you can only >:do so much speculative execution. Doing a switch eliminates the pipeline >:"train wreck" and at the same time parallelizes the tests in a completely >:portable way. As an ASM programmer, I see MASSIVE speedups when I do this -- >:usually an order of magnitude at least. > > This is not true if the conditionals are ordered for the critical path. > This is especially not true on the i386 architecture which implements > nearly 0-cost branches in the branch-cache case. This is a general TCP input routine. ALL packets enter here. The critical path is situational, so it pays to make all of the paths fast. Due to the way modern processors are architected, an indexed jump is as fast as a conditional jump and does not create dependencies or stall pipelines. So, the critical path is still optimal with a jump table in most cases. Again, if for some reason it is not on a specific architecture, the compiler can render the switch as a series of conditional jumps. > This is not necessarily true. A jump table usually destroys the > branch-cache for the main branch-to-jump-table and can be slower > then a sequence of conditionals that have been ordered for the critical > path. Again, you cannot order conditionals for "the" critical path here. And references to a jump table, because they do not rely on condition flags, can be pipelined and executed in parallel with other instructions. And the jump table itself enters the cache if used frequently. The net result: the code incurs less overhead than a conditional jump. I invite you to experiment with this yourself; as an author of optimized assembly language, I have. Or see Abrash, who notes the same result in his book "The Zen Of Assembly Language." --Brett Glass To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-security" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4.2.2.20000125132053.01a5edb0>