Date: Mon, 12 Jul 1999 20:21:23 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199907130321.UAA73713@apollo.backplane.com>
To: Mike Smith
Cc: Mike Smith, Mike Haertel, Luoqi Chen, dfr@nlsystems.com, jeremyp@gsmx07.alcatel.com.au, freebsd-current@FreeBSD.ORG
Subject: Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
References: <199907130246.TAA03519@dingo.cdrom.com>

:I assumed too much in asking the question; I was specifically
:interested in indirect function calls, since this has a direct impact
:on method-style implementations.

    Branch prediction caches are typically PC-sensitive.  An indirect method
    call will never be as fast as a direct call, but if the indirect address
    is the same the branch prediction cache will work.

    If the indirect address changes at the PC where the call is being made,
    the branch cache may create a penalty.

    Try this core as one of the cases in that test program, and add two nop
    subroutines, void nop1(void) { } and void nop2(void) { }.

    Compile this code without any optimizations!  *no* optimizations, or
    the test will not demonstrate the problem :-)

    In this case the branch prediction succeeds because the indirect
    address does not change at the PC where func() is called.  I get 34 ns
    per loop.

        {
            void (*func)(void) = nop1;

            for (i = 0; i < LOOPS; ++i) {
                func();
                /* both arms assign nop1 on purpose: same work, same target */
                if (i & 1)
                    func = nop1;
                else
                    func = nop1;
            }
        }

    In this case the branch prediction fails because the indirect
    address is different at the PC each time func() is called.  I get 61 ns.

        {
            void (*func)(void) = nop1;

            for (i = 0; i < LOOPS; ++i) {
                func();
                if (i & 1)
                    func = nop1;
                else
                    func = nop2;
            }
        }

    In this case we simulate a mix: (i & 1) -> (i & 7).  I get 47 ns.

        {
            void (*func)(void) = nop1;

            for (i = 0; i < LOOPS; ++i) {
                func();
                if (i & 7)
                    func = nop1;
                else
                    func = nop2;
            }
        }

    Ok, so what does this mean for method calls?  If the method call is
    INLINED, then the branch prediction cache will tend to work because the
    method call will tend to call the same address at any given PC.  If the
    method call is doubly-indirect, where a routine is called which
    calculates the method address and then calls it, the branch prediction
    cache will tend to fail because a different address will tend to be
    called at the PC of the call.

                                        -Matt
                                        Matthew Dillon
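
    The test program these cases plug into is not part of this message, so
    here is a minimal sketch of a harness for the three cases, under some
    assumptions.  The names time_indirect() and report() are made up for
    illustration, timing uses the standard C clock() (CPU time, coarse
    resolution), and the function pointer is declared volatile so the call
    stays indirect even if someone does turn the optimizer on.  None of
    that is from the original test program.

```c
#include <stdio.h>
#include <time.h>

void nop1(void) { }
void nop2(void) { }

/*
 * Time `loops` indirect calls made from one call site (one PC).
 * After each call the target becomes nop1 when (i & mask) is nonzero
 * and `alt` otherwise, so passing alt == nop1 keeps the target constant.
 * Returns the average cost in nanoseconds per loop iteration.
 *
 * Hypothetical helper: name and signature are illustrative only.
 */
double time_indirect(unsigned long loops, unsigned long mask,
                     void (*alt)(void))
{
    /* volatile (an assumption of this sketch) forces a real indirect call */
    void (*volatile func)(void) = nop1;
    unsigned long i;
    clock_t start, end;

    start = clock();
    for (i = 0; i < loops; ++i) {
        func();
        if (i & mask)
            func = nop1;
        else
            func = alt;
    }
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)loops;
}

/* Run the three cases from the message and print ns per loop for each. */
void report(unsigned long loops)
{
    printf("same target every call : %.2f ns\n",
           time_indirect(loops, 1, nop1));
    printf("target alternates      : %.2f ns\n",
           time_indirect(loops, 1, nop2));
    printf("target changes 1 in 8  : %.2f ns\n",
           time_indirect(loops, 7, nop2));
}
```

    Call report() from main() with whatever LOOPS the test program uses.
    The absolute figures, and even the size of the gap between the cases,
    depend on the CPU's branch predictor, so do not expect the 34/61/47 ns
    numbers above to reproduce elsewhere.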