From owner-freebsd-current  Mon Jul 12 20:21:38 1999
Delivered-To: freebsd-current@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (Postfix) with ESMTP id E0A2414C2F
	for <freebsd-current@FreeBSD.ORG>; Mon, 12 Jul 1999 20:21:30 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id UAA73713;
	Mon, 12 Jul 1999 20:21:23 -0700 (PDT)
	(envelope-from dillon)
Date: Mon, 12 Jul 1999 20:21:23 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199907130321.UAA73713@apollo.backplane.com>
To: Mike Smith <mike@smith.net.au>
Cc: Mike Smith <mike@smith.net.au>, Mike Haertel <mike@ducky.net>,
	Luoqi Chen <luoqi@watermarkgroup.com>, dfr@nlsystems.com,
	jeremyp@gsmx07.alcatel.com.au, freebsd-current@FreeBSD.ORG
Subject: Re: "objtrm" problem probably found (was Re: Stuck in "objtrm") 
References:  <199907130246.TAA03519@dingo.cdrom.com>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


:I assumed too much in asking the question; I was specifically 
:interested in indirect function calls, since this has a direct impact 
:on method-style implementations.

    Branch prediction caches are typically PC-sensitive.  An indirect method
    call will never be as fast as a direct call, but if the indirect address
    stays the same at a given call site, the branch prediction cache will
    work.

    If the indirect address changes at the PC where the call is being made,
    the branch cache may create a penalty.

    Try this code as one of the cases in that test program, and add two nop
    subroutines, void nop1(void) { } and void nop2(void) { }.

    Compile this code without any optimizations!  *no* optimizations or
    the test will not demonstrate the problem :-)

    In this case the branch prediction succeeds because the indirect 
    address does not change at the PC where func() is called.  I get 34 ns
    per loop.

            {
                void (*func)(void) = nop1;
                for (i = 0; i < LOOPS; ++i) {
                        func();
                        /* both arms assign nop1 on purpose, so the loop
                           does the same work as the cases that follow */
                        if (i & 1)
                                func = nop1;
                        else
                                func = nop1;
                }
            }

    In this case the branch prediction fails because the indirect address
    is different at the PC each time func() is called.  I get 61 ns.

            {
                void (*func)(void) = nop1;
                for (i = 0; i < LOOPS; ++i) {
                        func();
                        if (i & 1)
                                func = nop1;
                        else
                                func = nop2;
                }
            }

    In this case we simulate a mix by changing (i & 1) to (i & 7), so nop2
    is selected on only one iteration in eight.  I get 47 ns.

            {
                void (*func)(void) = nop1;
                for (i = 0; i < LOOPS; ++i) {
                        func();
                        if (i & 7)
                                func = nop1;
                        else
                                func = nop2;
                }
            }
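
    The three loops above can be timed with a small harness along these
    lines.  This is only a sketch: time_case() and its parameters are
    hypothetical names, not part of the original test program, and clock()
    is used here simply because it is standard C.  As with the loops
    themselves, it should be compiled without optimization.

```c
#include <assert.h>
#include <time.h>

/* The two empty subroutines the message asks for. */
void nop1(void) { }
void nop2(void) { }

/*
 * Hypothetical helper (not from the original test program): run the loop
 * from the message `loops` times and return the average CPU time per
 * iteration in nanoseconds.  The call target becomes `a` when (i & mask)
 * is non-zero and `b` otherwise, mirroring the if/else in the message.
 */
double
time_case(long loops, long mask, void (*a)(void), void (*b)(void))
{
    void (*func)(void) = a;
    clock_t start, end;
    long i;

    assert(loops > 0);          /* avoid dividing by zero below */

    start = clock();
    for (i = 0; i < loops; ++i) {
        func();                 /* the indirect call being measured */
        if (i & mask)
            func = a;
        else
            func = b;
    }
    end = clock();

    return ((double)(end - start) / CLOCKS_PER_SEC) * 1e9 / (double)loops;
}
```

    Under these assumptions, time_case(LOOPS, 1, nop1, nop1) corresponds to
    the first loop, time_case(LOOPS, 1, nop1, nop2) to the second, and
    time_case(LOOPS, 7, nop1, nop2) to the third.  The absolute numbers
    depend on the CPU; only the relative differences are of interest.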

    Ok, so what does this mean for method calls?  If the method call is
    INLINED, then the branch prediction cache will tend to work because the
    method call will tend to call the same address at any given PC.  If
    the method call is doubly-indirect, where a shared routine is called
    which calculates the method address and then calls it, the branch
    prediction cache will tend to fail because a different address will
    tend to be called at the single PC of that call.
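
    The two dispatch styles can be sketched as follows.  Everything here
    (struct obj, CALL_METHOD, call_method and the demo routines) is a
    hypothetical illustration, not code from any real method
    implementation.  It assumes compilation without optimization, since an
    optimizing compiler may inline call_method() and erase the difference.

```c
#include <assert.h>

/* Hypothetical object with a single method pointer. */
struct obj {
    void (*method)(struct obj *);
    int calls;
};

static void method_a(struct obj *o) { o->calls += 1; }
static void method_b(struct obj *o) { o->calls += 2; }

/* Inlined dispatch: the indirect call is expanded at every call site,
   so each call site has its own PC. */
#define CALL_METHOD(o) ((o)->method(o))

/* Doubly-indirect dispatch: every caller funnels through the one
   indirect call inside this routine, which has a single PC. */
void
call_method(struct obj *o)
{
    o->method(o);
}

void
demo_inlined(struct obj *a, struct obj *b, long loops)
{
    long i;

    assert(a && b);
    for (i = 0; i < loops; ++i) {
        CALL_METHOD(a);     /* this site always calls a->method */
        CALL_METHOD(b);     /* this site always calls b->method */
    }
}

void
demo_shared(struct obj *a, struct obj *b, long loops)
{
    long i;

    assert(a && b);
    for (i = 0; i < loops; ++i) {
        call_method(a);     /* the call inside call_method() sees */
        call_method(b);     /* a->method and b->method alternately */
    }
}
```

    If a and b carry different methods (say method_a and method_b), each
    indirect call in demo_inlined() keeps hitting the same target, while
    the single indirect call inside call_method() changes target on every
    invocation from demo_shared(), which is the situation the second test
    loop above simulates.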

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

