Date:      Fri, 1 Jul 2011 10:16:53 -0700
From:      Marcel Moolenaar <marcel@xcllnt.net>
To:        Roman Divacky <rdivacky@freebsd.org>
Cc:        svn-src-projects@freebsd.org, Marcel Moolenaar <marcel@freebsd.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r223705 - projects/llvm-ia64/lib/clang/libllvmjit
Message-ID:  <A72DE71C-3166-4816-B90C-E04523CC1622@xcllnt.net>
In-Reply-To: <20110701165151.GA6877@freebsd.org>
References:  <201107010329.p613Tn8s071270@svn.freebsd.org> <20110701084224.GA43291@freebsd.org> <00211D6B-F882-43C1-9D93-5ED2D72C5132@xcllnt.net> <20110701165151.GA6877@freebsd.org>


On Jul 1, 2011, at 9:51 AM, Roman Divacky wrote:

>> The following open items are on my mind:
>> 
>> 1.  On ia64, function prologues allocate a register frame that has
>>    enough (stacked) registers for local scratch registers and 
>>    outgoing function arguments. This means that I need to know
>>    (after register allocation) how many (unique) scratch registers
>>    are in use and what the largest number of arguments is that
>>    needs to be passed in registers to children (the max being 8). Without
>>    this information the compiler is forced to allocate the maximum
>>    size (which is 96 stacked registers, of which 8 are outgoing).
>>    This obviously eats into the register stack and probably causes
>>    runtime failures on deep call chains.
> 
> I recommend you to do this little experiment (on amd64 or so):

*snip*

> # Machine code for function foo:
> Frame Objects:
>  fi#0: size=4, align=4, at location [SP+8]
> Function Live Ins: %EDI in %vreg0
> 
> I believe this is what you asked.

*snip*

I'm not sure we're in sync.

The general registers on ia64 are split in 2:
1.	r0-r31		static registers
2.	r32-r127	stacked registers

The purpose of the stacked registers is to optimize function
calls by having the CPU manage a circular register file and
an engine (the Register Stack Engine, RSE) that flushes "dirty"
registers to memory. All a function has to do is tell the CPU
how many stacked registers it wants (max 96), and the CPU
handles all the pushing and popping on function entry and exit,
so to speak.

Before register allocation, one can assume the maximum: 96 registers,
of which 8 are for argument passing. This leaves 88 preserved
(non-scratch) registers.

Emitting code where every function allocates the maximum is really
bad, so after register allocation you want to adjust the alloc
instruction to the actual number of registers used by the function.

A second- or third-order concern is analyzing how this behaves in
practice. If the register allocator is really wasteful, the above
still yields register frames that are generally too big, and more
back pressure is needed to get more optimal code.

Anyway: I don't know yet how to get the actual number of
(stacked) registers used for locals and outgoing arguments, so
that's an open item. It probably means adding a function pass
that runs after register allocation to scan the function and
then adjust the prologue code.

FYI,

-- 
Marcel Moolenaar
marcel@xcllnt.net




