Date: Fri, 1 Jul 2011 10:16:53 -0700
From: Marcel Moolenaar <marcel@xcllnt.net>
To: Roman Divacky <rdivacky@freebsd.org>
Cc: svn-src-projects@freebsd.org, Marcel Moolenaar <marcel@freebsd.org>, src-committers@freebsd.org
Subject: Re: svn commit: r223705 - projects/llvm-ia64/lib/clang/libllvmjit
Message-ID: <A72DE71C-3166-4816-B90C-E04523CC1622@xcllnt.net>
In-Reply-To: <20110701165151.GA6877@freebsd.org>
References: <201107010329.p613Tn8s071270@svn.freebsd.org> <20110701084224.GA43291@freebsd.org> <00211D6B-F882-43C1-9D93-5ED2D72C5132@xcllnt.net> <20110701165151.GA6877@freebsd.org>

On Jul 1, 2011, at 9:51 AM, Roman Divacky wrote:

>> The following open items are on my mind:
>>
>> 1. On ia64, function prologues allocate a register frame that has
>>    enough (stacked) registers for local scratch registers and
>>    outgoing function arguments. This means that I need to know
>>    (after register allocation) how many (unique) scratch registers
>>    are in use and what the largest number of arguments that need
>>    to be passed in registers to children (the max being 8). Without
>>    this information the compiler is forced to allocate the maximum
>>    size (which is 96 stacked registers, of which 8 are outgoing).
>>    This obviously eats into the register stack and probably causes
>>    runtime failures on deep call chains.
>
> I recommend you to do this little experiment (on amd64 or so):

*snip*

> # Machine code for function foo:
> Frame Objects:
>   fi#0: size=4, align=4, at location [SP+8]
> Function Live Ins: %EDI in %vreg0
>
> I believe this is what you asked.

*snip*

I'm not sure we're in sync. The general registers on ia64 are
split in 2:
1. r0-r31 static registers
2. r32-r127 stacked registers

The purpose of the stacked registers is to optimize function calls
by having the CPU manage a rotating register file and an engine
that flushes "dirty" registers to memory. All a function has to do
is tell the CPU how many stacked registers it wants (max 96) and
the CPU will handle all the pushing and popping on function call
entry and exit, so to speak.

Before register allocation one can assume the max: 96 registers,
of which 8 are for argument passing. This gives 88 preserved
(non-scratch) registers. Emitting code where every function
allocates the max is really bad, so after register allocation you
want to adjust the alloc with the actual number of registers used
by the function.

A second- or third-order concern is analyzing the behaviour of the
above. If the register allocator is really wasteful, the above
yields register frames that are generally too big.
More back pressure is needed to get more optimal code.

Anyway: I don't know yet how to get the actual number of (stacked)
registers used for locals and outgoing arguments, so that's an open
item. It probably means adding a function pass that runs after
register allocation to scan the function and then adjust the
prologue code.

FYI,

-- 
Marcel Moolenaar
marcel@xcllnt.net
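
To make the scan described above concrete, here is a rough sketch of the
counting step only. It is not LLVM code: `Instr`, `frame_size` and the
field names are hypothetical stand-ins invented for illustration. It
assumes each post-allocation instruction can report the general registers
it touches and, for calls, how many arguments it passes in registers. It
also assumes the allocator hands out stacked registers densely starting
at r32, so the number of unique registers equals the frame portion
needed; if that does not hold, the highest register number used would
have to be tracked instead.

```python
from dataclasses import dataclass, field

FIRST_STACKED = 32   # r32 is the first stacked general register
LAST_STACKED = 127   # r127 is the last one
MAX_OUTPUTS = 8      # at most 8 arguments are passed in registers


@dataclass
class Instr:
    """Hypothetical post-allocation instruction; not an LLVM type."""
    regs: list = field(default_factory=list)  # general register numbers touched
    is_call: bool = False
    reg_args: int = 0                         # register arguments (calls only)


def frame_size(body):
    """Return (locals, outputs) needed by a function body."""
    used = set()
    max_args = 0
    for instr in body:
        # Only stacked registers count towards the register frame.
        used.update(r for r in instr.regs
                    if FIRST_STACKED <= r <= LAST_STACKED)
        if instr.is_call:
            # Arguments beyond the eighth do not go in registers.
            max_args = max(max_args, min(instr.reg_args, MAX_OUTPUTS))
    return len(used), max_args


body = [
    Instr(regs=[32, 33, 8]),            # r8 is static, so it is not counted
    Instr(regs=[33, 34]),
    Instr(is_call=True, reg_args=3),
    Instr(is_call=True, reg_args=10),   # only 8 of the 10 go in registers
]
n_locals, n_outputs = frame_size(body)
print(n_locals, n_outputs, n_locals + n_outputs)  # prints: 3 8 11
```

In this toy input the frame needs 11 stacked registers rather than the
worst-case 96. Rewriting the prologue with those numbers is the part that
remains open.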