Date:      Sun, 15 Feb 2015 17:53:11 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Pedro Giffuni <pfg@freebsd.org>
Cc:        src-committers@freebsd.org, Ian Lepore <ian@freebsd.org>, svn-src-all@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>, Bruce Evans <brde@optusnet.com.au>, svn-src-head@freebsd.org
Subject:   Re: svn commit: r278737 - head/usr.sbin/flowctl
Message-ID:  <20150215162553.L977@besplex.bde.org>
In-Reply-To: <54DFA7CC.20305@FreeBSD.org>
References:  <201502132357.t1DNvKda075915@svn.freebsd.org>  <20150214193210.N945@besplex.bde.org> <20150214181508.GL15484@FreeBSD.org> <1423938828.80968.148.camel@freebsd.org> <54DFA7CC.20305@FreeBSD.org>

On Sat, 14 Feb 2015, Pedro Giffuni wrote:

> On 02/14/15 13:33, Ian Lepore wrote:
>> On Sat, 2015-02-14 at 21:15 +0300, Gleb Smirnoff wrote:
>>> On Sat, Feb 14, 2015 at 08:46:58PM +1100, Bruce Evans wrote:
>>> B> Using VLAs and also the C99 feature of declarations anywhere, and extensions
>>> B> like __aligned(), we can almost implement a full alloca() using the fixed
>>> B> version of this change:
>>> B>
>>> B> /*
>>> B>   * XXX need extended statement-expression so that __buf doesn't go out
>>> B>   * of scope after the right brace.
>>> B>   */
>>> B> #define	my_alloca(n) __extension__ ({				\
>>> B>  	/* XXX need unique name. */				\
>>> B>  	char __buf[__roundup2((n), MUMBLE)] __aligned(MUMBLE);	\
>>> B>  								\
>>> B>  	(void *)__buf;						\
>>> B> })
>>> 
>>> I like this idea. But would this exact code work? The life of
>>> __buf is limited by the code block, and we exit the block
>>> immediately. Wouldn't the allocation be overwritten if we
>>> enter any function or block later?

I don't know how to do it.  The comment describes the problem.  C99
doesn't require the block, but the statement-expression does.

There is another scope problem with alloca().  I think the storage
allocated by it is live until the end of the function, so it doesn't
go out of scope if alloca() is called in an inner block.
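A minimal sketch of that lifetime rule, assuming a compiler that provides alloca() (FreeBSD declares it in <stdlib.h>; the <alloca.h> guard for Linux and macOS is an assumption).  The helper name alloca_outlives_block() is hypothetical: the space is obtained in an inner block and is still used after that block ends, which is valid only because alloca() space lives until the function returns.

```c
#include <stddef.h>
#include <stdlib.h>	/* FreeBSD declares alloca() here */
#include <string.h>
#if defined(__linux__) || defined(__APPLE__)
#include <alloca.h>	/* assumption: glibc, musl and macOS declare it here */
#endif

/*
 * Hypothetical helper: allocate n bytes inside an inner block, then read
 * them after that block has ended.  Valid for alloca(), whose space lives
 * until this function returns.  If p pointed to a VLA declared inside the
 * inner block instead, the loop below would read a dead object
 * (undefined behaviour).
 */
int
alloca_outlives_block(size_t n)
{
	unsigned char *p;
	size_t i;
	int sum;

	{
		p = alloca(n);
		memset(p, 1, n);
	}
	/* Still valid here: the space is released only on return. */
	sum = 0;
	for (i = 0; i < n; i++)
		sum += p[i];
	return (sum);
}
```

For example, alloca_outlives_block(100) sums 100 bytes that were each set to 1 in the already-closed inner block.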

   (This is not properly documented in the FreeBSD manpage.  It is
   stated that the space is freed on return, but it is not stated that
   the space is not freed earlier.

   This is not properly documented in a Linux manpage found on the web
   either.  That manpage has an almost identical DESCRIPTION section.
   After that it is better, except that it spells the RETURN VALUES
   section's name without the S.

   FreeBSD's RETURN VALUES section is seriously broken.  It says that
   NULL is returned on failure.  But that applies only to the extern libc
   version, which is almost unreachable.  Normally the builtin is used.
   The Linux manpage states only the builtin's behaviour on error, namely
   that the behaviour is undefined on stack overflow.

   The Linux manpage then has much larger STANDARDS and HISTORY sections
   (spelled CONFORMING TO and NOTES).  These also deprecate it, and give
   some reasons.)

VLAs and macros cannot duplicate alloca()'s scope behaviour if the
macro or declaration is placed in an inner block.  This is not a problem
for FreeBSD, since style(9) forbids placing declarations in inner
blocks and no one would break that rule :-).  'ptr = alloca(n);' isn't
a declaration, but placing it in the outermost block is even more
useful for making ptr and its related space visible.
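One way to get the outermost-block behaviour from a macro is to have it expand to declarations instead of to a statement-expression, so nothing goes out of scope at a closing brace inside the macro.  This is a hypothetical sketch, not a FreeBSD interface: DECLARE_ALLOCA(), ALLOCA_ALIGN (standing in for the unspecified MUMBLE) and ROUNDUP2() (mirroring __roundup2()) are invented names, and _Alignas needs C11 rather than the __aligned() extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-ins: ALLOCA_ALIGN replaces MUMBLE, and ROUNDUP2()
 * mirrors __roundup2() (y must be a power of 2).
 */
#define	ALLOCA_ALIGN	16
#define	ROUNDUP2(x, y)	(((size_t)(x) + ((y) - 1)) & ~((size_t)(y) - 1))

/*
 * Declaration-style replacement for 'ptr = alloca(n);'.  It declares a
 * VLA named ptr##_buf and a pointer ptr; both live until the end of the
 * enclosing block, so the lifetime matches alloca() only when this is
 * used in the outermost block of a function.  A zero size is bumped to 1
 * because zero-length VLAs are undefined.
 */
#define	DECLARE_ALLOCA(ptr, n)						\
	_Alignas(ALLOCA_ALIGN) unsigned char				\
	    ptr##_buf[ROUNDUP2((n) > 0 ? (n) : 1, ALLOCA_ALIGN)];	\
	void *ptr = ptr##_buf

/* Example user: fill n bytes with val and return their sum. */
unsigned
sum_filled(size_t n, unsigned char val)
{
	DECLARE_ALLOCA(p, n);
	unsigned char *bytes = p;
	unsigned sum = 0;
	size_t i;

	memset(bytes, val, n);
	for (i = 0; i < n; i++)
		sum += bytes[i];
	return (sum);
}

/* Check that the buffer honours ALLOCA_ALIGN. */
int
buf_is_aligned(size_t n)
{
	DECLARE_ALLOCA(p, n);

	return (((uintptr_t)p & (ALLOCA_ALIGN - 1)) == 0);
}
```

Placed at the top of a function, as in sum_filled(), the space lives until return, like alloca() space.  Placed in an inner block, it dies at that block's closing brace, which is exactly the mismatch described above.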

>> Why put any effort into avoiding alloca() in the first place?  Is it
>> inefficient on some platforms?  On arm it's like 5 instructions, it just
>> adjusts the size to keep the stack dword-aligned and subtracts the
>> result from sp, done.

It should be more like 0 instructions relative to a local array.  It
does take 0 more on x86 with clang, but not with gcc.  Even gcc48 on
amd64 still does pessimal stack alignment and more for alloca().
Tested with 'void test(void *);' and:

 	test(alloca(2048));
vs
 	int arr[1024]; test(arr);

gcc produces an extra instruction or 2 to align the stack.  Hmm, the
clang code is actually broken, at least on i386.  It needs to do the
stack alignment even more than gcc, due to its non-pessimal alignment
for the usual case.  Apparently, the stack is always 16-byte aligned on
amd64, although this is excessive.  On i386, the stack is 16-byte
aligned by default for gcc, although this is pessimal.  This can be
changed by -mpreferred-stack-boundary=N.  For clang, the stack is only
4-byte aligned, and -mpreferred-stack-boundary is broken (not
supported).  clang is supposed to do alignment as necessary, that is,
almost never.  It does the stack adjustment for doubles, but not for
alloca() or even for long doubles:

 	double d; test(&d);		/* adjusted */
 	test(alloca(8));		/* broken */
 	long double d; test(&d);	/* broken */

On i386, gcc depends on the default alignment for doubles and long
doubles (and, more importantly, for alignment directives and SSE
variables), so, as on amd64, it never needs to adjust the stack for
alloca(), but it always does.
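To see where such objects actually land at run time, a small probe can print each address's offset from a 16-byte boundary.  The choice of 16 (the amd64 and gcc i386 default mentioned above) and the names misalign(), report() and show_stack_alignment() are assumptions for illustration; the output depends on the compiler, flags and target, so none is claimed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>	/* FreeBSD declares alloca() here */
#if defined(__linux__) || defined(__APPLE__)
#include <alloca.h>	/* assumption: glibc, musl and macOS declare it here */
#endif

/* Offset of p past the previous multiple of align (a power of 2). */
size_t
misalign(const void *p, size_t align)
{
	return ((size_t)((uintptr_t)p & (align - 1)));
}

static void
report(const char *what, const void *p)
{
	printf("%-12s misaligned by %zu (mod 16)\n", what, misalign(p, 16));
}

/* The three cases from the list above: double, alloca(8), long double. */
void
show_stack_alignment(void)
{
	double d;
	long double ld;

	report("double", &d);
	report("alloca(8)", alloca(8));
	report("long double", &ld);
}
```

Comparing the output of the same file built with gcc and with clang for i386 (for example with -m32, an assumption about the available toolchain) should show whether the adjustments described above are made.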

The stack allocation for multiple alloca()s or declarations (even ones
in inner blocks) should be coalesced and done at the start of a
function.  gcc, but not clang, pessimizes this too.  For alloca(8);
alloca(8); on both amd64 and i386, gcc generates 2 separate allocations
of 32 (?) bytes each with null (?) adjustments for each.
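The comparisons above can be reproduced by compiling a file like the following with something like `cc -O2 -S` (the flags are an assumption) and reading the generated prologues of with_alloca(), with_array() and two_allocas().  The function names and the test_calls counter are hypothetical.  In the original experiment test() was only declared; here it is defined out of line (noinline is a gcc/clang extension) only so that the file also links and runs, and for the cleanest comparison it should be moved to a separate file.

```c
#include <stdlib.h>	/* FreeBSD declares alloca() here */
#if defined(__linux__) || defined(__APPLE__)
#include <alloca.h>	/* assumption: glibc, musl and macOS declare it here */
#endif

int test_calls;

/* Sink that keeps the calls; it never dereferences p. */
__attribute__((noinline)) void
test(void *p)
{
	if (p != NULL)
		test_calls++;
}

/* alloca() case: compare its stack handling with with_array(). */
void
with_alloca(void)
{
	test(alloca(2048));
}

/* Local array case: the stack adjustment is folded into the prologue. */
void
with_array(void)
{
	int arr[1024];

	test(arr);
}

/* Two small alloca()s: ideally one coalesced stack adjustment. */
void
two_allocas(void)
{
	test(alloca(8));
	test(alloca(8));
}
```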

Of course, variable stack allocations cannot be coalesced before the
variables are known.  Handling the stack for this case requires more
care.  For example, the original stack pointer must be saved, since
subtraction to restore it cannot be used.  Similarly if the stack is
adjusted using andl.  Allocations may be intentionally delayed to
avoid wasting stack space, but this doesn't work for alloca() since
the allocations are required to live (as if) until the end of the
function.  It also tends not to work for fixed-size variable allocations,
due to optimizations.  It might work for VLAs in inner blocks, depending
on whether the compiler optimizes for time over space by delaying the
deallocation.  Compilers now track variable lifetimes and could
deallocate even ones in outer scope to optimize for space, but rarely do.
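The difference between alloca() and VLAs in inner blocks can be made visible with a loop.  In this sketch (function names hypothetical, <alloca.h> guard assumed as before), every alloca() block stays live until return, so the recorded addresses must be distinct and the stack grows on each iteration.  Each VLA dies at the end of its iteration, so a compiler may reuse the same space; the addresses are often equal, but C does not guarantee that.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>	/* FreeBSD declares alloca() here */
#if defined(__linux__) || defined(__APPLE__)
#include <alloca.h>	/* assumption: glibc, musl and macOS declare it here */
#endif

/*
 * Record the address of an n-byte (n > 0) allocation made on each of k
 * iterations.  All k alloca() blocks are live at once (until return), so
 * they cannot overlap and the addresses are necessarily distinct.
 */
void
alloca_in_loop(size_t n, size_t k, uintptr_t *addrs)
{
	size_t i;

	for (i = 0; i < k; i++) {
		unsigned char *p = alloca(n);

		p[0] = (unsigned char)i;
		addrs[i] = (uintptr_t)p;
	}
}

/*
 * Same with a VLA in the loop body.  Each buf's lifetime ends with its
 * iteration, so the space may be reused; equality is typical but not
 * guaranteed.
 */
void
vla_in_loop(size_t n, size_t k, uintptr_t *addrs)
{
	size_t i;

	for (i = 0; i < k; i++) {
		unsigned char buf[n];

		buf[0] = (unsigned char)i;
		addrs[i] = (uintptr_t)buf;
	}
}
```

Only the addresses are kept, as integers taken while each object was live, so nothing dead is dereferenced.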

> Because it's non-standard and the alloca(3) man page discourages it:
> _____
> ...
> BUGS
> The alloca() function is machine and compiler dependent; its use is discouraged.

This became out of date with VLAs in C99.  Apart from scope handling,
compilers need slightly more complexity to support VLAs than to support
alloca().  A compiler might still support VLAs but not alloca().  But
FreeBSD has never used such a compiler.  That it would never use one
was not so clear when this man page was written.

Bruce
