Date: Sun, 15 Feb 2015 17:53:11 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Pedro Giffuni <pfg@freebsd.org>
Cc: src-committers@freebsd.org, Ian Lepore <ian@freebsd.org>, svn-src-all@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>, Bruce Evans <brde@optusnet.com.au>, svn-src-head@freebsd.org
Subject: Re: svn commit: r278737 - head/usr.sbin/flowctl
Message-ID: <20150215162553.L977@besplex.bde.org>
In-Reply-To: <54DFA7CC.20305@FreeBSD.org>
References: <201502132357.t1DNvKda075915@svn.freebsd.org> <20150214193210.N945@besplex.bde.org> <20150214181508.GL15484@FreeBSD.org> <1423938828.80968.148.camel@freebsd.org> <54DFA7CC.20305@FreeBSD.org>
On Sat, 14 Feb 2015, Pedro Giffuni wrote:

> On 02/14/15 13:33, Ian Lepore wrote:
>> On Sat, 2015-02-14 at 21:15 +0300, Gleb Smirnoff wrote:
>>> On Sat, Feb 14, 2015 at 08:46:58PM +1100, Bruce Evans wrote:
>>> B> Using VLAs and also the C99 feature of declarations anywhere, and
>>> B> extensions like __aligned(), we can almost implement a full alloca()
>>> B> using the fixed version of this change:
>>> B>
>>> B> /*
>>> B>  * XXX need extended statement-expression so that __buf doesn't go out
>>> B>  * of scope after the right brace.
>>> B>  */
>>> B> #define	my_alloca(n) __extension__ ({				\
>>> B> 	/* XXX need unique name. */					\
>>> B> 	char __buf[__roundup2((n), MUMBLE)] __aligned(MUMBLE);		\
>>> B>									\
>>> B> 	(void *)__buf;							\
>>> B> })
>>>
>>> I like this idea. But would this exact code work? The life of
>>> __buf is limited by the code block, and we exit the block
>>> immediately. Wouldn't the allocation be overwritten if we
>>> enter any function or block later?

I don't know how to do it. The comment describes the problem. C99
doesn't require the block, but the statement-expression does.

There is another scope problem with alloca(). I think the storage
allocated by it is live until the end of the function, so it doesn't
go out of scope if alloca() is called in an inner block. (This is not
properly documented in the FreeBSD manpage. It is stated that the
space is freed on return, but it is not stated that the space is not
freed earlier. It is not properly documented in a Linux manpage found
on the web either. That manpage has an almost identical DESCRIPTION
section. After that section it is better, except that it doesn't spell
RETURN VALUES' name with an S. FreeBSD's RETURN VALUES section is
seriously broken. It says that NULL is returned on failure. But that
is only for the extern libc version, which is almost unreachable.
Normally the builtin is used. The Linux manpage states only the
behaviour on error of the builtin: the behaviour is undefined on stack
overflow.
The Linux manpage then has much larger STANDARDS and HISTORY sections
(spelled CONFORMING TO and NOTES). These also deprecate it, and give
some reasons.)

VLAs and macros cannot duplicate alloca()'s scope behaviour if the
macro or declaration is placed in an inner block. This is not a
problem for FreeBSD, since style(9) forbids placing declarations in
inner blocks and no one would break that rule :-). 'ptr = alloca(n);'
isn't a declaration, but placing it in the outermost block is even
more useful for making ptr and its related space visible.

>> Why put any effort into avoiding alloca() in the first place? Is it
>> inefficient on some platforms? On arm it's like 5 instructions, it just
>> adjusts the size to keep the stack dword-aligned and subtracts the
>> result from sp, done.

It should be more like 0 instructions relative to a local array. It
does take 0 more on x86 with clang, but not with gcc. Even gcc48 on
amd64 still does pessimal stack alignment and more for alloca().
Tested with 'void test(void *);' and:

	test(alloca(2048));
vs
	int arr[1024];
	test(arr);

gcc produces an extra instruction or 2 to align the stack.

Hmm, the clang code is actually broken, at least on i386. It needs to
do the stack alignment even more than gcc, due to its non-pessimal
alignment for the usual case. Apparently, the stack is always 16-byte
aligned on amd64, although this is excessive. On i386, the stack is
16-byte aligned by default for gcc, although this is pessimal. This
can be changed by -mpreferred-stack-boundary=N. For clang, the stack
is only 4-byte aligned, and -mpreferred-stack-boundary is broken (not
supported). clang is supposed to do alignment as necessary. That is,
almost never.
It does the stack adjustment for doubles, but not for alloca() or even
for long doubles:

	double d; test(&d);		/* adjusted */
	test(alloca(8));		/* broken */
	long double d; test(&d);	/* broken */

On i386, gcc depends on the default for doubles and long doubles (and,
more importantly, for alignment directives and SSE variables), so it
never needs to adjust for alloca(), the same as on amd64, but it
always does it.

The stack allocation for multiple alloca()s or declarations (even ones
in inner blocks) should be coalesced and done at the start of a
function. gcc, but not clang, pessimizes this too. For

	alloca(8);
	alloca(8);

on both amd64 and i386, gcc generates 2 separate allocations of 32 (?)
bytes each, with null (?) adjustments for each.

Of course, variable stack allocations cannot be coalesced before the
variables are known. Handling the stack for this case requires more
care. For example, the original stack pointer must be saved, since
subtraction to restore it cannot be used. The same applies if the
stack is adjusted using andl. Allocations may be intentionally delayed
to avoid wasting stack space, but this doesn't work for alloca(),
since the allocations are required to live (as if) until the end of
the function. It also tends not to work for fixed-size variable
allocations, due to optimizations. It might work for VLAs in inner
blocks, depending on whether the compiler optimizes for time over
space by delaying the deallocation. Compilers now track variable
lifetimes and could deallocate even ones in outer scope to optimize
for space, but rarely do.

> Because it's non-standard and the alloca(3) man page discourages it:
> _____
> ...
> BUGS
>      The alloca() function is machine and compiler dependent; its use is
>      discouraged.

This became out of date with VLAs in C99. Except for scopes, compilers
must have slightly more complications to support VLAs than alloca().
They might still not support alloca(). But FreeBSD never used ones
that don't.
That it would never use them was not so clear when this man page was
written.

Bruce
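A minimal sketch, not part of the original message, of the scope difference discussed above: space from alloca() obtained inside an inner block stays valid until the function returns, while a VLA declared in an inner block goes out of scope at that block's closing brace. It assumes a gcc- or clang-compatible compiler on which alloca() is the builtin. The function names sum_lengths_alloca() and sum_lengths_vla() and the MAXSTRS limit are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__FreeBSD__)
#include <stdlib.h>	/* FreeBSD declares alloca() in <stdlib.h>. */
#else
#include <alloca.h>	/* glibc and most other systems use <alloca.h>. */
#endif

#define	MAXSTRS	16	/* hypothetical limit for this sketch */

/*
 * Copy each string into alloca() space obtained inside the loop body.
 * The space stays valid until this function returns, even though the
 * loop body (an inner block) has been exited, so the saved pointers
 * may still be used after the loop.
 */
static size_t
sum_lengths_alloca(const char *const *strs, size_t n)
{
	char *saved[MAXSTRS];
	char *p;
	size_t i, len, total;

	assert(n <= MAXSTRS);
	for (i = 0; i < n; i++) {
		len = strlen(strs[i]) + 1;
		p = alloca(len);	/* not a declaration; lives until return */
		memcpy(p, strs[i], len);
		saved[i] = p;
	}
	total = 0;
	for (i = 0; i < n; i++)
		total += strlen(saved[i]);	/* still valid here */
	return (total);
}

/*
 * VLA version: buf's lifetime ends at the closing brace of the loop
 * body, so a pointer to it must not be kept past that brace.  All use
 * of the buffer has to happen inside the block.
 */
static size_t
sum_lengths_vla(const char *const *strs, size_t n)
{
	size_t i, len, total;

	total = 0;
	for (i = 0; i < n; i++) {
		len = strlen(strs[i]) + 1;
		char buf[len];	/* VLA: goes out of scope at this block's '}' */

		memcpy(buf, strs[i], len);
		total += strlen(buf);
	}
	return (total);
}
```

For example, with the array { "alloca", "VLA", "C99" } both functions return 12. Calling alloca() in a loop grows the stack on every iteration, since none of that space is released before return; the VLA's lifetime ends on each iteration, so a compiler may reuse its space.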
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150215162553.L977>