Date: Mon, 25 Jan 2016 17:43:43 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Andriy Voskoboinyk <avos@freebsd.org>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r294697 - head/sys/net80211
Message-ID: <20160125170231.H986@besplex.bde.org>
In-Reply-To: <201601242335.u0ONZKwW053626@repo.freebsd.org>
References: <201601242335.u0ONZKwW053626@repo.freebsd.org>
On Sun, 24 Jan 2016, Andriy Voskoboinyk wrote:

> Log:
>   net80211: reduce stack usage for ieee80211_ioctl*() methods.
>
>   Use malloc(9) for
>   - struct ieee80211req_wpaie2 (518 bytes, used in
>     ieee80211_ioctl_getwpaie())
>   - struct ieee80211_scan_req (128 bytes, used in setmlme_assoc_adhoc()
>     and ieee80211_ioctl_scanreq())
>
>   Also, drop __noinline workarounds; stack overflow is not reproducible
>   with recent compilers.
>
>   Tested with Clang 3.7.1, GCC 4.2.1 (from 9.3-RELEASE) and 4.9.4
>   (with -fstack-usage flag)

Inlining also breaks debugging. It is best avoided generally by using gcc -fno-inline-functions-called-once. This flag is broken (not supported) in clang.

> Modified: head/sys/net80211/ieee80211_ioctl.c
> ==============================================================================
> --- head/sys/net80211/ieee80211_ioctl.c	Sun Jan 24 23:28:14 2016	(r294696)
> +++ head/sys/net80211/ieee80211_ioctl.c	Sun Jan 24 23:35:20 2016	(r294697)
> -/*
> - * When building the kernel with -O2 on the i386 architecture, gcc
> - * seems to want to inline this function into ieee80211_ioctl()
> - * (which is the only routine that calls it). When this happens,
> - * ieee80211_ioctl() ends up consuming an additional 2K of stack
> - * space. (Exactly why it needs so much is unclear.) The problem
> - * is that it's possible for ieee80211_ioctl() to invoke other
> - * routines (including driver init functions) which could then find
> - * themselves perilously close to exhausting the stack.
> - *
> - * To avoid this, we deliberately prevent gcc from inlining this
> - * routine. Another way to avoid this is to use less agressive
> - * optimization when compiling this file (i.e. -O instead of -O2)
> - * but special-casing the compilation of this one module in the
> - * build system would be awkward.
> - */

Even with -O1 -mtune=i386 -fno-inline-functions-called-once, gcc-4.2.1 still breaks debugging of static functions by using a different calling convention for them.
The first couple of args are passed in registers. This breaks ddb stack traces on i386, though not quite as badly as they have always been broken on amd64. (ddb cannot determine the number of args or where they are on amd64, and used to print 5 words of stack garbage. On i386, the args list is still printed and is almost as confusing as garbage, since it is correct for extern functions, but for static functions it starts at about the third arg.)

I use __attribute__((__regparm__(0))) to unbreak the ABI for a few functions designed to be called from within ddb as well as from the main code. Some older functions with this design, like inb_(), still work accidentally because they are extern. I haven't figured out the command-line flag to fix this yet. Maybe just -mregparm. I didn't try hard to fix this since I was working on optimizations more than debugging when I added the attribute.

Inlining really should reduce stack usage and thus be an optimization that is actually useful for kernels. Compilers are clueless about optimizations that are useful for kernels. -Os should help, but is very broken in gcc-4.2.1 (it fails to compile some files due to hitting inlining limits, and after working around this, gives a negative optimization for space of about 30%). -Os works OK for clang: it reduces the space a little and the time by almost as much as -O2.

But optimizations like clang -O2 -march=native are less than 10% faster than pessimizations like gcc-old -O1 -mtune=i386 -fno-inline-functions-called-once -fno-unit-at-a-time in kernels, even in micro-benchmarks that are favourable to the optimizations. More like 1% for normal use. (-fno-unit-at-a-time should reduce opportunities for inlining static functions if -fno-inline-functions-called-once doesn't work, but it is also broken (not supported) in clang.)

Optimizations larger than 1% can possibly be obtained by using compiler builtins, but compiler builtins are turned off by -ffreestanding.
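A minimal userland sketch of turning one builtin back on, assuming GCC or clang. -ffreestanding implies -fno-builtin, so a plain memcpy() call stays an ordinary external call, while __builtin_memcpy() remains available and lets the compiler expand small constant-size copies inline. The fast_memcpy() macro and struct frame_hdr are hypothetical names for illustration only, not FreeBSD kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical macro (not a FreeBSD interface): route copies through
 * __builtin_memcpy() so the compiler may expand them inline even under
 * -ffreestanding.  If it chooses not to, it emits a call to memcpy(),
 * which the environment (the kernel, or libc here) must still provide.
 */
#define	fast_memcpy(dst, src, len)	__builtin_memcpy((dst), (src), (len))

/* Hypothetical small fixed-size record. */
struct frame_hdr {
	unsigned char	addr[6];
	unsigned short	seq;
};

static void
frame_hdr_copy(struct frame_hdr *dst, const struct frame_hdr *src)
{
	assert(dst != NULL && src != NULL);
	/* Constant size: a good candidate for inline expansion. */
	fast_memcpy(dst, src, sizeof(*dst));
}

/* Compare field by field so that any padding bytes are ignored. */
static int
frame_hdr_equal(const struct frame_hdr *a, const struct frame_hdr *b)
{
	return (memcmp(a->addr, b->addr, sizeof(a->addr)) == 0 &&
	    a->seq == b->seq);
}
```

Compiling with -ffreestanding -O2 -S and reading the assembly for frame_hdr_copy() is one way to check whether the copy was actually expanded inline or left as a call.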
I no longer bother to turn some, like __builtin_memcpy(), back on.

Bruce