Date: Thu, 20 Dec 2001 11:04:59 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: Poul-Henning Kamp <phk@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Kernel stack size and stacking: do we have a problem ?
Message-ID: <Pine.BSF.4.21.0112201053020.46573-100000@InterJet.elischer.org>
In-Reply-To: <600.1008837822@critter.freebsd.dk>

On Thu, 20 Dec 2001, Poul-Henning Kamp wrote:

>
> As most of you have probably heard, I'm working on a stacking
> disk I/O layer (http://freefall.freebsd.org/~phk/Geom).
>
> This is, as far as I know, only the third freely stackable subsystem
> in the kernel, the first two being VFS/filesystems and netgraph.
>
> The problem with stacking layered systems is that the naïve and
> simple implementation, just calling into the layer below, has
> basically unbounded kernel stack usage.
>
> Fortunately for us, neither VFS nor netgraph has had too much use
> yet, so we have not been excessively bothered by people running
> out of kernel-stack.
>
> It is well documented how to avoid the unbounded stack usage for
> such setups: simply queue the requests at each "gadget" and run
> a scheduler, but this is nowhere near as simple nor as fast as the
> direct call.
>
> So I guess we need to ask ourselves the following questions:
>
> 1. What do we do when people start to run out of kernel stack
>    because they stack filesystems ?
> 	a) Tell them not to.
> 	b) Tell them to increase UPAGES.
> 	c) Increase default UPAGES.
> 	d) Redesign VFS/VOP to avoid the problem.

A couple of points..

Firstly, the stacks were just increased, with an unmapped guard page
at the end (well, it's an option). Doesn't solve the problem.. just
related info.

Secondly, UPAGES will make no difference as it no longer exists..
use KSTACK_PAGES instead.

Also, we should implement the 'stack-hogs' patch for gcc, of which
there are 3 versions around. Some fs layers are just massive HOGS of
stack space for very little reason.

>
> 2. Do we in general want to incur the overhead of scheduling
>    in stacking layers or does increasing the kernel stack as
>    needed make more sense ?

Try an adaptive scheme such as I mentioned above.. 99.99% of the
time it avoids scheduling.

>
> 3. Would it be possible to make kernel stack size a sysctl ?

Hmmm, it might, but it would be tricky because the constant
KSTACK_PAGES is used for both allocation and deallocation, so if you
just changed it to be a variable, and changed it in between, a stack
allocated at the old size would be freed at the new size.

This is about to change BTW in KSE, as there is a kstack per thread
and the allocation routines will be different. The problem is that
I'm caching threads and their stacks for quick reallocation, so I'd
have to check each stack as I pass it out and see whether I need to
resize it to match the new size..

>
> 4. Would it make sense to build an intelligent kernel-stack
>    overflow handling into the kernel, rather than "handling"
>    this with a panic ?
>

Presently we have a guard page (unmapped). We could possibly allocate
more pages and fill them in if a page fault occurs. It would be quite
a change to the current code but it COULD be done. (But not by me..
you'd have to have a good handle on the in-kernel fault handling,
which was hair-raising last time I looked.)

> It should be trivially simple to make a function called
> enough_stack() which would return false if we were in the
> danger zone.  This function could then be used to fail
> intelligently at strategic high-risk points in the kernel:
>
> 	int
> 	somefunction(...)
> 	{
> 		...
>
> 		if (!enough_stack())
> 			return (ENOMEM);
> 		...
> 	}
>
> Think about it...

We COULD keep a page available to map in place of the guard page,
which would allow completion, but activating it would cause such a
low-stack state to be entered.

>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message
>
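
For illustration only, here is a rough user-space sketch of the
enough_stack() idea quoted above. It is not kernel code and not
anyone's actual patch: STACK_BUDGET, STACK_RESERVE, stack_mark_base()
and layer_call() are all hypothetical names and numbers, and stack use
is only approximated by comparing the addresses of local variables.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbers: total stack we allow ourselves, and the
 * "danger zone" we want to keep free at the end of it. */
#define STACK_BUDGET   (64 * 1024)
#define STACK_RESERVE  (16 * 1024)

/* Approximate address of the top-level frame, set once at entry. */
static uintptr_t stack_base;

static void
stack_mark_base(void)
{
	volatile char marker;

	stack_base = (uintptr_t)&marker;
}

/* Returns nonzero while we are still outside the danger zone. */
static int
enough_stack(void)
{
	volatile char marker;
	uintptr_t here = (uintptr_t)&marker;
	/* Works whichever direction the stack grows. */
	size_t used = stack_base > here ? stack_base - here
	    : here - stack_base;

	return (used + STACK_RESERVE <= STACK_BUDGET);
}

/* Stand-in for one stacking layer calling into the layer below. */
static int
layer_call(int depth)
{
	volatile char scratch[1024];	/* simulate a stack-hungry layer */
	int error;

	if (!enough_stack())
		return (ENOMEM);	/* fail cleanly instead of overflowing */
	scratch[0] = (char)depth;
	if (depth == 0)
		return (0);
	error = layer_call(depth - 1);
	scratch[1] = (char)error;	/* keeps this frame live across the call */
	return (error);
}
```

A caller would run stack_mark_base() once at its entry point and then
treat ENOMEM from layer_call() like any other allocation failure. A
shallow stack of layers succeeds, while a runaway one is refused well
before the budget is exhausted.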