Date: Thu, 20 Dec 2001 10:51:52 -0800 (PST)
From: Julian Elischer
To: Poul-Henning Kamp
Cc: arch@freebsd.org
Subject: Re: Kernel stack size and stacking: do we have a problem ?

Netgraph has a bounding scheme that Archie and I came up with, but it has
not been committed yet.  Basically, in the -current version, the mbufs are
passed with an iteration counter: if you directly execute another module
you increment it, and if you queue the item you clear it to 0.  Once the
counter reaches a limit of N, the subsystem will queue the item rather
than try to run the next layer directly.

I have code to do that here and I've been thinking about checking it in.

On Thu, 20 Dec 2001, Poul-Henning Kamp wrote:

>
> As most of you have probably heard, I'm working on a stacking
> disk I/O layer (http://freefall.freebsd.org/~phk/Geom).
>
> This is, as far as I know, only the third freely stackable subsystem
> in the kernel, the first two being VFS/filesystems and netgraph.
>
> The problem with stacking layered systems is that the naïve and
> simple implementation, just calling into the layer below, has
> basically unbounded kernel stack usage.
>
> Fortunately for us, neither VFS nor netgraph has had too much use
> yet, so we have not been excessively bothered by people running
> out of kernel stack.
>
> It is well documented how to avoid the unbounded stack usage for
> such setups: simply queue the requests at each "gadget" and run
> a scheduler, but this is nowhere near as simple nor as fast as the
> direct call.
>
> So I guess we need to ask ourselves the following questions:
>
> 1. What do we do when people start to run out of kernel stack
>    because they stack filesystems ?
> 	a) Tell them not to.
> 	b) Tell them to increase UPAGES.
> 	c) Increase default UPAGES.
> 	d) Redesign VFS/VOP to avoid the problem.
>
> 2. Do we in general want to incur the overhead of scheduling
>    in stacking layers, or does increasing the kernel stack as
>    needed make more sense ?
>
> 3. Would it be possible to make kernel stack size a sysctl ?
>
> 4. Would it make sense to build intelligent kernel-stack
>    overflow handling into the kernel, rather than "handling"
>    this with a panic ?
>
> It should be trivially simple to make a function called
> enough_stack() which would return false if we were in the
> danger zone.  This function could then be used to fail
> intelligently at strategic high-risk points in the kernel:
>
> 	int
> 	somefunction(...)
> 	{
> 		...
>
> 		if (!enough_stack())
> 			return (ENOMEM);
> 		...
> 	}
>
> Think about it...
>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message