From owner-freebsd-hackers  Tue Feb 23  6:45:40 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from lor.watermarkgroup.com (lor.watermarkgroup.com [207.202.73.33])
	by hub.freebsd.org (Postfix) with ESMTP id 39E0A1115B
	for <freebsd-hackers@FreeBSD.ORG>; Tue, 23 Feb 1999 06:45:37 -0800 (PST)
	(envelope-from luoqi@watermarkgroup.com)
Received: (from luoqi@localhost)
	by lor.watermarkgroup.com (8.8.8/8.8.8) id JAA02311;
	Tue, 23 Feb 1999 09:44:38 -0500 (EST)
	(envelope-from luoqi)
Date: Tue, 23 Feb 1999 09:44:38 -0500 (EST)
From: Luoqi Chen <luoqi@watermarkgroup.com>
Message-Id: <199902231444.JAA02311@lor.watermarkgroup.com>
To: dfr@nlsystems.com, dillon@apollo.backplane.com
Subject: Re: Panic in FFS/4.0 as of yesterday - update
Cc: freebsd-hackers@FreeBSD.ORG, mjacob@feral.com

> :I would suggest disabling it entirely to see if the system survives any
> :better. If that helps, perhaps it should be using a field in struct proc
> :to record the recursion depth.
> :
> :--
> :Doug Rabson				Mail:  dfr@nlsystems.com
> :Nonlinear Systems Ltd.			Phone: +44 181 442 9037
> 
>     No, don't disable it.  Unless you want the process to overflow its
>     supervisor stack, that is!
> 
It won't overflow the kernel stack in this case, because what happened was
reentrancy rather than recursion. I don't see any real danger of recursion
unless there's a broken layered FS implementation (which is what the comment
says the check protects against), and in that case we should fix the fs instead.
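A minimal userspace sketch of the distinction, assuming a per-process depth
field along the lines Doug suggests. struct sim_proc, write_enter(),
write_exit() and MAX_DEPTH are hypothetical names, not FreeBSD code. A single
global counter climbs with every concurrent writer (reentrancy), while a
per-process depth only climbs when the same process nests (recursion):

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct proc: only the
 * per-process write nesting depth suggested in the thread. */
struct sim_proc {
    int write_depth;
};

/* Global counter in the style of "writerecursion": it counts every
 * write path currently entered, by any process. */
static int global_writes;

#define MAX_DEPTH 4   /* hypothetical per-process recursion limit */

/* Enter the write path on behalf of process p.  Returns 0 on success,
 * or -1 if this particular process has nested too deeply. */
static int write_enter(struct sim_proc *p)
{
    if (p->write_depth >= MAX_DEPTH)
        return -1;
    p->write_depth++;
    global_writes++;
    return 0;
}

/* Leave the write path; must pair with a successful write_enter(). */
static void write_exit(struct sim_proc *p)
{
    assert(p->write_depth > 0);
    p->write_depth--;
    global_writes--;
}
```

With five processes each entering once, global_writes reaches 5 (above
MAX_DEPTH) while every write_depth stays at 1, so only the per-process
counter can tell reentrancy apart from real recursion.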

>     The code is obviously broken, but disabling it will break it even worse.
> 
>     The write recursion test could actually be used as a count of the number
>     of I/O's which are 'starting up' (versus in progress).  It's an obvious
>     failure as a stack recursion counter but judging from the comments, it
>     was designed to handle both conditions.
> 
I don't think the code was designed to guard against too many 'starting up'
I/Os (if that were its purpose, it would not panic), but rather against true
runaway situations.

>     What appears to be happening is that both the buffer pool and the KVA
>     space for the buffer pool are being exhausted.  The code appears to be
>     designed to deal only with the exhaustion of the KVA space.  It assumes
>     that the buffer pool still has bp's available.  That is why there was
>     a panic.
> 
There's a test for numfreebuffers < lofreebuffers in getblk(), so there
should still be bufs available. They must be on the EMPTY queue, but they
couldn't be used because KVA space was exhausted.
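To illustrate how the header count can look healthy while allocation still
fails, here is a hypothetical userspace model in which buffer headers and
their backing KVA are tracked as separate resources. The sizes,
sim_getnewbuf(), sim_brelse() and header_check_ok() are assumptions for
illustration, not the real vfs_bio.c logic:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: headers and the KVA that backs them are separate
 * resources.  Names mirror the discussion; values are made up. */
#define NBUF        8
#define LOFREEBUFS  2
#define KVA_TOTAL   ((size_t)64 * 1024)

static int numfreebuffers = NBUF;   /* headers on the EMPTY queue */
static size_t kva_free = KVA_TOTAL; /* unreserved buffer KVA */

/* Mirrors a getblk()-style test: only headers are counted. */
static int header_check_ok(void)
{
    return numfreebuffers >= LOFREEBUFS;
}

/* Returns 1 if a buffer of `size` bytes could be set up, 0 if either a
 * header or enough KVA is missing. */
static int sim_getnewbuf(size_t size)
{
    if (numfreebuffers == 0 || kva_free < size)
        return 0;
    numfreebuffers--;
    kva_free -= size;
    return 1;
}

/* Return a buffer header and its KVA to the pool. */
static void sim_brelse(size_t size)
{
    assert(numfreebuffers < NBUF && kva_free + size <= KVA_TOTAL);
    numfreebuffers++;
    kva_free += size;
}
```

After two 32 KB buffers use up the simulated KVA, six headers remain free, so
the lofreebuffers-style check passes, yet an 8 KB request still fails until a
buffer is released.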

>     I think the proper solution is to have getnewbuf() speed up the syncer
>     daemon to retire the dirty buffers in the case where getnewbuf() 
>     gets itself tied into knots, then wait and return NULL.  Also, I think

This sounds good. There's a variable just for that: rushjob :)

>     we need to implement a hard wait if numfreebuffers < lofreebuffers 

The test is in getblk(), but I agree it belongs in getnewbuf().

>     and the caller to getnewbuf() is not the syncer daemon ( update_proc ),

I'm not sure if this exemption is useful -- there's not much we can do if
we run out of KVA space.

>     but allow it otherwise.  writerecursion would then simply block waiting
>     for the syncer when it gets too big rather than panic.
> 
Then the name "writerecursion" would be a little misleading, since it would
become a variable that limits how many async I/Os are started at one time.
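A rough sketch of writerecursion reworked as a throttle on async writes being
started, assuming POSIX threads in userspace stand in for kernel sleep/wakeup.
When the limit is reached, the caller bumps a rushjob-style hint and sleeps
instead of panicking. MAX_STARTING, async_write_begin(), async_write_end(),
writer_thread() and the local rushjob counter are hypothetical; a kernel
version would use its own sleep/wakeup primitives and the real rushjob.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical limit on async writes being started at one time. */
#define MAX_STARTING 4

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_retired = PTHREAD_COND_INITIALIZER;
static int starting_ios;  /* async writes currently being started */
static int rushjob;       /* simulated hint asking the syncer to hurry */

/* Called before starting an async write: blocks instead of panicking. */
static void async_write_begin(void)
{
    pthread_mutex_lock(&io_lock);
    while (starting_ios >= MAX_STARTING) {
        rushjob++;  /* ask the (simulated) syncer to run sooner */
        pthread_cond_wait(&io_retired, &io_lock);
    }
    starting_ios++;
    pthread_mutex_unlock(&io_lock);
}

/* Called once the write has been handed off; wakes one waiter. */
static void async_write_end(void)
{
    pthread_mutex_lock(&io_lock);
    assert(starting_ios > 0);
    starting_ios--;
    pthread_cond_signal(&io_retired);
    pthread_mutex_unlock(&io_lock);
}

/* A simulated process that starts one async write. */
static void *writer_thread(void *arg)
{
    (void)arg;
    async_write_begin();
    return NULL;
}
```

The while loop re-checks the count after every wakeup, so a spurious wakeup
cannot push starting_ios past the limit; a writer that arrives while the limit
is reached simply waits for async_write_end() instead of bringing the system
down.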

>     It actually doesn't look too complex.  I'll mess with Matt's test code.
> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon@backplane.com>
> 

-lq

