From owner-freebsd-hackers  Tue Feb 23 10:48: 7 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (Postfix) with ESMTP id D7B92114CE
	for <freebsd-hackers@FreeBSD.ORG>; Tue, 23 Feb 1999 10:48:05 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA51270;
	Tue, 23 Feb 1999 10:48:01 -0800 (PST)
	(envelope-from dillon)
Date: Tue, 23 Feb 1999 10:48:01 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199902231848.KAA51270@apollo.backplane.com>
To: Luoqi Chen <luoqi@watermarkgroup.com>
Cc: dfr@nlsystems.com, dillon@apollo.backplane.com,
	freebsd-hackers@FreeBSD.ORG, mjacob@feral.com
Subject: Re: Panic in FFS/4.0 as of yesterday - update
References:  <199902231444.JAA02311@lor.watermarkgroup.com>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:>     No, don't disable it.  Unless you want the process to overflow its
:>     supervisor stack, that is!
:> 
:It won't overflow kernel stack in this case, which was reentrancy rather
:than recursion. I don't see any real danger of recursion unless there's
:a broken layered FS implementation, which the comment says it tries to
:protect against, in which case we really should fix the fs instead.

    getnewbuf() is starting vfs_bio_awrite() calls on essentially random
    buffers - not necessarily just buffers related to the VFS recursion.
    This means that it is possible for it to recurse through unrelated
    bp's and overflow the stack.
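
    A minimal sketch of that recursion, as a userland toy model rather
    than the real vfs_bio.c code.  All of the toy_* names are hypothetical,
    and the assumption that every write needs a fresh buffer is only there
    to make the nesting visible.  The point is that the depth tracks the
    number of dirty buffers lying around, not the depth of the FS layering:

```c
#include <assert.h>

#define TOY_NBUF 8

/* Toy stand-in for a struct buf; 'dirty' plays the role of B_DELWRI. */
struct toy_buf {
    int dirty;
};

static struct toy_buf toy_bufs[TOY_NBUF];
static int toy_depth;      /* current nesting of toy_getnewbuf() */
static int toy_max_depth;  /* deepest nesting seen so far */

static void toy_getnewbuf(void);

/* Assumption: writing a buffer out needs a new buffer of its own. */
static void toy_awrite(struct toy_buf *bp)
{
    bp->dirty = 0;
    toy_getnewbuf();
}

/* Picks the first dirty buffer it finds, related to the caller or not. */
static void toy_getnewbuf(void)
{
    int i;

    toy_depth++;
    if (toy_depth > toy_max_depth)
        toy_max_depth = toy_depth;
    for (i = 0; i < TOY_NBUF; i++) {
        if (toy_bufs[i].dirty) {
            toy_awrite(&toy_bufs[i]);
            break;
        }
    }
    toy_depth--;
}

/* Marks 'ndirty' buffers dirty, runs one allocation, returns max depth. */
int toy_run(int ndirty)
{
    int i;

    assert(ndirty >= 0);
    if (ndirty > TOY_NBUF)
        ndirty = TOY_NBUF;
    toy_depth = 0;
    toy_max_depth = 0;
    for (i = 0; i < TOY_NBUF; i++)
        toy_bufs[i].dirty = (i < ndirty);
    toy_getnewbuf();
    return toy_max_depth;
}
```

    In this model a single allocation nests one level per dirty buffer,
    so the stack cost is bounded only by how many dirty buffers exist.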

:>     failure as a stack recursion counter but judging from the comments, it
:>     was designed to handle both conditions.
:> 
:I don't think the code was designed to protect from too many 'starting up'
:I/O's (it would not panic if this is the case), but true run-away situations.
:
:>     I think the proper solution is to have getnewbuf() speed up the syncer
:>     daemon to retire the dirty buffers in the case where getnewbuf() 
:>     gets itself tied into knots, then wait and return NULL.  Also, I think
:
:This sounds good. There's a variable just for that: rushjob :)

:>     we need to implement a hard wait if numfreebuffers < lofreebuffers 
:
:The test is in getblk(), but I agree it belongs to getnewbuf().
:
:>     and the caller to getnewbuf() is not the syncer daemon ( update_proc ),
:
:I'm not sure if this exemption is useful -- there's not much we can do if
:we run out of KVA space.
:
:>     but allow it otherwise.  writerecursion would then simply block waiting
:>     for the syncer when it gets too big rather than panic.
:> 
:Then the name "writerecursion" would be a little misleading; it would now
:be a variable that limits how many async I/O's are started at one time.
:
:-lq

    getnewbuf() appears to have the same problem that the ufs fsync code
    has -- it's assuming that when it converts a DELWRI bp to async, the
    I/O operation will either be in progress or completely resolved
    after the call.  But there are cases, such as with softupdates, where
    this isn't true: the bp may be requeued synchronously due to
    there being unresolved dependencies.  In this case, both getnewbuf()
    and the ufs fsync code will potentially loop on the same bp over
    and over again.
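
    A rough sketch of the looping case, again as a userland toy and not
    the actual kernel code.  The toy_* names and the max_tries bound are
    hypothetical; the bound exists only so the example terminates, and is
    not something the current code has:

```c
#include <assert.h>

/* Toy stand-in for a delayed-write buffer. */
struct toy_bp {
    int dirty;  /* plays the role of B_DELWRI */
    int deps;   /* unresolved dependencies, e.g. from softupdates */
};

/*
 * Try to start a write.  Returns 1 if the write was issued, or 0 if the
 * buffer was requeued dirty because its dependencies are unresolved.
 * Retrying the same bp does not resolve them; other buffers must be
 * written first.
 */
int toy_bawrite(struct toy_bp *bp)
{
    if (bp->deps > 0) {
        bp->dirty = 1;
        return 0;
    }
    bp->dirty = 0;
    return 1;
}

/*
 * Flush loop that assumes a write attempt always makes progress.
 * Returns the number of attempts made.  Without the (hypothetical)
 * max_tries bound this would spin forever on a requeued bp.
 */
int toy_flush(struct toy_bp *bp, int max_tries)
{
    int tries = 0;

    assert(max_tries >= 0);
    while (bp->dirty && tries < max_tries) {
        tries++;
        toy_bawrite(bp);
    }
    return tries;
}
```

    A bp with no dependencies is retired on the first attempt.  A bp with
    pending dependencies burns every attempt and is still dirty afterwards,
    which is the loop described above.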

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

