From owner-freebsd-hackers Tue Feb 23 6:45:40 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from lor.watermarkgroup.com (lor.watermarkgroup.com [207.202.73.33])
        by hub.freebsd.org (Postfix) with ESMTP id 39E0A1115B
        for ; Tue, 23 Feb 1999 06:45:37 -0800 (PST)
        (envelope-from luoqi@watermarkgroup.com)
Received: (from luoqi@localhost)
        by lor.watermarkgroup.com (8.8.8/8.8.8) id JAA02311;
        Tue, 23 Feb 1999 09:44:38 -0500 (EST)
        (envelope-from luoqi)
Date: Tue, 23 Feb 1999 09:44:38 -0500 (EST)
From: Luoqi Chen
Message-Id: <199902231444.JAA02311@lor.watermarkgroup.com>
To: dfr@nlsystems.com, dillon@apollo.backplane.com
Subject: Re: Panic in FFS/4.0 as of yesterday - update
Cc: freebsd-hackers@FreeBSD.ORG, mjacob@feral.com
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> :I would suggest disabling it entirely to see if the system survives any
> :better.  If that helps, perhaps it should be using a field in struct proc
> :to record the recursion depth.
> :
> :--
> :Doug Rabson                          Mail:  dfr@nlsystems.com
> :Nonlinear Systems Ltd.               Phone: +44 181 442 9037
>
>     No, don't disable it.  Unless you want the process to overflow its
>     supervisor stack, that is!
>
It won't overflow the kernel stack in this case, which was reentrancy rather
than recursion. I don't see any real danger of recursion unless there's a
broken layered FS implementation, which the comment says it tries to protect
against, in which case we really should fix the fs instead.

>     The code is obviously broken, but disabling it will break it even worse.
>
>     The write recursion test could actually be used as a count of the number
>     of I/O's which are 'starting up' ( versus in progress ).  It's an obvious
>     failure as a stack recursion counter but judging from the comments, it
>     was designed to handle both conditions.
>
I don't think the code was designed to protect against too many 'starting up'
I/O's (it would not panic if that were the case), but against true run-away
situations.

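The difference between reentrancy and recursion, and Doug's suggestion of
keeping the depth in struct proc, can be sketched in userland C. This is only
an illustration, not the kernel code: struct proc_sketch, p_writedepth,
global_writers, MAX_WRITE_DEPTH, write_enter() and write_exit() are all
hypothetical names made up for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for struct proc; only the suggested field. */
struct proc_sketch {
	int p_writedepth;	/* assumed name: nesting depth of this process */
};

/* Shared counter: counts every process currently inside the write path. */
static int global_writers;

#define MAX_WRITE_DEPTH 4	/* illustrative limit, not a kernel value */

/*
 * Enter the write path.  Returns 0 on success, or -1 if this one
 * process has nested deeper than MAX_WRITE_DEPTH (true recursion).
 * The shared counter is maintained but never used as the limit.
 */
static int
write_enter(struct proc_sketch *p)
{
	global_writers++;
	if (++p->p_writedepth > MAX_WRITE_DEPTH) {
		/* Undo the bookkeeping and refuse to go deeper. */
		p->p_writedepth--;
		global_writers--;
		return (-1);
	}
	return (0);
}

/* Leave the write path; must pair with a successful write_enter(). */
static void
write_exit(struct proc_sketch *p)
{
	assert(p->p_writedepth > 0 && global_writers > 0);
	p->p_writedepth--;
	global_writers--;
}
```

With six processes each entering once, global_writers reaches 6 while every
p_writedepth stays at 1, so a limit checked against the shared counter would
trip on reentrancy alone. Only a single process nesting more than
MAX_WRITE_DEPTH times is refused.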
>     What appears to be happening is that both the buffer pool and the KVA
>     space for the buffer pool are being exhausted.  The code appears to be
>     designed to deal only with the exhaustion of the KVA space.  It assumes
>     that the buffer pool still has bp's available.  That is why there was
>     a panic.
>
There's a test for numfreebuffers < lofreebuffers in getblk(), so there
should still be bufs available, and they must be on the EMPTY queue, but they
couldn't be used because of the exhaustion of KVA space.

>     I think the proper solution is to have getnewbuf() speed up the syncer
>     daemon to retire the dirty buffers in the case where getnewbuf()
>     gets itself tied into knots, then wait and return NULL.  Also, I think
This sounds good. There's a variable just for that: rushjob :)

>     we need to implement a hard wait if numfreebuffers < lofreebuffers
The test is in getblk(), but I agree it belongs in getnewbuf().

>     and the caller to getnewbuf() is not the syncer daemon ( update_proc ),
I'm not sure if this exemption is useful -- there's not much we can do if
we run out of KVA space.

>     but allow it otherwise.  writerecursion would then simply block waiting
>     for the syncer when it gets too big rather than panic.
>
Then the name "writerecursion" would be a little misleading; it becomes
a variable that limits the number of async I/O's started at one time.

>     It actually doesn't look too complex.  I'll mess with Matt's test code.
>
>                                         -Matt
>                                         Matthew Dillon
>

-lq
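
For reference, the getnewbuf() behaviour proposed in the quoted text (speed up
the syncer, wait, return NULL, with the syncer itself exempt) might look
roughly like the userland model below. It is a sketch under assumptions, not
the actual vfs_bio.c code: numfreebuffers, lofreebuffers and rushjob are the
names used in this thread, while struct buf_sketch, pool, syncer_waits,
wait_for_syncer() and getnewbuf_sketch() are hypothetical, and the real sleep
is replaced by a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buf. */
struct buf_sketch {
	int b_id;
};

static struct buf_sketch pool[8];	/* hypothetical fixed buffer pool */
static int numfreebuffers = 2;		/* free buffers left in the pool */
static int lofreebuffers = 4;		/* low-water mark */
static int rushjob;			/* hint asking the syncer to speed up */
static int syncer_waits;		/* hypothetical: how often a caller waited */

/*
 * Stand-in for sleeping until the syncer has retired dirty buffers.
 * A kernel would block here; the model only records that a wait happened.
 */
static void
wait_for_syncer(void)
{
	syncer_waits++;
}

/*
 * Model of the proposal: below the low-water mark, an ordinary caller
 * speeds up the syncer, waits, and gets NULL back so it can retry.
 * The syncer (caller_is_syncer != 0) is allowed to dip into the reserve.
 */
static struct buf_sketch *
getnewbuf_sketch(int caller_is_syncer)
{
	if (numfreebuffers < lofreebuffers && !caller_is_syncer) {
		rushjob++;
		wait_for_syncer();
		return (NULL);
	}
	if (numfreebuffers == 0)
		return (NULL);	/* nothing left, even for the syncer */
	numfreebuffers--;
	return (&pool[numfreebuffers]);
}
```

Starting from the values above, an ordinary caller gets NULL and bumps rushjob,
while a call with caller_is_syncer set still receives a buffer. Whether that
exemption is worth having is the open question raised in the reply.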