Date: Tue, 23 Feb 1999 01:20:45 -0800 (PST)
From: Matthew Dillon
Message-Id: <199902230920.BAA44464@apollo.backplane.com>
To: Doug Rabson
Cc: Matthew Jacob, freebsd-hackers@FreeBSD.ORG
Subject: Re: Panic in FFS/4.0 as of yesterday - update

:I would suggest disabling it entirely to see if the system survives any
:better. If that helps, perhaps it should be using a field in struct proc
:to record the recursion depth.
:
:--
:Doug Rabson			Mail:  dfr@nlsystems.com
:Nonlinear Systems Ltd.		Phone: +44 181 442 9037

No, don't disable it. Unless you want the process to overflow its supervisor stack, that is! The code is obviously broken, but disabling it would break it even worse.

The write recursion test could actually be used as a count of the number of I/Os that are 'starting up' (versus in progress). It's an obvious failure as a stack recursion counter, but judging from the comments, it was designed to handle both conditions.

What appears to be happening is that both the buffer pool and the KVA space for the buffer pool are being exhausted. The code appears to be designed to deal only with exhaustion of the KVA space; it assumes that the buffer pool still has bp's available. That is why there was a panic.

I think the proper solution is to have getnewbuf() speed up the syncer daemon to retire the dirty buffers in the case where getnewbuf() gets itself tied into knots, then wait and return NULL.
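To make the proposed getnewbuf() behaviour concrete, here is a minimal user-space sketch. It assumes a toy pool of buffers that are free, busy (held by a caller), or dirty (released with a delayed write). Every name in it (sim_pool, sim_getnewbuf, sim_syncer_flush, sim_getblk, SIM_NBUF) is hypothetical, and this is not the vfs_bio.c code. The syncer is simulated by a synchronous flush; in the kernel it would be a wakeup of the syncer followed by a sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a buffer pool; NOT FreeBSD's struct buf. */
#define SIM_NBUF 4

enum sim_state { SB_FREE = 0, SB_BUSY, SB_DIRTY };

struct sim_buf  { enum sim_state state; };

struct sim_pool {
    struct sim_buf bufs[SIM_NBUF];
    int syncer_speedups;      /* times getnewbuf asked the syncer to hurry */
};

/* Stand-in for the syncer daemon: write out dirty buffers, making them free. */
static int sim_syncer_flush(struct sim_pool *p)
{
    int retired = 0;
    for (int i = 0; i < SIM_NBUF; i++) {
        if (p->bufs[i].state == SB_DIRTY) {
            p->bufs[i].state = SB_FREE;
            retired++;
        }
    }
    return retired;
}

/* Caller releases a buffer with a delayed write: it becomes dirty. */
static void sim_bdwrite(struct sim_buf *bp)
{
    assert(bp->state == SB_BUSY);
    bp->state = SB_DIRTY;
}

/*
 * Sketch of the suggested behaviour: if no buffer is free, speed up the
 * syncer and return NULL instead of panicking.  The caller retries.
 */
static struct sim_buf *sim_getnewbuf(struct sim_pool *p)
{
    for (int i = 0; i < SIM_NBUF; i++) {
        if (p->bufs[i].state == SB_FREE) {
            p->bufs[i].state = SB_BUSY;
            return &p->bufs[i];
        }
    }
    p->syncer_speedups++;
    (void)sim_syncer_flush(p);  /* kernel: wakeup syncer, then sleep */
    return NULL;
}

/* Caller side: retry a bounded number of times (kernel: sleep between tries). */
static struct sim_buf *sim_getblk(struct sim_pool *p, int max_tries)
{
    for (int t = 0; t < max_tries; t++) {
        struct sim_buf *bp = sim_getnewbuf(p);
        if (bp != NULL)
            return bp;
    }
    return NULL;
}
```

When every buffer is dirty, the first call returns NULL after the flush and the retry succeeds. When every buffer is busy, the flush retires nothing and sim_getblk() gives up after max_tries instead of spinning; a real kernel would keep sleeping until some buffer is released.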
Also, I think we need to implement a hard wait if numfreebuffers < lofreebuffers and the caller to getnewbuf() is not the syncer daemon (update_proc), but allow the allocation otherwise. writerecursion would then simply block waiting for the syncer when it gets too big, rather than panic.

It actually doesn't look too complex. I'll mess with Matt's test code.

						-Matt
					Matthew Dillon
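The hard-wait rule and the writerecursion throttle proposed in the message can be sketched as two small policy checks. This assumes a struct holding numfreebuffers, lofreebuffers and writerecursion (names taken from the mail) plus a hypothetical maxwriterecursion limit; sim_bufstats, sim_getnewbuf_policy, sim_write_start and sim_write_done are invented for illustration. The functions only return a verdict; in the kernel, SIM_WAIT would mean sleeping until the syncer frees buffers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters modelled on the names in the mail; not kernel code. */
struct sim_bufstats {
    int numfreebuffers;
    int lofreebuffers;      /* low-water mark: reserve kept for the syncer */
    int writerecursion;     /* writes currently "starting up" */
    int maxwriterecursion;  /* hypothetical throttle limit */
};

enum sim_verdict { SIM_PROCEED, SIM_WAIT };

/*
 * Hard-wait rule: ordinary callers must wait once free buffers drop below
 * the low-water mark.  The syncer (update_proc) may dip into the reserve,
 * because it is the one that will replenish it.
 */
static enum sim_verdict sim_getnewbuf_policy(const struct sim_bufstats *s,
                                             bool caller_is_syncer)
{
    if (s->numfreebuffers == 0)
        return SIM_WAIT;    /* nothing at all to hand out, even to the syncer */
    if (!caller_is_syncer && s->numfreebuffers < s->lofreebuffers)
        return SIM_WAIT;
    return SIM_PROCEED;
}

/* writerecursion as a throttle: too many writes starting up means wait, not panic. */
static enum sim_verdict sim_write_start(struct sim_bufstats *s)
{
    if (s->writerecursion >= s->maxwriterecursion)
        return SIM_WAIT;    /* kernel: sleep until the syncer retires some I/O */
    s->writerecursion++;
    return SIM_PROCEED;
}

/* A write that was starting up is now in progress: drop the count. */
static void sim_write_done(struct sim_bufstats *s)
{
    assert(s->writerecursion > 0);
    s->writerecursion--;
}
```

With 3 free buffers and a low-water mark of 5, an ordinary caller gets SIM_WAIT while the syncer gets SIM_PROCEED. The design choice is that only the process that drains the dirty queue is allowed to consume the last buffers, so it cannot deadlock behind the callers it is trying to unblock.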