From: Tim Robbins
To: Christian Brueffer
Cc: current@FreeBSD.ORG
Date: Thu, 21 Aug 2003 21:18:29 +1000
Subject: Re: panic: bundirty: buffer 0xc776e118 still on queue 2
Message-ID: <20030821111828.GA55273@dilbert.robbins.dropbear.id.au>
In-Reply-To: <20030821082608.GM638@unixpages.org>

On Thu, Aug 21, 2003 at 10:26:08AM +0200, Christian Brueffer wrote:
> On Thu, Aug 21, 2003 at 05:57:28PM +1000, Tim Robbins wrote:
> > Did one of the servers go down shortly before the panic, then? The last few
> > lines of dmesg might be useful.
> >
>
> No indication for that in the logs.
> I would have noticed anyway, as I was
> playing music from one of the shares.
> One of the shared file systems was full (besides the reserved space) at the
> time of the panic. Could that have to do something with it?

Perhaps. On closer examination, the backtrace doesn't match exactly the
problem I'd seen and worked around in NTFS, but it seems to be related to it.
Bad things happen when the cache can't write a dirty buffer back to disk --
in the NTFS case, this was triggered by another buffer cache bug that caused
it to try to write a non-dirty buffer back to a read-only disk[*].

I would have thought that disk space on the server would have already been
allocated for the buffer, so I'm not sure whether the filesystem being full
could have caused a write error. But in any case, I'm not sure how to fix the
bug you encountered, even if my speculations turn out to be correct.

At one stage I thought I had found a logic error in brelse()'s handling of
write errors, but I don't remember the specifics anymore.


Tim

[*] vfs_bio.c:getblk():gbincore(...) != NULL && bp->b_bcount != size &&
    !(bp->b_flags & B_VMIO) && !(bp->b_flags & B_DELWRI) ->
    write non-dirty buffer to disk.
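
The condition in the footnote can be restated as a small standalone predicate, which may make it easier to see which buffers are affected. This is only a sketch and not the kernel's code: the struct below is a minimal stand-in for the kernel's struct buf, the flag values are placeholders rather than the real ones from sys/buf.h, and the function name writes_clean_buffer is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits for illustration only; NOT the real kernel values. */
#define B_DELWRI 0x1u   /* buffer is dirty (delayed write pending) */
#define B_VMIO   0x2u   /* buffer is backed by VM pages */

/* Minimal stand-in for struct buf: only the fields the footnote mentions. */
struct buf {
    unsigned int b_flags;   /* B_* flag bits */
    long         b_bcount;  /* size the buffer currently has */
};

/*
 * Hypothetical helper: returns true when the footnote's condition holds,
 * i.e. the lookup found a cached buffer (bp != NULL), its size differs
 * from the requested one, and it is neither VMIO-backed nor dirty.
 * According to the footnote, this is the case in which a non-dirty
 * buffer ends up being written to disk.
 */
static bool
writes_clean_buffer(const struct buf *bp, long size)
{
    return bp != NULL &&
        bp->b_bcount != size &&
        !(bp->b_flags & B_VMIO) &&
        !(bp->b_flags & B_DELWRI);
}
```

With these stand-ins, a cached, clean, non-VMIO buffer of 4096 bytes requested at 8192 bytes satisfies the predicate; the same buffer with B_DELWRI or B_VMIO set, or requested at its existing size, does not.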