From owner-freebsd-current@FreeBSD.ORG Tue Jan 20 10:43:26 2009 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D4971065674 for ; Tue, 20 Jan 2009 10:43:26 +0000 (UTC) (envelope-from christoph.mallon@gmx.de) Received: from mail.gmx.net (mail.gmx.net [213.165.64.20]) by mx1.freebsd.org (Postfix) with SMTP id DC3458FC26 for ; Tue, 20 Jan 2009 10:43:25 +0000 (UTC) (envelope-from christoph.mallon@gmx.de) Received: (qmail invoked by alias); 20 Jan 2009 10:43:23 -0000 Received: from p54A3DE90.dip.t-dialin.net (EHLO tron.homeunix.org) [84.163.222.144] by mail.gmx.net (mp045) with SMTP; 20 Jan 2009 11:43:23 +0100 X-Authenticated: #1673122 X-Provags-ID: V01U2FsdGVkX1/BTVFSZv3B5RolQBgSEM0aL+4AZtssPsvOFcqFtO 3ybwq0bQYtLKdT Message-ID: <4975AACA.3050906@gmx.de> Date: Tue, 20 Jan 2009 11:43:22 +0100 From: Christoph Mallon User-Agent: Thunderbird 2.0.0.19 (X11/20090103) MIME-Version: 1.0 To: FreeBSD Current References: <496D94FD.9030300@gmx.de> In-Reply-To: <496D94FD.9030300@gmx.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit X-Y-GMX-Trusted: 0 X-FuHaFi: 0.44 Cc: uwe@grohnwaldt.eu, FreeBSD Hackers , miwi@freebsd.org Subject: Re: Question about panic in brelse() X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jan 2009 10:43:27 -0000 Sadly I got no response about the panic problem so far. I investigated further and I came to the conclusion, that there are at least two problems/bugs in brelse(). Here are my findings. All lines refer to sys/kern/vfs_bio.c at r183754, which is the most recent version of this file. Below is the KASSERT(), which triggers in bundirty() (which is called by brelse()): KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE, ("bundirty: buffer %p still on queue %d", bp, bp->b_qindex)); So this assertion triggers, if neither B_REMFREE is set nor the buffer is in QUEUE_NONE. Now lets have a look at brelse() before the call to bundirty() happens. In line 1325 we find: if (bp->b_flags & B_REMFREE) bremfreel(bp); In bremfreel() B_REMFREE is cleared (line 684). So after this B_REMFREE is guaranteed to be clear and so the first part of the triggered KASSERT() will be false. Now for the second part, i.e. QUEUE_NONE: At line 1331 starts an if-cascade. No matter which case of this cascade is entered, b_qindex is set and it especially in no case it is set to QUEUE_NONE. so after this if-cascade it is guaranteed that b_qindex is *not* QUEUE_NONE. So when we arrive at line 1373ff at the problematic call to bundirty(), which contains the KASSERT(), it is guaranteed that neither B_REMFREE is set nor the buffer is in QUEUE_NONE and therefore the KASSERT() will be triggered. So we have a serious problem here: When the call is done, it *will* fail. This is one part of the problem. I have no solution, I don't know what would be correct here. Further, I looked at r93707 when this call to bundirty() was added. The situation was basically the same back then, i.e. the buffer could not be in QUEUE_NONE (the check for B_REMFREE did not exist back then, so the KASSERT() was more strict). So this change seems to wrong all along or at least dubious. The commit log of r93707 on the other hand claims that it solved a serious problem. If this was tested back then, it must have been done without INVARIANTS. It seems to be, that it was a hot fix - mind the MFC time of 1 day. Now for the second problem. As stated in the original mail, there is a buffer for which a write attempt failed: > b_iocmd = BIO_WRITE > b_ioflags = BIO_ERROR > b_error = EIO > b_flags = B_NOCACHE Because the error ist "just" an EIO, the buffer is re-dirtied and BIO_ERROR is cleared (line 1144ff). So writing the buffer should be tried again. But later in line 1342ff it is determined that the buffer contains junk, because B_NOCACHE is set so B_INVAL is also set. This leads to entering the path in line 1373, which then triggers the KASSERT(). On the other hand, the buffer does not contain junk, because it should be retried to be written! It only should be not kept around after the write was successful (or is considered to be failed permamently). So I think testing B_NOCACHE here alone is to weak and the condition has to be modified. I propose this change: @@ -1340,7 +1340,8 @@ } TAILQ_INSERT_HEAD(&bufqueues[bp->b_qindex], bp, b_freelist); /* buffers with junk contents */ - } else if (bp->b_flags & (B_INVAL | B_NOCACHE | B_RELBUF) || + } else if (bp->b_flags & (B_INVAL | B_RELBUF) || + ((bp->b_flags & (B_NOCACHE | B_DELWRI)) == B_NOCACHE) (bp->b_ioflags & BIO_ERROR)) { bp->b_flags |= B_INVAL; bp->b_xflags &= ~(BX_BKGRDWRITE | BX_ALTDATA); This only enters the path to invalidate the buffer in case of B_NOCACHE being set, if B_DELWRI is *not* set. Can somebody reconstruct my analysis and confirm/reject it? Does my proposed solution for the second problem look sensible? Are the two problems related, i.e. can the first problem only occur because the second problem exists? What about the change in r93707 in general? Is it wrong? Does my proposal remove the cause for the change done in r93707 or are there other circumstances by which the situation can be triggered? Hopefully somebody can shed some more light on this. Christoph