From owner-freebsd-hackers@FreeBSD.ORG  Tue Jan 20 10:43:26 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 685371065672
	for <freebsd-hackers@freebsd.org>; Tue, 20 Jan 2009 10:43:26 +0000 (UTC)
	(envelope-from christoph.mallon@gmx.de)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx1.freebsd.org (Postfix) with SMTP id D7A6E8FC21
	for <freebsd-hackers@freebsd.org>; Tue, 20 Jan 2009 10:43:25 +0000 (UTC)
	(envelope-from christoph.mallon@gmx.de)
Received: (qmail invoked by alias); 20 Jan 2009 10:43:23 -0000
Received: from p54A3DE90.dip.t-dialin.net (EHLO tron.homeunix.org)
	[84.163.222.144]
	by mail.gmx.net (mp045) with SMTP; 20 Jan 2009 11:43:23 +0100
X-Authenticated: #1673122
X-Provags-ID: V01U2FsdGVkX1/BTVFSZv3B5RolQBgSEM0aL+4AZtssPsvOFcqFtO
	3ybwq0bQYtLKdT
Message-ID: <4975AACA.3050906@gmx.de>
Date: Tue, 20 Jan 2009 11:43:22 +0100
From: Christoph Mallon <christoph.mallon@gmx.de>
User-Agent: Thunderbird 2.0.0.19 (X11/20090103)
MIME-Version: 1.0
To: FreeBSD Current <freebsd-current@freebsd.org>
References: <496D94FD.9030300@gmx.de>
In-Reply-To: <496D94FD.9030300@gmx.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
X-FuHaFi: 0.44
Cc: uwe@grohnwaldt.eu, FreeBSD Hackers <freebsd-hackers@freebsd.org>,
	miwi@freebsd.org
Subject: Re: Question about panic in brelse()
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Jan 2009 10:43:27 -0000

Sadly I got no response about the panic problem so far. I investigated 
further and I came to the conclusion, that there are at least two 
problems/bugs in brelse(). Here are my findings. All lines refer to 
sys/kern/vfs_bio.c at r183754, which is the most recent version of this 
file.

Below is the KASSERT(), which triggers in bundirty() (which is called by 
brelse()):
KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE,
     ("bundirty: buffer %p still on queue %d", bp, bp->b_qindex));

So this assertion triggers, if neither B_REMFREE is set nor the buffer 
is in QUEUE_NONE.

Now lets have a look at brelse() before the call to bundirty() happens. 
In line 1325 we find:
if (bp->b_flags & B_REMFREE)
     bremfreel(bp);

In bremfreel() B_REMFREE is cleared (line 684). So after this B_REMFREE 
is guaranteed to be clear and so the first part of the triggered 
KASSERT() will be false.

Now for the second part, i.e. QUEUE_NONE: At line 1331 starts an 
if-cascade. No matter which case of this cascade is entered, b_qindex is 
set and it especially in no case it is set to QUEUE_NONE. so after this 
if-cascade it is guaranteed that b_qindex is *not* QUEUE_NONE.

So when we arrive at line 1373ff at the problematic call to bundirty(), 
which contains the KASSERT(), it is guaranteed that neither B_REMFREE is 
set nor the buffer is in QUEUE_NONE and therefore the KASSERT() will be 
triggered. So we have a serious problem here: When the call is done, it 
*will* fail.

This is one part of the problem. I have no solution, I don't know what 
would be correct here. Further, I looked at r93707 when this call to 
bundirty() was added. The situation was basically the same back then, 
i.e. the buffer could not be in QUEUE_NONE (the check for B_REMFREE did 
not exist back then, so the KASSERT() was more strict). So this change 
seems to wrong all along or at least dubious. The commit log of r93707 
on the other hand claims that it solved a serious problem. If this was 
tested back then, it must have been done without INVARIANTS. It seems to 
be, that it was a hot fix - mind the MFC time of 1 day.


Now for the second problem. As stated in the original mail, there is a 
buffer for which a write attempt failed:
>   b_iocmd   = BIO_WRITE
>   b_ioflags = BIO_ERROR
>   b_error   = EIO
>   b_flags   = B_NOCACHE 

Because the error ist "just" an EIO, the buffer is re-dirtied and 
BIO_ERROR is cleared (line 1144ff). So writing the buffer should be 
tried again.
But later in line 1342ff it is determined that the buffer contains junk, 
because B_NOCACHE is set so B_INVAL is also set. This leads to entering 
the path in line 1373, which then triggers the KASSERT().
On the other hand, the buffer does not contain junk, because it should 
be retried to be written! It only should be not kept around after the 
write was successful (or is considered to be failed permamently). So I 
think testing B_NOCACHE here alone is to weak and the condition has to 
be modified. I propose this change:

@@ -1340,7 +1340,8 @@
  		}
  		TAILQ_INSERT_HEAD(&bufqueues[bp->b_qindex], bp, b_freelist);
  	/* buffers with junk contents */
-	} else if (bp->b_flags & (B_INVAL | B_NOCACHE | B_RELBUF) ||
+	} else if (bp->b_flags & (B_INVAL | B_RELBUF) ||
+	    ((bp->b_flags & (B_NOCACHE | B_DELWRI)) == B_NOCACHE)
  	    (bp->b_ioflags & BIO_ERROR)) {
  		bp->b_flags |= B_INVAL;
  		bp->b_xflags &= ~(BX_BKGRDWRITE | BX_ALTDATA);

This only enters the path to invalidate the buffer in case of B_NOCACHE 
being set, if B_DELWRI is *not* set.


Can somebody reconstruct my analysis and confirm/reject it?
Does my proposed solution for the second problem look sensible?
Are the two problems related, i.e. can the first problem only occur 
because the second problem exists?
What about the change in r93707 in general? Is it wrong? Does my 
proposal remove the cause for the change done in r93707 or are there 
other circumstances by which the situation can be triggered?


Hopefully somebody can shed some more light on this.

	Christoph