From: Volker <volker@vwsoft.com>
Date: Sun, 04 Jan 2009 23:08:57 +0100
To: freebsd-current@freebsd.org
Cc: attilio@freebsd.org, uwe@grohnwaldt.eu, kib@freebsd.org, Martin Wilke
Subject: panic in bundirty
Message-ID: <49613379.8040901@vwsoft.com>

Gentlemen,

I've had the pleasure of inspecting miwi's new tinderbox machine in a
panic situation. The debugging info we're able to get out of the box is
the following:

panic: bundirty: buffer 0xffffffff9a475438 still on queue 1

#9  0xffffffff8049f196 in panic (fmt=Variable "fmt" is not available.)
    at /usr/src/sys/kern/kern_shutdown.c:556
#10 0xffffffff8050b540 in bundirty (bp=Variable "bp" is not available.)
    at /usr/src/sys/kern/vfs_bio.c:1068
#11 0xffffffff8050d684 in brelse (bp=0xffffffff9a475438)
    at /usr/src/sys/kern/vfs_bio.c:1388
#12 0xffffffff8050f07c in bufdone (bp=0xffffffff9a475438)
    at /usr/src/sys/kern/vfs_bio.c:3157
#13 0xffffffff8069cb6c in ffs_backgroundwritedone (bp=0xffffffff9a475438)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1679
#14 0xffffffff8050f051 in bufdone (bp=0xffffffff9a475438)
    at /usr/src/sys/kern/vfs_bio.c:3151
#15 0xffffffff80452b51 in g_io_schedule_up (tp=Variable "tp" is not available.)
    at /usr/src/sys/geom/geom_io.c:587
#16 0xffffffff8045323f in g_up_procbody () at /usr/src/sys/geom/geom_kern.c:95
#17 0xffffffff8048037a in fork_exit (callout=0xffffffff804531d0 <g_up_procbody>,
    arg=0x0, frame=0xffffffff80e63c80) at /usr/src/sys/kern/kern_fork.c:789

In frame 11, 'p *bp' gives:

$1 = {b_bufobj = 0xffffff00034fe958, b_bcount = 16384, b_caller1 = 0x0,
  b_data = 0xffffffff9f4b7000 "", b_error = 5, b_iocmd = 2 '\002',
  b_ioflags = 2 '\002', b_iooffset = 7127891968, b_resid = 16384,
  b_iodone = 0, b_blkno = 13921664, b_offset = 7127891968,
  b_bobufs = {tqe_next = 0xffffffff9a505a38, tqe_prev = 0xffffff00034fe9a8},
  b_left = 0x0, b_right = 0xffffffff9a505a38, b_vflags = 0,
  b_freelist = {tqe_next = 0xffffffff9a472db8, tqe_prev = 0xffffffff80b56210},
  b_qindex = 1, b_flags = 41092, b_xflags = 33 '!',
  b_lock = {lock_object = {lo_name = 0xffffffff8083ce18 "bufwait",
      lo_type = 0xffffffff8083ce18 "bufwait", lo_flags = 91947008,
      lo_witness_data = {lod_list = {stqe_next = 0xffffffff80ae6f80},
        lod_witness = 0xffffffff80ae6f80}},
    lk_lock = 18446744073709551608, lk_recurse = 0, lk_timo = 0, lk_pri = 80},
  b_bufsize = 16384, b_runningbufspace = 0,
  b_kvabase = 0xffffffff9f4b7000 "", b_kvasize = 16384, b_lblkno = 13921664,
  b_vp = 0xffffff00034fe820, b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0x0,
  b_wcred = 0x0, b_saveaddr = 0xffffffff9f4b7000, b_pager = {pg_reqpage = 0},
  b_cluster = {cluster_head = {tqh_first = 0xffffffff9a479c68,
      tqh_last = 0xffffffff9a4901b0}, cluster_entry = {
      tqe_next = 0xffffffff9a479c68, tqe_prev = 0xffffffff9a4901b0}},
  b_pages = {0xffffff00de0eafa0, 0xffffff00de0eb010, 0xffffff00de0eb080,
    0xffffff00de0eb0f0, 0x0 ...}, b_npages = 4, b_dep = {lh_first = 0x0},
  b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0,
  b_pin_count = 0}

This panic does not occur before commit 176708.

The panic is the KASSERT in bundirty() in sys/kern/vfs_bio.c, line 1055:

KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE, ...)

For the crashed buffer:

bp->b_qindex = 0x01 (QUEUE_FREE)
bp->b_flags  = 0xa084 (B_ASYNC | B_DELWRI | B_NOCACHE | B_INVAL)
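Spelled out, the failing check looks roughly like this (the assertion
text is paraphrased from the panic message, and the flag values are the
sys/buf.h constants as I read them, so double-check against the actual
tree):

KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE,
    ("bundirty: buffer %p still on queue %d", bp, bp->b_qindex));

/*
 * For the crashed buffer:
 *   b_qindex = 1      -> not QUEUE_NONE (0), right-hand side fails
 *   b_flags  = 0xa084 -> B_NOCACHE (0x8000) | B_INVAL (0x2000) |
 *                        B_DELWRI (0x80) | B_ASYNC (0x4);
 *                        B_REMFREE is not set, so the left-hand side
 *                        fails as well and the assertion fires.
 */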
My assumption is: brelse() knows about QUEUE_FREE, but bundirty() only
wants to operate on QUEUE_NONE buffers. Either this is a race
condition, brelse() should not call bundirty() here, or bundirty()
should not be restricted to QUEUE_NONE buffers.

As some of the commits in the changes in question are locking-related,
this might also be caused by an unlocked operation.

Right before the panic, we're seeing DMA errors:

ad6: setting up DMA failed
g_vfs_done():ad6[WRITE(offset=5412487168, length=131072)]error = 5
ad6: FAILURE - load data
ad6: setting up DMA failed
g_vfs_done():ad6[WRITE(offset=5044109312, length=131072)]error = 5
g_vfs_done():ad6[WRITE(offset=5230985216, length=131072)]error = 5
g_vfs_done():ad6[WRITE(offset=5231116288, length=131072)]error = 5
g_vfs_done():ad6[WRITE(offset=5044240384, length=131072)]error = 5
g_vfs_done():ad6[WRITE(offset=5412618240, length=131072)]error = 5
g_vfs_done():ad4s1d[WRITE(offset=7127891968, length=16384)]error = 5

(error = 5 is EIO, and the offset and length of that last failed write
on ad4s1d match b_iooffset and b_bcount of the buffer we panic on.)

The full kgdb output can be found at:

http://www.bsdmeat.net/~lando/miwi/kgdb.txt

Any suggestions as to what the correct fix for this issue is? Change
bundirty(), change brelse(), or clean up the locking?

I'm CC'ing those who committed to vfs_bio.c in the relevant time frame.

Thanks,
Volker
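P.S. For anyone jumping straight into vfs_bio.c, this is the
interaction I mean. The shape of the code below is reconstructed from
the backtrace (frames #10/#11) and the flag dump, not quoted from the
source, so treat the exact condition as my assumption:

/*
 * brelse() invalidates a buffer whose delayed write failed and then
 * clears its dirty state; with our buffer still on queue 1, the
 * bundirty() call is where we die.
 */
void
brelse(struct buf *bp)
{
	/* ... */
	if (bp->b_flags & B_INVAL) {		/* set on our buffer (0x2000) */
		/* ... */
		if (bp->b_flags & B_DELWRI)	/* also set (0x80) */
			bundirty(bp);		/* panics: b_qindex == 1 */
	}
	/* ... */
}

If it's legitimate for brelse() to get here with a buffer that is
still on a queue, then, if I read the assertion right, it would have
to take the buffer off the queue first (bremfree() also sets B_REMFREE,
which satisfies the KASSERT).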