Date: Tue, 2 Aug 2005 19:44:07 -0700 (PDT)
From: dpk
To: questions@freebsd.org
Message-ID: <20050802190217.N64406@shared10.hosting.flyingcroc.net>
Subject: FreeBSD 5.4-RELEASE-p5 panic

After much struggling (documented elsewhere) I have a backtrace showing one
of a handful of panics I am getting on a FreeBSD 5.4-RELEASE-p5 system. The
server has 4GB RAM, and is running with PAE and SMP enabled.

If this is not the appropriate list for this, I can send it elsewhere;
please let me know.

(gdb) bt
#0  kdb_enter (msg=0x12
    ) at ../../../kern/subr_kdb.c:266
#1  0xc033ea1f in panic (fmt=0xc04d782d "ffs_write: dir write")
    at ../../../kern/kern_shutdown.c:550
#2  0xc04292de in ffs_write (ap=0xeb858a94)
    at ../../../ufs/ffs/ffs_vnops.c:614
#3  0xc0452e71 in vnode_pager_generic_putpages (vp=0xc6237630, m=0xeb858bf0,
    bytecount=4096, flags=0, rtvals=0xeb858b70) at vnode_if.h:432
#4  0xc038b7e2 in vop_stdputpages (ap=0x12)
    at ../../../kern/vfs_default.c:650
#5  0xc038af3b in vop_defaultop (ap=0x0)
    at ../../../kern/vfs_default.c:157
#6  0xc0435ebf in ufs_vnoperate (ap=0x0)
    at ../../../ufs/ufs/ufs_vnops.c:2821
#7  0xc0452c0e in vnode_pager_putpages (object=0xc6901a50, m=0x12, count=18,
    sync=0, rtvals=0x12) at vnode_if.h:1357
#8  0xc044a5db in vm_pageout_flush (mc=0xeb858bf0, count=1, flags=0)
    at vm_pager.h:147
#9  0xc044a505 in vm_pageout_clean (m=0x0)
    at ../../../vm/vm_pageout.c:347
#10 0xc044b386 in vm_pageout_scan (pass=1)
    at ../../../vm/vm_pageout.c:985
#11 0xc044c106 in vm_pageout () at ../../../vm/vm_pageout.c:1476
#12 0xc032911d in fork_exit (callout=0xc044bdf4, arg=0x0, frame=0xeb858d48)
    at ../../../kern/kern_fork.c:791
#13 0xc0474f6c in fork_trampoline () at ../../../i386/i386/exception.s:209

(Another panic I get follows roughly the same path, except that it dies
while trying to unlock a vnode lock that the thread doesn't own. I'll try
to capture that information some time, too.)

This might all trace back to vm_pageout_clean() being called with a NULL
argument (m=0x0 in frame #9).
Looking at vm_pageout_clean(), it looks as though that should never happen
-- at least, there's nothing there that checks whether m is NULL before it
goes on to treat it as a pointer to a struct:

static int
vm_pageout_clean(m)
        vm_page_t m;
{
        vm_object_t object;
        vm_page_t mc[2*vm_pageout_page_count];
        int pageout_count;
        int ib, is, page_base;
        vm_pindex_t pindex = m->pindex;

        mtx_assert(&vm_page_queue_mtx, MA_OWNED);
        VM_OBJECT_LOCK_ASSERT(m->object, MA_OWNED);

In frame #10, vm_pageout_scan:

#10 0xc044b386 in vm_pageout_scan (pass=1) at ../../../vm/vm_pageout.c:985
985                     if (vm_pageout_clean(m) != 0) {
(gdb) p m
$65 = 0xc0da66f8
(gdb) p *m
$78 = {pageq = {tqe_next = 0xeb858cb0, tqe_prev = 0xc231c840},
  listq = {tqe_next = 0x0, tqe_prev = 0xc6901a88}, left = 0x0, right = 0x0,
  object = 0xc6901a50, pindex = 1, phys_addr = 296792064,
  md = {pv_list_count = 0, pv_list = {tqh_first = 0x0, tqh_last = 0xc0da6728}},
  queue = 33, flags = 4, pc = 11, wire_count = 0, hold_count = 0,
  act_count = 0 '\0', busy = 1 '\001', valid = 255 '', dirty = 255 '', cow = 0}

So "m" is a valid pointer in frame #10 but shows up as 0x0 in frame #9; it
seems as though "m" is getting "lost" between the two. What follows that
seems to be undefined behavior.

(I have slightly modified the above: valid had a 'y'-shaped upper-8-bit
symbol between the quotes, and I reformatted the output to fit in 80
columns.)

I'll admit I'm quite green when it comes to debugging kernels, especially
5.x kernels. It gets really tricky when some functions trace back to .h
files, and not all of the variables seem to be available to the debugger.

The servers appear to work fine without PAE enabled, if that's of interest.

This gdb session is still active and I hope to keep it active in case there
are other commands you'd like me to run that might help shed some light on
the situation.
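
To illustrate the missing-NULL-check observation above, here is a minimal
userland sketch of what a guard in front of the m->pindex and m->object
reads could look like. It is not a proposed kernel patch: struct mock_page
and mock_pageout_clean() are hypothetical names invented for this example,
and returning an error code (rather than asserting) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, cut-down stand-in for the kernel's struct vm_page.
 * Only the two fields read in the vm_pageout_clean() prologue are
 * modelled here.
 */
struct mock_page {
	void     *object;	/* owning VM object; opaque in this sketch */
	uint64_t  pindex;	/* page index within that object */
};

/*
 * Validate the page pointer before reading through it.
 *
 * Returns 0 and stores the page index in *pindex_out on success, or -1
 * if the page pointer or its object pointer is NULL. In-kernel code
 * would more likely assert (for example with KASSERT) than return an
 * error; that choice is an assumption of this sketch.
 */
int
mock_pageout_clean(const struct mock_page *m, uint64_t *pindex_out)
{
	assert(pindex_out != NULL);	/* caller must supply storage */

	if (m == NULL || m->object == NULL)
		return (-1);

	*pindex_out = m->pindex;
	return (0);
}
```

Called with NULL, the function returns -1 instead of faulting; called with
a page whose pindex is 1 (as in the dump above), it returns 0 and stores 1.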