From: Johan Kuuse
Organization: Red Antigua
To: John Baldwin
Cc: freebsd-stable@freebsd.org
Date: Wed, 3 Sep 2008 23:46:13 +0200
Subject: Re: kernel panic
Message-Id: <200809032346.14053.kuuse@redantigua.com>
In-Reply-To: <200808121439.48158.jhb@freebsd.org>
References: <1179.83.49.238.144.1218565410.frodo@webmail.bilbomedia.com> <200808121439.48158.jhb@freebsd.org>

On Tuesday 12 August 2008 20:39:47 you wrote:
> On Tuesday 12 August 2008 02:23:30 pm Johan Kuuse wrote:
> > > On Tuesday 12 August 2008 02:42:52 am Johan Kuuse wrote:
> > >> On Monday 11 August 2008 23:04:30 John Baldwin wrote:
> > >> > On Sunday 10 August 2008 10:01:49 pm Johan Kuuse wrote:
> > >> > > Hi,
> > >> > >
> > >> > > I am a kgdb newbie, so please be patient.
> > >> > > I suspect (just based on the fact that this is the 4th time I edit text
> > >> > > files on my NTFS partition through ntfs-3g, using Emacs, and getting
> > >> > > frequent I/O error messages inside Emacs, and then a kernel panic) that
> > >> > > this is an ntfs-3g related problem.
> > >> > >
> > >> > > If you ask me exactly how to reproduce it, I'm sorry, I can't tell you
> > >> > > exactly (but see the kgdb output below).
> > >> > >
> > >> > > Anyway, the kernel seems to panic at /usr/src/sys/kern/vfs_bio.c:1530
> > >> > >
> > >> > > Just a suggestion for a patch (without knowing the functionality
> > >> > > of /usr/src/sys/kern/vfs_bio.c):
> > >> > > The line where the kernel panics:
> > >> > > /usr/src/sys/kern/vfs_bio.c:
> > >> > > ----------------------------------
> > >> > > VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> > >> > > ...
> > >> > > ----------------------------------
> > >> > >
> > >> > > Comparing to another file, which does error checking before calling
> > >> > > VM_OBJECT_LOCK:
> > >> > > /usr/src/sys/kern/vfs_aio.c:
> > >> > > ----------------------------------
> > >> > > if (vp->v_object != NULL) {
> > >> > > 	VM_OBJECT_LOCK(vp->v_object);
> > >> > > ...
> > >> > > ----------------------------------
> > >> > >
> > >> > > Perhaps the kernel panic could be avoided with the following patch?
> > >> > > /usr/src/sys/kern/vfs_bio.c (suggested patch):
> > >> > > ----------------------------------
> > >> > > if ((bp->b_bufobj != NULL) && (bp->b_bufobj->bo_object != NULL)) {
> > >> > > 	VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> > >> > > ...
> > >> > > ----------------------------------
> > >> > >
> > >> > > Please let me know if you need more information.
> > >> > >
> > >> > > Regards,
> > >> > > Johan Kuuse
> > >> > >
> > >> > > --------------------------------------------------------------------------------
> > >> > > kgdb kernel.debug /var/crash/vmcore.1
> > >> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > >> > > GNU gdb 6.1.1 [FreeBSD]
> > >> > > Copyright 2004 Free Software Foundation, Inc.
> > >> > > GDB is free software, covered by the GNU General Public License, and you are
> > >> > > welcome to change it and/or distribute copies of it under certain conditions.
> > >> > > Type "show copying" to see the conditions.
> > >> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > >> > > This GDB was configured as "i386-marcel-freebsd".
> > >> > >
> > >> > > Unread portion of the kernel message buffer:
> > >> > >
> > >> > >
> > >> > > Fatal trap 12: page fault while in kernel mode
> > >> > > cpuid = 0; apic id = 00
> > >> > > fault virtual address = 0x34
> > >> > > fault code = supervisor read, page not present
> > >> > > instruction pointer = 0x20:0xc07b6de4
> > >> > > stack pointer = 0x28:0xe79de7c8
> > >> > > frame pointer = 0x28:0xe79de7e8
> > >> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > >> > >              = DPL 0, pres 1, def32 1, gran 1
> > >> > > processor eflags = interrupt enabled, resume, IOPL = 0
> > >> > > current process = 1214 (opera)
> > >> > > trap number = 12
> > >> > > panic: page fault
> > >> > > cpuid = 0
> > >> > > Uptime: 5h20m30s
> > >> > > Physical memory: 2035 MB
> > >> > > Dumping 218 MB: 203 187 171 155 139 123 107 91 75 59 43 27 11
> > >> > >
> > >> > > #0  doadump () at pcpu.h:195
> > >> > > 195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> > >> > > (kgdb) list *0xc07b6de4
> > >> > > 0xc07b6de4 is in vfs_vmio_release (/usr/src/sys/kern/vfs_bio.c:1530).
> > >> > > 1525    vfs_vmio_release(struct buf *bp)
> > >> > > 1526    {
> > >> > > 1527            int i;
> > >> > > 1528            vm_page_t m;
> > >> > > 1529
> > >> > > 1530            VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> > >> > > 1531            vm_page_lock_queues();
> > >> > > 1532            for (i = 0; i < bp->b_npages; i++) {
> > >> > > 1533                    m = bp->b_pages[i];
> > >> > > 1534                    bp->b_pages[i] = NULL;
> > >> > > (kgdb) bt
> > >> > > #0  doadump () at pcpu.h:195
> > >> > > #1  0xc0754457 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
> > >> > > #2  0xc0754719 in panic (fmt=Variable "fmt" is not available.
> > >> > > ) at /usr/src/sys/kern/kern_shutdown.c:563
> > >> > > #3  0xc0a4905c in trap_fatal (frame=0xe79de788, eva=52)
> > >> > >     at /usr/src/sys/i386/i386/trap.c:899
> > >> > > #4  0xc0a492e0 in trap_pfault (frame=0xe79de788, usermode=0, eva=52)
> > >> > >     at /usr/src/sys/i386/i386/trap.c:812
> > >> > > #5  0xc0a49c8c in trap (frame=0xe79de788)
> > >> > >     at /usr/src/sys/i386/i386/trap.c:490
> > >> > > #6  0xc0a2fc0b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> > >> > > #7  0xc07b6de4 in vfs_vmio_release (bp=0xd927e33c)
> > >> > >     at /usr/src/sys/kern/vfs_bio.c:1530
> > >> > > #8  0xc07b8a81 in getnewbuf (slpflag=0, slptimeo=0, size=Variable "size" is not available.
> > >> > > ) at /usr/src/sys/kern/vfs_bio.c:1847
> > >> > > #9  0xc07ba118 in getblk (vp=0xc8891bb0, blkno=0, size=2048, slpflag=0,
> > >> > >     slptimeo=0, flags=Variable "flags" is not available.
> > >> > > ) at /usr/src/sys/kern/vfs_bio.c:2602
> > >> > > #10 0xc0932815 in ffs_balloc_ufs2 (vp=0xc8891bb0,
> > >> > >     startoffset=Variable "startoffset" is not available.
> > >> > > ) at /usr/src/sys/ufs/ffs/ffs_balloc.c:699
> > >> > > #11 0xc0952a85 in ffs_write (ap=0xe79debc4)
> > >> > >     at /usr/src/sys/ufs/ffs/ffs_vnops.c:720
> > >> > > #12 0xc0a5efc6 in VOP_WRITE_APV (vop=0xc0b93c60, a=0xe79debc4)
> > >> > >     at vnode_if.c:691
> > >> > > #13 0xc07dbf37 in vn_write (fp=0xc85f3168, uio=0xe79dec60,
> > >> > >     active_cred=0xc61c6300, flags=0, td=0xc583fc60) at vnode_if.h:373
> > >> > > #14 0xc07875e7 in dofilewrite (td=0xc583fc60, fd=17, fp=0xc85f3168,
> > >> > >     auio=0xe79dec60, offset=-1, flags=0) at file.h:254
> > >> > > #15 0xc07878c8 in kern_writev (td=0xc583fc60, fd=17, auio=0xe79dec60)
> > >> > >     at /usr/src/sys/kern/sys_generic.c:401
> > >> > > #16 0xc078793f in write (td=0xc583fc60, uap=0xe79decfc)
> > >> > >     at /usr/src/sys/kern/sys_generic.c:317
> > >> > > #17 0xc0a49635 in syscall (frame=0xe79ded38)
> > >> > >     at /usr/src/sys/i386/i386/trap.c:1035
> > >> > > #18 0xc0a2fc70 in Xint0x80_syscall ()
> > >> > >     at /usr/src/sys/i386/i386/exception.s:196
> > >> > > #19 0x00000033 in ?? ()
> > >> > > Previous frame inner to this frame (corrupt stack?)
> > >> >
> > >> > FYI, you got the panic in ffs/ufs, not fuse.  I've seen this at work on
> > >> > 6.x with NFS with no clues on what causes it.  You can start by going to
> > >> > frame 7 and doing 'p *bp'.
> > >>
> > >> Thanks for the hints.
> > >> See below for more debug output.
> > >> I recognize that the bp struct members b_data and b_kvabase both point to a
> > >> chunk of memory containing the text of the Opera web page I was reading
> > >> when the kernel crashed. (This is indicated above: current process
> > >> = 1214 (opera))
> > >>
> > >> But what is most interesting is that b_bufobj = 0x0
> > >> Obviously, then trying to access bp->b_bufobj->bo_object will cause a
> > >> crash.
> > >> So I think it would be a good idea to NULL-check the struct member
> > >> before trying to access it. How should I proceed? Should I post this as a
> > >> possible bug somewhere else, to another list?
> > >
> > > Unfortunately, it is a worse problem that b_bufobj is NULL.  That means there
> > > is a bug elsewhere.  I'll look at this some more.
> > >
> > > Hmm, can you reproduce this at all?  If so, can you try the patch below.
> > > Hopefully it panics here which might help:
> > >
> > > Index: vfs_subr.c
> > > ===================================================================
> > > --- vfs_subr.c	(revision 181629)
> > > +++ vfs_subr.c	(working copy)
> > > @@ -1546,6 +1546,9 @@
> > >  	CTR3(KTR_BUF, "brelvp(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags);
> > >  	KASSERT(bp->b_vp != NULL, ("brelvp: NULL"));
> > >  
> > > +	if (bp->b_flags & B_VMIO)
> > > +		panic("brelvp of B_VMIO buffer");
> > > +
> > >  	/*
> > >  	 * Delete from old vnode list, if on one.
> > >  	 */
> > >
> > > --
> > > John Baldwin
> >
> > Sorry, at the moment I don't know how to reproduce the crash.
> > I mentioned ntfs-3g/fuse as I got the impression that they caused a heavy load
> > on my box, but in the end, it was Opera which caused the crash (also causing a
> > heavy load, however).
> > What I can do is to apply your patch and play around with CPU-consuming apps to
> > try if I can reproduce the crash during heavy load.
>
> Ok.
>
> > Currently I'm running 7.0-RELEASE.
> > Would you recommend upgrading to STABLE before applying the patch?
>
> No, you can just leave it as it is.  At work I've seen this occasionally on
> 6.x, so it's probably an older bug.

Hi again,

I got another kernel panic in vfs_subr.c, not in the patched function brelvp()
but in delmntque().
At the moment of the kernel panic, I was installing sysutils/e2fsprogs, which
performs some ext2fs mount/umount tests during install.
Debug output is listed below.
Please let me know if you need some more info.
Regards,
Johan Kuuse

--------------------------------------------------------------------------------
cd /usr/obj/usr/src/sys/MYDEBUGKERNEL
kgdb kernel.debug /var/crash/vmcore.4
--------------------------------------------------------------------------------
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc07cb876
stack pointer = 0x28:0xe79a8824
frame pointer = 0x28:0xe79a8860
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12674 (mtree)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 29m39s
Physical memory: 2035 MB
Dumping 152 MB: 137 121 105 89 73 57 41 25 9

#0  doadump () at pcpu.h:195
195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
--------------------------------------------------------------------------------
(kgdb) list *0xc07cb876
--------------------------------------------------------------------------------
0xc07cb876 is in vgonel (/usr/src/sys/kern/vfs_subr.c:990).
985                     return;
986             MNT_ILOCK(mp);
987             vp->v_mount = NULL;
988             VNASSERT(mp->mnt_nvnodelistsize > 0, vp,
989                 ("bad mount point vnode list size"));
990             TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
991             mp->mnt_nvnodelistsize--;
992             MNT_REL(mp);
993             MNT_IUNLOCK(mp);
994     }
--------------------------------------------------------------------------------
(kgdb)