Date: Wed, 20 Aug 2008 19:56:58 +0200
From: Johan Kuuse <kuuse@redantigua.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: kernel panic
Message-ID: <200808201956.58751.kuuse@redantigua.com>
In-Reply-To: <200808121439.48158.jhb@freebsd.org>
References: <1179.83.49.238.144.1218565410.frodo@webmail.bilbomedia.com> <200808121439.48158.jhb@freebsd.org>
On Tuesday 12 August 2008 20:39:47 John Baldwin wrote:
> On Tuesday 12 August 2008 02:23:30 pm Johan Kuuse wrote:
> > > On Tuesday 12 August 2008 02:42:52 am Johan Kuuse wrote:
> > >> On Monday 11 August 2008 23:04:30 John Baldwin wrote:
> > >> > On Sunday 10 August 2008 10:01:49 pm Johan Kuuse wrote:
> > >> > > Hi,
> > >> > >
> > >> > > I am a kgdb newbie, so please be patient.
> > >> > > I suspect (just based on the fact that this is the 4th time I edit
> > >> > > text files on my NTFS partition through ntfs-3g, using Emacs, and
> > >> > > getting frequent I/O error messages inside Emacs, and then a kernel
> > >> > > panic) that this is an ntfs-3g related problem.
> > >> > >
> > >> > > If you ask me exactly how to reproduce it, I'm sorry, I can't tell
> > >> > > you exactly (but see the kgdb output below).
> > >> > >
> > >> > > Anyway, the kernel seems to panic at /usr/src/sys/kern/vfs_bio.c:1530
> > >> > >
> > >> > > Just a suggestion for a patch (without knowing the functionality
> > >> > > of /usr/src/sys/kern/vfs_bio.c):
> > >> > > The line where the kernel panics:
> > >> > > /usr/src/sys/kern/vfs_bio.c:
> > >> > > ----------------------------------
> > >> > > VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> > >> > > ...
> > >> > > ----------------------------------
> > >> > >
> > >> > > Comparing to another file, which does error checking before calling
> > >> > > VM_OBJECT_LOCK:
> > >> > > /usr/src/sys/kern/vfs_aio.c:
> > >> > > ----------------------------------
> > >> > > if (vp->v_object != NULL) {
> > >> > >         VM_OBJECT_LOCK(vp->v_object);
> > >> > >         ...
> > >> > > ----------------------------------
> > >> > >
> > >> > > Perhaps the kernel panic could be avoided with the following patch?
> > >> > > /usr/src/sys/kern/vfs_bio.c (suggested patch):
> > >> > > ----------------------------------
> > >> > > if ((bp->b_bufobj != NULL) && (bp->b_bufobj->bo_object != NULL)) {
> > >> > >         VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> > >> > >         ...
> > >> > > ----------------------------------
> > >> > >
> > >> > > Please let me know if you need more information.
> > >> > >
> > >> > > Regards,
> > >> > > Johan Kuuse
> > >> > >
> > >> > > ----------------------------------------------------------------------
> > >> > > kgdb kernel.debug /var/crash/vmcore.1
> > >> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > >> > > GNU gdb 6.1.1 [FreeBSD]
> > >> > > Copyright 2004 Free Software Foundation, Inc.
> > >> > > GDB is free software, covered by the GNU General Public License, and
> > >> > > you are welcome to change it and/or distribute copies of it under
> > >> > > certain conditions.
> > >> > > Type "show copying" to see the conditions.
> > >> > > There is absolutely no warranty for GDB. Type "show warranty" for
> > >> > > details. This GDB was configured as "i386-marcel-freebsd".
> > >> > >
> > >> > > Unread portion of the kernel message buffer:
> > >> > >
> > >> > >
> > >> > > Fatal trap 12: page fault while in kernel mode
> > >> > > cpuid = 0; apic id = 00
> > >> > > fault virtual address   = 0x34
> > >> > > fault code              = supervisor read, page not present
> > >> > > instruction pointer     = 0x20:0xc07b6de4
> > >> > > stack pointer           = 0x28:0xe79de7c8
> > >> > > frame pointer           = 0x28:0xe79de7e8
> > >> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >> > >                         = DPL 0, pres 1, def32 1, gran 1
> > >> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > >> > > current process         = 1214 (opera)
> > >> > > trap number             = 12
> > >> > > panic: page fault
> > >> > > cpuid = 0
> > >> > > Uptime: 5h20m30s
> > >> > > Physical memory: 2035 MB
> > >> > > Dumping 218 MB: 203 187 171 155 139 123 107 91 75 59 43 27 11
> > >> > >
> > >> > > #0  doadump () at pcpu.h:195
> > >> > > 195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> > >> > > (kgdb) list *0xc07b6de4
> > >> > > 0xc07b6de4 is in vfs_vmio_release (/usr/src/sys/kern/vfs_bio.c:1530).
> > >> > > 1525    vfs_vmio_release(struct buf *bp)
> > >> > > 1526    {
> > >> > > 1527            int i;
> > >> > > 1528            vm_page_t m;
> > >> > > 1529
> > >> > > 1530            VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> > >> > > 1531            vm_page_lock_queues();
> > >> > > 1532            for (i = 0; i < bp->b_npages; i++) {
> > >> > > 1533                    m = bp->b_pages[i];
> > >> > > 1534                    bp->b_pages[i] = NULL;
> > >> > > (kgdb) bt
> > >> > > #0  doadump () at pcpu.h:195
> > >> > > #1  0xc0754457 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
> > >> > > #2  0xc0754719 in panic (fmt=Variable "fmt" is not available.
> > >> > > ) at /usr/src/sys/kern/kern_shutdown.c:563
> > >> > > #3  0xc0a4905c in trap_fatal (frame=0xe79de788, eva=52) at /usr/src/sys/i386/i386/trap.c:899
> > >> > > #4  0xc0a492e0 in trap_pfault (frame=0xe79de788, usermode=0, eva=52) at /usr/src/sys/i386/i386/trap.c:812
> > >> > > #5  0xc0a49c8c in trap (frame=0xe79de788) at /usr/src/sys/i386/i386/trap.c:490
> > >> > > #6  0xc0a2fc0b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> > >> > > #7  0xc07b6de4 in vfs_vmio_release (bp=0xd927e33c) at /usr/src/sys/kern/vfs_bio.c:1530
> > >> > > #8  0xc07b8a81 in getnewbuf (slpflag=0, slptimeo=0, size=Variable "size" is not available.
> > >> > > ) at /usr/src/sys/kern/vfs_bio.c:1847
> > >> > > #9  0xc07ba118 in getblk (vp=0xc8891bb0, blkno=0, size=2048, slpflag=0, slptimeo=0, flags=Variable "flags" is not available.
> > >> > > ) at /usr/src/sys/kern/vfs_bio.c:2602
> > >> > > #10 0xc0932815 in ffs_balloc_ufs2 (vp=0xc8891bb0, startoffset=Variable "startoffset" is not available.
> > >> > > ) at /usr/src/sys/ufs/ffs/ffs_balloc.c:699
> > >> > > #11 0xc0952a85 in ffs_write (ap=0xe79debc4) at /usr/src/sys/ufs/ffs/ffs_vnops.c:720
> > >> > > #12 0xc0a5efc6 in VOP_WRITE_APV (vop=0xc0b93c60, a=0xe79debc4) at vnode_if.c:691
> > >> > > #13 0xc07dbf37 in vn_write (fp=0xc85f3168, uio=0xe79dec60, active_cred=0xc61c6300, flags=0, td=0xc583fc60) at vnode_if.h:373
> > >> > > #14 0xc07875e7 in dofilewrite (td=0xc583fc60, fd=17, fp=0xc85f3168, auio=0xe79dec60, offset=-1, flags=0) at file.h:254
> > >> > > #15 0xc07878c8 in kern_writev (td=0xc583fc60, fd=17, auio=0xe79dec60) at /usr/src/sys/kern/sys_generic.c:401
> > >> > > #16 0xc078793f in write (td=0xc583fc60, uap=0xe79decfc) at /usr/src/sys/kern/sys_generic.c:317
> > >> > > #17 0xc0a49635 in syscall (frame=0xe79ded38) at /usr/src/sys/i386/i386/trap.c:1035
> > >> > > #18 0xc0a2fc70 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
> > >> > > #19 0x00000033 in ?? ()
> > >> > > Previous frame inner to this frame (corrupt stack?)
> > >> >
> > >> > FYI, you got the panic in ffs/ufs, not fuse. I've seen this at work on
> > >> > 6.x with NFS with no clues on what causes it. You can start by going to
> > >> > frame 7 and doing 'p *bp'.
> > >>
> > >> Thanks for the hints.
> > >> See below for more debug output.
> > >> I recognize that the bp struct members b_data and b_kvabase both point to
> > >> a chunk of memory containing the text of the Opera web page I was reading
> > >> when the kernel crashed. (This is indicated above: current process
> > >> = 1214 (opera))
> > >>
> > >> But what is most interesting is that b_bufobj = 0x0
> > >> Obviously, then trying to access bp->b_bufobj->bo_object will cause a
> > >> crash.
> > >> So I think it would be a good idea to NULL-check the struct member
> > >> before trying to access it. How should I proceed? Should I post this as a
> > >> possible bug somewhere else, to another list?
> > >
> > > Unfortunately, it is a worse problem that b_bufobj is NULL. That means
> > > there is a bug elsewhere. I'll look at this some more.
> > >
> > > Hmm, can you reproduce this at all? If so, can you try the patch below.
> > > Hopefully it panics here which might help:
> > >
> > > Index: vfs_subr.c
> > > ===================================================================
> > > --- vfs_subr.c (revision 181629)
> > > +++ vfs_subr.c (working copy)
> > > @@ -1546,6 +1546,9 @@
> > >         CTR3(KTR_BUF, "brelvp(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags);
> > >         KASSERT(bp->b_vp != NULL, ("brelvp: NULL"));
> > >
> > > +       if (bp->b_flags & B_VMIO)
> > > +               panic("brelvp of B_VMIO buffer");
> > > +
> > >         /*
> > >          * Delete from old vnode list, if on one.
> > >          */
> > >
> > > --
> > > John Baldwin
> >
> > Sorry, at the moment I don't know how to reproduce the crash.
> > I mentioned ntfs-3g/fuse as I got the impression that they caused a heavy
> > load on my box, but in the end, it was Opera which caused the crash (also
> > causing a heavy load, however).
> > What I can do is to apply your patch and play around with CPU-consuming
> > apps to see if I can reproduce the crash during heavy load.
>
> Ok.
>
> > Currently I'm running 7.0-RELEASE.
> > Do you recommend that I upgrade to STABLE before applying the patch?
>
> No, you can just leave it as it is. At work I've seen this occasionally on
> 6.x, so it's probably an older bug.
>

Hi again,

Finally the kernel panicked again, with your patch applied.
Anyhow, it seems like it crashed in some other function this time.
Also, I wonder about the message
"No source file for address 0xc0d11b14."
when I tried to list the address for the instruction pointer.
Does this mean I did something wrong in the kernel patch or rebuild?
I added a backtrace, please tell me if you need some more info.

Regards,
Johan

--------------------------------------------------------------------------------
Patch a kernel
# cd /usr/src/sys/kern
# patch < /usr/home/kuuse/doc/freebsd-7.0-kernel-patch-john-baldwin.patch

Build a kernel
# cd /usr/src
# make buildkernel KERNCONF=MYDEBUGKERNEL
# make installkernel KERNCONF=MYDEBUGKERNEL

Debug after new kernel crash:
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
--------------------------------------------------------------------------------
cd /usr/obj/usr/src/sys/MYDEBUGKERNEL
kgdb kernel.debug /var/crash/vmcore.2
--------------------------------------------------------------------------------
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x10
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0d11b14
stack pointer           = 0x28:0xe7acebc8
frame pointer           = 0x28:0xe7acec10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2319 (opera)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 1h51m36s
Physical memory: 2035 MB
Dumping 253 MB: 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0  doadump () at pcpu.h:195
195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
--------------------------------------------------------------------------------
(kgdb) list *0xc0d11b14
--------------------------------------------------------------------------------
No source file for address 0xc0d11b14.
--------------------------------------------------------------------------------
(kgdb) bt
--------------------------------------------------------------------------------
#0  doadump () at pcpu.h:195
#1  0xc0754457 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc0754719 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc0a4905c in trap_fatal (frame=0xe7aceb88, eva=16) at /usr/src/sys/i386/i386/trap.c:899
#4  0xc0a492e0 in trap_pfault (frame=0xe7aceb88, usermode=0, eva=16) at /usr/src/sys/i386/i386/trap.c:812
#5  0xc0a49c8c in trap (frame=0xe7aceb88) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc0a2fc0b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0d11b14 in ?? ()
Previous frame inner to this frame (corrupt stack?)
--------------------------------------------------------------------------------
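
Note on the fault addresses: both panics in this thread fault at a very small virtual address (0x34 in the first dump, 0x10 in this one). That is the usual signature of reading a struct member through a NULL pointer, because the address the CPU touches is then just the member's offset. The C sketch below only illustrates that arithmetic. The structure `fake_bufobj`, its padding, and the helpers `member_fault_addr` and `print_null_fault_addr` are hypothetical names, sized so that the offset comes out as 0x34; this is not the real FreeBSD `struct bufobj` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a kernel structure.  The padding size is an
 * assumption picked so that bo_object lands at offset 0x34; it is NOT
 * the real layout of FreeBSD's struct bufobj.
 */
struct fake_bufobj {
	unsigned char	bo_pad[0x34];	/* placeholder for earlier members */
	uint32_t	bo_object;	/* stand-in for a 32-bit i386 pointer */
};

/*
 * Address the CPU would try to read for base->member, given the value of
 * the base pointer and the member's offset.  With base == 0 the result is
 * just the offset, which is why NULL dereferences fault at small addresses.
 */
static uintptr_t
member_fault_addr(uintptr_t base, size_t member_offset)
{
	/* Guard against wrap-around in the address arithmetic. */
	assert(member_offset <= UINTPTR_MAX - base);
	return (base + (uintptr_t)member_offset);
}

/* Print the fault address a NULL fake_bufobj pointer would produce. */
static void
print_null_fault_addr(void)
{
	uintptr_t addr;

	addr = member_fault_addr(0, offsetof(struct fake_bufobj, bo_object));
	printf("fault virtual address = 0x%lx\n", (unsigned long)addr);
}
```

Calling `print_null_fault_addr()` from a `main` prints `fault virtual address = 0x34`, matching the first dump, where `p *bp` showed b_bufobj = 0x0. By the same reasoning, the 0x10 fault above would fit a NULL pointer whose member at offset 0x10 was read, but without symbols for 0xc0d11b14 that remains a guess.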