Date:      Tue, 12 Aug 2008 14:39:47 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        "Johan Kuuse" <kuuse@redantigua.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: kernel panic
Message-ID:  <200808121439.48158.jhb@freebsd.org>
In-Reply-To: <1179.83.49.238.144.1218565410.frodo@webmail.bilbomedia.com>
References:  <1179.83.49.238.144.1218565410.frodo@webmail.bilbomedia.com>

On Tuesday 12 August 2008 02:23:30 pm Johan Kuuse wrote:
> > On Tuesday 12 August 2008 02:42:52 am Johan Kuuse wrote:
> >> On Monday 11 August 2008 23:04:30 John Baldwin wrote:
> >> > On Sunday 10 August 2008 10:01:49 pm Johan Kuuse wrote:
> >> > > Hi,
> >> > >
> >> > > I am a kgdb newbie, so please be patient.
> >> > > I suspect (just based on the fact that this is the 4th time I edit
> >> > > text files on my NTFS partition through ntfs-3g, using Emacs, and
> >> > > getting frequent I/O error messages inside Emacs, and then a kernel
> >> > > panic) that this is an ntfs-3g related problem.
> >> > >
> >> > > If you ask me exactly how to reproduce it, I'm sorry, I can't tell you
> >> > > exactly (but see the kgdb output below).
> >> >
> >> > > Anyway, the kernel seems to panic at /usr/src/sys/kern/vfs_bio.c:1530
> >> > >
> >> > > Just a suggestion for a patch (without knowing the functionality
> >> > > of /usr/src/sys/kern/vfs_bio.c):
> >> > > The line where the kernel panics:
> >> > > /usr/src/sys/kern/vfs_bio.c:
> >> > > ----------------------------------
> >> > > VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> >> > > ...
> >> > > ----------------------------------
> >> > >
> >> > > Comparing to another file, which does error checking before calling
> >> > > VM_OBJECT_LOCK:
> >> > > /usr/src/sys/kern/vfs_aio.c:
> >> > > ----------------------------------
> >> > > if (vp->v_object != NULL) {
> >> > >     VM_OBJECT_LOCK(vp->v_object);
> >> > > ...
> >> > > ----------------------------------
> >> > >
> >> > > Perhaps the kernel panic could be avoided with the following patch?
> >> > > /usr/src/sys/kern/vfs_bio.c (suggested patch):
> >> > > ----------------------------------
> >> > > if ((bp->b_bufobj != NULL) && (bp->b_bufobj->bo_object != NULL)) {
> >> > >     VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> >> > > ...
> >> > > ----------------------------------
> >> > >
> >> > > Please let me know if you need more information.
> >> > >
> >> > > Regards,
> >> > > Johan Kuuse
> >> > >
> >> > > ----------------------------------------------------------------------
> >> > > kgdb kernel.debug /var/crash/vmcore.1
> >> > > [GDB will not be able to debug user-mode threads:
> >> > > /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> >> > > GNU gdb 6.1.1 [FreeBSD]
> >> > > Copyright 2004 Free Software Foundation, Inc.
> >> > > GDB is free software, covered by the GNU General Public License, and
> >> > > you are welcome to change it and/or distribute copies of it under
> >> > > certain conditions.
> >> > > Type "show copying" to see the conditions.
> >> > > There is absolutely no warranty for GDB.  Type "show warranty" for
> >> > > details.
> >> > > This GDB was configured as "i386-marcel-freebsd".
> >> > >
> >> > > Unread portion of the kernel message buffer:
> >> > >
> >> > >
> >> > > Fatal trap 12: page fault while in kernel mode
> >> > > cpuid = 0; apic id = 00
> >> > > fault virtual address   = 0x34
> >> > > fault code              = supervisor read, page not present
> >> > > instruction pointer     = 0x20:0xc07b6de4
> >> > > stack pointer           = 0x28:0xe79de7c8
> >> > > frame pointer           = 0x28:0xe79de7e8
> >> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >> > >                         = DPL 0, pres 1, def32 1, gran 1
> >> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> >> > > current process         = 1214 (opera)
> >> > > trap number             = 12
> >> > > panic: page fault
> >> > > cpuid = 0
> >> > > Uptime: 5h20m30s
> >> > > Physical memory: 2035 MB
> >> > > Dumping 218 MB: 203 187 171 155 139 123 107 91 75 59 43 27 11
> >> > >
> >> > > #0  doadump () at pcpu.h:195
> >> > > 195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> >> > > (kgdb) list *0xc07b6de4
> >> > > 0xc07b6de4 is in vfs_vmio_release (/usr/src/sys/kern/vfs_bio.c:1530).
> >> > > 1525    vfs_vmio_release(struct buf *bp)
> >> > > 1526    {
> >> > > 1527            int i;
> >> > > 1528            vm_page_t m;
> >> > > 1529
> >> > > 1530            VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
> >> > > 1531            vm_page_lock_queues();
> >> > > 1532            for (i = 0; i < bp->b_npages; i++) {
> >> > > 1533                    m = bp->b_pages[i];
> >> > > 1534                    bp->b_pages[i] = NULL;
> >> > > (kgdb) bt
> >> > > #0  doadump () at pcpu.h:195
> >> > > #1  0xc0754457 in boot (howto=260)
> >> > >     at /usr/src/sys/kern/kern_shutdown.c:409
> >> > > #2  0xc0754719 in panic (fmt=Variable "fmt" is not available.
> >> > > ) at /usr/src/sys/kern/kern_shutdown.c:563
> >> > > #3  0xc0a4905c in trap_fatal (frame=0xe79de788, eva=52)
> >> > >     at /usr/src/sys/i386/i386/trap.c:899
> >> > > #4  0xc0a492e0 in trap_pfault (frame=0xe79de788, usermode=0, eva=52)
> >> > >     at /usr/src/sys/i386/i386/trap.c:812
> >> > > #5  0xc0a49c8c in trap (frame=0xe79de788)
> >> > >     at /usr/src/sys/i386/i386/trap.c:490
> >> > > #6  0xc0a2fc0b in calltrap ()
> >> > >     at /usr/src/sys/i386/i386/exception.s:139
> >> > > #7  0xc07b6de4 in vfs_vmio_release (bp=0xd927e33c)
> >> > >     at /usr/src/sys/kern/vfs_bio.c:1530
> >> > > #8  0xc07b8a81 in getnewbuf (slpflag=0, slptimeo=0,
> >> > >     size=Variable "size" is not available.
> >> > > ) at /usr/src/sys/kern/vfs_bio.c:1847
> >> > > #9  0xc07ba118 in getblk (vp=0xc8891bb0, blkno=0, size=2048, slpflag=0,
> >> > >     slptimeo=0, flags=Variable "flags" is not available.
> >> > > ) at /usr/src/sys/kern/vfs_bio.c:2602
> >> > > #10 0xc0932815 in ffs_balloc_ufs2 (vp=0xc8891bb0,
> >> > >     startoffset=Variable "startoffset" is not available.
> >> > > ) at /usr/src/sys/ufs/ffs/ffs_balloc.c:699
> >> > > #11 0xc0952a85 in ffs_write (ap=0xe79debc4)
> >> > >     at /usr/src/sys/ufs/ffs/ffs_vnops.c:720
> >> > > #12 0xc0a5efc6 in VOP_WRITE_APV (vop=0xc0b93c60, a=0xe79debc4)
> >> > >     at vnode_if.c:691
> >> > > #13 0xc07dbf37 in vn_write (fp=0xc85f3168, uio=0xe79dec60,
> >> > >     active_cred=0xc61c6300, flags=0, td=0xc583fc60) at vnode_if.h:373
> >> > > #14 0xc07875e7 in dofilewrite (td=0xc583fc60, fd=17, fp=0xc85f3168,
> >> > >     auio=0xe79dec60, offset=-1, flags=0) at file.h:254
> >> > > #15 0xc07878c8 in kern_writev (td=0xc583fc60, fd=17, auio=0xe79dec60)
> >> > >     at /usr/src/sys/kern/sys_generic.c:401
> >> > > #16 0xc078793f in write (td=0xc583fc60, uap=0xe79decfc)
> >> > >     at /usr/src/sys/kern/sys_generic.c:317
> >> > > #17 0xc0a49635 in syscall (frame=0xe79ded38)
> >> > >     at /usr/src/sys/i386/i386/trap.c:1035
> >> > > #18 0xc0a2fc70 in Xint0x80_syscall ()
> >> > >     at /usr/src/sys/i386/i386/exception.s:196
> >> > > #19 0x00000033 in ?? ()
> >> > > Previous frame inner to this frame (corrupt stack?)
> >> >
> >> > FYI, you got the panic in ffs/ufs, not fuse.  I've seen this at work on
> >> > 6.x with NFS with no clues on what causes it.  You can start by going to
> >> > frame 7 and doing 'p *bp'.
> >>
> >> Thanks for the hints.
> >> See below for more debug output.
> >> I recognize that the bp struct members b_data and b_kvabase both point to
> >> a chunk of memory containing the text of the Opera web page I was reading
> >> when the kernel crashed. (This is indicated above: current process
> >> = 1214 (opera))
> >>
> >> But what is most interesting is that b_bufobj = 0x0
> >> Obviously, then trying to access bp->b_bufobj->bo_object will cause a
> >> crash. So I think it would be a good idea to NULL-check the struct member
> >> before trying to access it. How should I proceed? Should I post this as a
> >> possible bug somewhere else, to another list?
> >
> > Unfortunately, it is a worse problem that b_bufobj is NULL.  That means
> > there is a bug elsewhere.  I'll look at this some more.
> >
> > Hmm, can you reproduce this at all?  If so, can you try the patch below.
> > Hopefully it panics here which might help:
> >
> > Index: vfs_subr.c
> > ===================================================================
> > --- vfs_subr.c	(revision 181629)
> > +++ vfs_subr.c	(working copy)
> > @@ -1546,6 +1546,9 @@
> >  	CTR3(KTR_BUF, "brelvp(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags);
> >  	KASSERT(bp->b_vp != NULL, ("brelvp: NULL"));
> >
> > +	if (bp->b_flags & B_VMIO)
> > +		panic("brelvp of B_VMIO buffer");
> > +
> >  	/*
> >  	 * Delete from old vnode list, if on one.
> >  	 */
> >
> > --
> > John Baldwin
> >
> 
> Sorry, at the moment I don't know how to reproduce the crash.
> I mentioned ntfs-3g/fuse as I got the impression that they caused a heavy
> load on my box, but in the end, it was Opera which caused the crash (also
> causing a heavy load, however).
> What I can do is apply your patch and play around with CPU-consuming apps
> to see if I can reproduce the crash during heavy load.

Ok.
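
For anyone following along, the idea behind the diagnostic patch can be
modelled in userland C.  This is only a sketch under assumptions, not kernel
code: the model_* names and the MODEL_B_VMIO value are hypothetical, and it
assumes brelvp() is the place where a buffer loses its b_bufobj pointer,
which the patch implies but this thread does not show.  The point is to
stop at the moment the buffer is put into a bad state, instead of at the
later NULL dereference in vfs_vmio_release().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value for this model; not the kernel's B_VMIO. */
#define MODEL_B_VMIO 0x1u

/* Stand-ins for struct bufobj and struct buf, reduced to what is needed. */
struct model_bufobj {
	void *bo_object;
};

struct model_buf {
	unsigned int b_flags;
	struct model_bufobj *b_bufobj;
};

/*
 * Model of the patched release path.  The kernel patch would panic() on
 * the flagged case; this model returns -1 so the caller can observe it.
 * On success the buffer is detached (b_bufobj becomes NULL) and 0 is
 * returned.
 */
int
model_brelvp(struct model_buf *bp)
{
	assert(bp != NULL);

	/* Fail fast: a VMIO buffer must not be detached from its bufobj. */
	if (bp->b_flags & MODEL_B_VMIO)
		return (-1);

	bp->b_bufobj = NULL;
	return (0);
}
```

A NULL check in vfs_vmio_release() would only hide the symptom; a check like
this one is meant to show which caller detaches the buffer too early.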

> Currently I'm running 7.0-RELEASE.
> Do you recommend that I upgrade to STABLE before applying the patch?

No, you can just leave it as it is.  At work I've seen this occasionally on
6.x, so it's probably an older bug.
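
As background on the quoted trap output: a fault address of 0x34 (eva=52)
rather than 0 is what one would expect when a member is read through a NULL
struct pointer, since the faulting address is the base plus the member's
offset.  The sketch below illustrates that arithmetic only.  The struct
layout is an assumption made up for the example (padding chosen so the
offset comes out at 0x34, with a 32-bit field standing in for an i386
pointer); it is not the real struct bufobj.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x34 bytes of other fields, then bo_object. */
struct model_bufobj {
	unsigned char pad[0x34];	/* assumed size of preceding fields */
	uint32_t bo_object;		/* 32-bit stand-in for an i386 pointer */
};

/*
 * Address that a read of bo_object would touch for a given struct base.
 * With a NULL base this is just the member offset.
 */
uintptr_t
model_fault_address(uintptr_t bufobj_base)
{
	return (bufobj_base + offsetof(struct model_bufobj, bo_object));
}
```

With a base of 0 the function returns 0x34, which matches the reported
fault address if bo_object sits at that offset on this kernel.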

-- 
John Baldwin


