Date:      Sun, 24 Jun 2007 00:30:20 -0400
From:      Adam McDougall <mcdouga9@egr.msu.edu>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        stable@freebsd.org, Kai <kai@xs4all.nl>
Subject:   [vfs_bio] Re: Fatal trap 12: page fault while in kernel mode (with potential cause, fix?)
Message-ID:  <20070624043020.GC31122@egr.msu.edu>
In-Reply-To: <20070423155552.GB1006@xor.obsecurity.org>
References:  <20070411105332.GC7847@xs4all.nl> <20070419123329.GA10189@xs4all.nl> <20070423153547.GD20155@xs4all.nl> <20070423155552.GB1006@xor.obsecurity.org>

On Mon, Apr 23, 2007 at 11:55:52AM -0400, Kris Kennaway wrote:

  On Mon, Apr 23, 2007 at 05:35:47PM +0200, Kai wrote:
  > On Thu, Apr 19, 2007 at 02:33:29PM +0200, Kai wrote:
  > > On Wed, Apr 11, 2007 at 12:53:32PM +0200, Kai wrote:
  > > > 
  > > > Hello all,
  > > > 
  > > > We're running into regular panics on our webserver after upgrading
  > > > from 4.x to 6.2-stable:
  > > 
  > 
  > Hi all,
  > 
  > To continue this story, a colleague wrote a small program in C that launches
  > 40 threads to randomly append and write to 10 files on an NFS mounted
  > filesystem. 
  > 
  > If I keep removing the files on one of the other machines in a while loop,
  > the first system panics:
  > 
  > Fatal trap 12: page fault while in kernel mode
  > cpuid = 1; apic id = 01
  > fault virtual address   = 0x34
  > fault code              = supervisor read, page not present
  > instruction pointer     = 0x20:0xc06bdefa
  > stack pointer           = 0x28:0xeb9f69b8
  > frame pointer           = 0x28:0xeb9f69c4
  > code segment            = base 0x0, limit 0xfffff, type 0x1b
  >                         = DPL 0, pres 1, def32 1, gran 1
  > processor eflags        = interrupt enabled, resume, IOPL = 0
  > current process         = 73626 (nfscrash)
  > trap number             = 12
  > panic: page fault
  > cpuid = 1
  > Uptime: 3h2m14s
  > 
  > Sounds like a nice denial of service problem. I can hand the program to
  > developers on request.
  
  Please send it to me.  Panics are always much easier to get fixed if
  they come with a test case that a developer can use to reproduce them.
  
  Kris

I have been working on this problem all weekend, and I have a strong hunch at this point
that it is caused by revision 1.424 of sys/kern/vfs_bio.c, which was committed between
FreeBSD 5.1 and 5.2.  I am verifying this hunch on a system that was cvsupped to the code
just before 1.424; it has been running about 7 times longer than the usual time
required to crash.  I am currently attempting to craft a patch for 6.2 that essentially
backs out the change to see if that works, but if this information can help send a
FreeBSD developer down the right trail to a proper fix, great.  I will follow up with
more detailed findings and results tonight or soon.

Links:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.423;r2=1.424
Related to 1.424:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.420&r2=1.421

Commit emails:
http://docs.freebsd.org/cgi/mid.cgi?200311150845.hAF8jawU027349
http://docs.freebsd.org/cgi/mid.cgi?200311110445.hAB4jbYw093253
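
The test program described in the quoted message is not included here.  As a rough
illustration only, a workload of that kind (several threads randomly appending and
writing to a small set of shared files) might look like the sketch below.  This is not
the original program: the names run_stress and worker, the nfsstress.N file names, the
512-byte write size, and the fixed iteration count are all assumptions made for the
example.

```c
/* Hypothetical sketch of a threaded append/write workload; NOT the original test program. */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_THREADS 64

struct worker_args {
    const char *dir;    /* target directory, e.g. on an NFS mount (assumed) */
    int nfiles;         /* number of files shared by all threads */
    int iterations;     /* operations per thread (assumed bound) */
    unsigned int seed;  /* per-thread pseudo-random state */
    long failures;      /* open/write errors seen by this thread */
};

/* Small per-thread LCG so threads do not share rand() state. */
static unsigned int next_rand(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fff;
}

static void *worker(void *p)
{
    struct worker_args *a = p;
    char path[1024], buf[512];

    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < a->iterations; i++) {
        int n = (int)(next_rand(&a->seed) % (unsigned int)a->nfiles);
        int append = (int)(next_rand(&a->seed) & 1u);

        snprintf(path, sizeof(path), "%s/nfsstress.%d", a->dir, n);
        /* O_CREAT re-creates the file if another host removed it meanwhile. */
        int fd = open(path, O_WRONLY | O_CREAT | (append ? O_APPEND : 0), 0644);
        if (fd < 0) {
            a->failures++;
            continue;
        }
        /* Appends go to the end of the file; plain writes overwrite from offset 0. */
        if (write(fd, buf, sizeof(buf)) < 0)
            a->failures++;
        close(fd);
    }
    return NULL;
}

/* Returns the total number of failed operations, or -1 if a thread failed to start. */
long run_stress(const char *dir, int nthreads, int nfiles, int iterations)
{
    pthread_t tid[MAX_THREADS];
    struct worker_args args[MAX_THREADS];
    long failures = 0;
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS && nfiles > 0);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct worker_args){ dir, nfiles, iterations, (unsigned int)i + 1u, 0 };
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        failures += args[i].failures;
    }
    return started == nthreads ? failures : -1;
}
```

To approximate the scenario from the report, one would call something like
run_stress("/mnt/nfs/test", 40, 10, 100000) from main() on one client (the path is a
placeholder) while another machine removes the same files in a loop.  Whether this
sketch actually triggers the panic has not been verified.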


