Date: Sun, 24 Jun 2007 00:30:20 -0400
From: Adam McDougall <mcdouga9@egr.msu.edu>
To: Kris Kennaway <kris@obsecurity.org>
Cc: stable@freebsd.org, Kai <kai@xs4all.nl>
Subject: [vfs_bio] Re: Fatal trap 12: page fault while in kernel mode (with potential cause, fix?)
Message-ID: <20070624043020.GC31122@egr.msu.edu>
In-Reply-To: <20070423155552.GB1006@xor.obsecurity.org>
References: <20070411105332.GC7847@xs4all.nl> <20070419123329.GA10189@xs4all.nl> <20070423153547.GD20155@xs4all.nl> <20070423155552.GB1006@xor.obsecurity.org>

On Mon, Apr 23, 2007 at 11:55:52AM -0400, Kris Kennaway wrote:

  On Mon, Apr 23, 2007 at 05:35:47PM +0200, Kai wrote:
  > On Thu, Apr 19, 2007 at 02:33:29PM +0200, Kai wrote:
  > > On Wed, Apr 11, 2007 at 12:53:32PM +0200, Kai wrote:
  > > >
  > > > Hello all,
  > > >
  > > > We're running into regular panics on our webserver after upgrading
  > > > from 4.x to 6.2-stable:
  > >
  > > Hi all,
  >
  > To continue this story, a colleague wrote a small program in C that launches
  > 40 threads to randomly append and write to 10 files on an NFS mounted
  > filesystem.
  >
  > If I keep removing the files on one of the other machines in a while loop,
  > the first system panics:
  >
  > Fatal trap 12: page fault while in kernel mode
  > cpuid = 1; apic id = 01
  > fault virtual address   = 0x34
  > fault code              = supervisor read, page not present
  > instruction pointer     = 0x20:0xc06bdefa
  > stack pointer           = 0x28:0xeb9f69b8
  > frame pointer           = 0x28:0xeb9f69c4
  > code segment            = base 0x0, limit 0xfffff, type 0x1b
  >                         = DPL 0, pres 1, def32 1, gran 1
  > processor eflags        = interrupt enabled, resume, IOPL = 0
  > current process         = 73626 (nfscrash)
  > trap number             = 12
  > panic: page fault
  > cpuid = 1
  > Uptime: 3h2m14s
  >
  > Sounds like a nice denial of service problem. I can hand the program to
  > developers on request.

  Please send it to me. Panics are always much easier to get fixed if
  they come with a test case that developer can use to reproduce it.

  Kris

I have been working on this problem all weekend and I have a strong hunch
at this point that it is a result of 1.424 of sys/kern/vfs_bio.c which
was between FreeBSD 5.1 and 5.2. This hunch is currently being verified
by a system that was cvsupped to code just before 1.424, and it has been
running about 7 times longer than the usual time required to crash. I am
currently attempting to craft a patch for 6.2 that essentially backs out
the change to see if that works, but if this information can help send a
FreeBSD developer down the right trail to a proper fix, great.

I will follow up with more detailed findings and results tonight or soon.

links:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.423;r2=1.424

related to 1.424:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.420&r2=1.421

Commit emails:
http://docs.freebsd.org/cgi/mid.cgi?200311150845.hAF8jawU027349
http://docs.freebsd.org/cgi/mid.cgi?200311110445.hAB4jbYw093253
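
For readers who want a feel for the workload Kai describes (40 threads randomly appending and writing to 10 files on an NFS mount), here is a minimal sketch in C. It is not the original nfscrash program, which is not included in this thread. The names stress_args, stress_worker, stress_run and next_rand, the stress.N file naming, and the 512-byte write size are all assumptions made for illustration.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-thread state; none of these names come from the thread. */
struct stress_args {
    const char *dir;     /* directory to write into, e.g. an NFS mount */
    int nfiles;          /* number of files shared by all threads */
    long iterations;     /* open/write/close cycles per thread */
    unsigned int state;  /* private pseudo-random state */
    long failures;       /* open() or write() calls that failed */
};

/* Tiny xorshift generator so each thread has its own random stream. */
static unsigned int next_rand(unsigned int *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

static void *stress_worker(void *p)
{
    struct stress_args *a = p;
    char path[1024];
    char buf[512];       /* assumed write size */

    memset(buf, 'x', sizeof(buf));
    for (long i = 0; i < a->iterations; i++) {
        int idx = (int)(next_rand(&a->state) % (unsigned int)a->nfiles);
        int flags = O_WRONLY | O_CREAT;

        /* Randomly choose between appending and writing at offset 0. */
        if (next_rand(&a->state) & 1)
            flags |= O_APPEND;

        snprintf(path, sizeof(path), "%s/stress.%d", a->dir, idx);
        int fd = open(path, flags, 0644);
        if (fd < 0) {
            /* Expected when another host removes files concurrently. */
            a->failures++;
            continue;
        }
        if (write(fd, buf, sizeof(buf)) < 0)
            a->failures++;
        close(fd);
    }
    return NULL;
}

/* Runs the workload; returns the number of failed calls, or -1 on setup error. */
long stress_run(const char *dir, int nthreads, int nfiles, long iterations)
{
    if (nthreads <= 0 || nfiles <= 0 || iterations < 0)
        return -1;

    pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
    struct stress_args *args = calloc((size_t)nthreads, sizeof(*args));
    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1;
    }

    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        args[t].dir = dir;
        args[t].nfiles = nfiles;
        args[t].iterations = iterations;
        args[t].state = (unsigned int)(t + 1);  /* nonzero seed */
        args[t].failures = 0;
        if (pthread_create(&tids[t], NULL, stress_worker, &args[t]) != 0)
            break;
        started++;
    }

    long failures = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        failures += args[t].failures;
    }
    free(tids);
    free(args);
    return started == nthreads ? failures : -1;
}
```

Calling stress_run("/path/to/nfs/mount", 40, 10, iterations) from a small main() would roughly match the numbers in the report. As Kai notes, the panic only showed up while another machine kept removing the same files in a loop, so this sketch on its own should not be expected to reproduce it.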