From owner-freebsd-stable@FreeBSD.ORG Sun Jun 24 04:45:53 2007
Date: Sun, 24 Jun 2007 00:30:20 -0400
From: Adam McDougall
To: Kris Kennaway
Message-ID: <20070624043020.GC31122@egr.msu.edu>
References: <20070411105332.GC7847@xs4all.nl>
 <20070419123329.GA10189@xs4all.nl>
 <20070423153547.GD20155@xs4all.nl>
 <20070423155552.GB1006@xor.obsecurity.org>
In-Reply-To: <20070423155552.GB1006@xor.obsecurity.org>
Cc: stable@freebsd.org, Kai
Subject: [vfs_bio] Re: Fatal trap 12: page fault while in kernel mode (with potential cause, fix?)

On Mon, Apr 23, 2007 at 11:55:52AM -0400, Kris Kennaway wrote:

  On Mon, Apr 23, 2007 at 05:35:47PM +0200, Kai wrote:
  > On Thu, Apr 19, 2007 at 02:33:29PM +0200, Kai wrote:
  > > On Wed, Apr 11, 2007 at 12:53:32PM +0200, Kai wrote:
  > > >
  > > > Hello all,
  > > >
  > > > We're running into regular panics on our webserver after upgrading
  > > > from 4.x to 6.2-stable:
  > >
  > > Hi all,
  >
  > To continue this story, a colleague wrote a small program in C that launches
  > 40 threads to randomly append and write to 10 files on an NFS mounted
  > filesystem.
  >
  > If I keep removing the files on one of the other machines in a while loop,
  > the first system panics:
  >
  > Fatal trap 12: page fault while in kernel mode
  > cpuid = 1; apic id = 01
  > fault virtual address   = 0x34
  > fault code              = supervisor read, page not present
  > instruction pointer     = 0x20:0xc06bdefa
  > stack pointer           = 0x28:0xeb9f69b8
  > frame pointer           = 0x28:0xeb9f69c4
  > code segment            = base 0x0, limit 0xfffff, type 0x1b
  >                         = DPL 0, pres 1, def32 1, gran 1
  > processor eflags        = interrupt enabled, resume, IOPL = 0
  > current process         = 73626 (nfscrash)
  > trap number             = 12
  > panic: page fault
  > cpuid = 1
  > Uptime: 3h2m14s
  >
  > Sounds like a nice denial of service problem. I can hand the program to
  > developers on request.

  Please send it to me.  Panics are always much easier to get fixed if
  they come with a test case that developer can use to reproduce it.

  Kris

I have been working on this problem all weekend, and at this point I
have a strong hunch that it is a result of revision 1.424 of
sys/kern/vfs_bio.c, which was committed between FreeBSD 5.1 and 5.2.
This hunch is currently being verified by a system that was cvsupped to
the code just before 1.424; it has been running about 7 times longer
than the usual time required to crash.

I am currently attempting to craft a patch for 6.2 that essentially
backs out the change, to see if that works. If this information can
help send a FreeBSD developer down the right trail to a proper fix,
great. I will follow up with more detailed findings and results tonight
or soon.

links:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.423;r2=1.424

related to 1.424:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.420&r2=1.421

Commit emails:
http://docs.freebsd.org/cgi/mid.cgi?200311150845.hAF8jawU027349
http://docs.freebsd.org/cgi/mid.cgi?200311110445.hAB4jbYw093253
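
Kai's test program is described earlier in the thread but was not posted
to the list. For anyone who wants to try a similar load, the following is
only a rough sketch of what such a writer might look like, based on the
description alone (N threads randomly appending or writing to a small set
of files in one directory). It is not Kai's program. The names
stress_run, stress_worker and next_rand, the stress.N file names and the
512-byte buffer are all made up for illustration, and nothing here is
known to reproduce the panic.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch only: every name and size here is an assumption. */
struct stress_cfg {
    const char *dir;  /* directory to write in (e.g. on an NFS mount) */
    int nfiles;       /* number of files shared by all threads */
    int iters;        /* operations attempted by this thread */
    unsigned seed;    /* per-thread pseudo-random state */
    long ok;          /* writes that completed in full */
};

/* Small private generator so threads do not share rand() state. */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

static void *stress_worker(void *arg)
{
    struct stress_cfg *cfg = arg;
    char path[1024];
    char buf[512];

    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < cfg->iters; i++) {
        int n = (int)(next_rand(&cfg->seed) % (unsigned)cfg->nfiles);
        int append = (int)(next_rand(&cfg->seed) & 1u);
        /* Either append to the file or overwrite it from offset 0. */
        int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : 0);

        snprintf(path, sizeof(path), "%s/stress.%d", cfg->dir, n);
        int fd = open(path, flags, 0644);
        if (fd < 0)
            continue;  /* errors are tolerated; just try the next file */
        if (write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf))
            cfg->ok++;
        close(fd);
    }
    return NULL;
}

/* Runs nthreads workers; returns the number of full writes, or -1. */
long stress_run(const char *dir, int nthreads, int nfiles, int iters)
{
    assert(dir != NULL && nthreads > 0 && nfiles > 0 && iters >= 0);

    pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
    struct stress_cfg *cfgs = calloc((size_t)nthreads, sizeof(*cfgs));
    long total = -1;

    if (tids != NULL && cfgs != NULL) {
        int started = 0;

        for (int i = 0; i < nthreads; i++) {
            cfgs[i] = (struct stress_cfg){
                .dir = dir, .nfiles = nfiles, .iters = iters,
                .seed = (unsigned)i + 1u, .ok = 0
            };
            if (pthread_create(&tids[i], NULL, stress_worker, &cfgs[i]) != 0)
                break;
            started++;
        }
        total = 0;
        for (int i = 0; i < started; i++) {
            pthread_join(tids[i], NULL);
            total += cfgs[i].ok;
        }
    }
    free(tids);
    free(cfgs);
    return total;
}
```

To approximate the setup described above, one would call something like
stress_run(dir, 40, 10, iters) from a main(), with dir pointing at the
NFS-mounted directory, while a second machine keeps removing the same
files in a loop. The iteration count is a guess; the original description
does not say how long the program ran before the panic.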