Date: Sat, 6 Aug 2005 11:04:11 -0700 (PDT)
From: dpk <dpk@dpk.net>
To: Frank McConnell <fmc@reanimators.org>
Cc: Ted Wisniewski <ted@ness.plymouth.edu>, freebsd-stable@freebsd.org, Chris Gabe <chris@borderware.com>
Subject: Re: RELENG_5 PAE panic
Message-ID: <20050806105211.I15658@shared10.hosting.flyingcroc.net>
In-Reply-To: <200508050514.j755EWpH019403@lots.reanimators.org>
References: <200507290034.j6T0YLdZ014411@lots.reanimators.org>
 <20050729091624.R74149@fledge.watson.org>
 <200507291809.j6TI9p37035628@lots.reanimators.org>
 <200508021726.j72HQPQG051111@lots.reanimators.org>
 <200508022220.j72MKvUt056654@lots.reanimators.org>
 <200508050514.j755EWpH019403@lots.reanimators.org>

On Thu, 4 Aug 2005, Frank McConnell wrote:

> Further debugging led me to the conclusion that the problem is in
> pmap_protect(), in src/sys/i386/i386/pmap.c; and has to do with a
> 32-bit-truncated pt_entry_t being passed to PHYS_TO_VM_PAGE().
> (pt_entry_t is 64 bits if the kernel is built with PAE.) This caused
> a page fault in vm_page_flag_set() which left the thread deadlocked
> while holding vm_page_queue_mtx and in turn led to a panic when
> another thread tried to acquire vm_page_queue_mtx.
>
> Then I checked the cvs logs, and saw rev 1.524, which looks like what
> I was thinking about as a fix, so I'm giving it a spin on top of
> earlier-this-week's RELENG_5. Thus far I'll say that with that change
> my usual way of provoking the problem hasn't, yet.
>
> I'm going to try to get this PC put back into co-lo where it can
> get some production-like testing this weekend. It'd be nice to get
> this fix MFC'd to RELENG_5 too.
>
> -Frank McConnell

FWIW, on a server of ours that was panicking quite frequently, applying
the above-mentioned modification seems to have resolved the issue. The
server has been repeatedly building kernels while another process runs
it out of RAM. Before, this would cause it to panic with one of two
(maybe three) messages in well under an hour. Now it has been going for
24 hours straight without even a stray bus error.

This appears to resolve i386/84563, and I believe it should resolve the
related bugs kern/82846 (identical panic) and i386/84306.

The specific fix Frank has mentioned is this:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/i386/pmap.c.diff?r1=1.523&r2=1.524&f=h

committed by jhb and submitted by Greg Taleck. Although this change was
made against a later revision of pmap.c than the one distributed with
FreeBSD 5.4, the modifications still apply to the 5.4 version.
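
For anyone wondering why a 32-bit-truncated entry matters, here is a minimal user-space sketch of the effect Frank describes. It is not the kernel code and not the rev 1.524 diff: the type pae_pte_t, the DEMO_* constants, and both helper functions are made up for illustration, and the frame mask is only an assumed layout (physical address bits 12 through 35).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a PAE page-table entry (64 bits wide). */
typedef uint64_t pae_pte_t;

static_assert(sizeof(pae_pte_t) == 8, "a PAE entry must be 64 bits wide");

/* Assumed layout for this demo only: frame address in bits 12..35. */
#define DEMO_FRAME_MASK 0x0000000FFFFFF000ULL
#define DEMO_PAGE_SHIFT 12

/* Page-frame number computed from the full 64-bit entry. */
static uint64_t frame_from_full(pae_pte_t pte)
{
    return (pte & DEMO_FRAME_MASK) >> DEMO_PAGE_SHIFT;
}

/*
 * Page-frame number computed after the entry has passed through a
 * 32-bit value, which silently drops physical address bits 32 and up.
 */
static uint64_t frame_from_truncated(pae_pte_t pte)
{
    uint32_t narrowed = (uint32_t)pte;  /* upper 32 bits are lost here */

    return ((uint64_t)narrowed & DEMO_FRAME_MASK) >> DEMO_PAGE_SHIFT;
}
```

For an entry that maps physical memory below 4 GB the two helpers agree, so the problem stays hidden. For an entry above 4 GB the truncated version yields a frame number belonging to a different page, and any page lookup based on it goes to the wrong place, which fits a problem that only shows up on PAE kernels.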