Date: Mon, 09 Jun 2014 10:30:26 -0500
From: Alan Cox <alc@rice.edu>
To: "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, alc@freebsd.org
Subject: Re: svn commit: r266850 - in head/sys/arm/xscale: i80321 i8134x ixp425 pxa
Message-ID: <5395D312.5000302@rice.edu>
In-Reply-To: <20140609042206.GQ31367@funkthat.com>
References: <CAJ-Vmo=h39AYXhPFBx7dzUe%2BQtksPB8QMaAQcoqoM6UiKZe2XA@mail.gmail.com> <20140529173803.GA5294@ci0.org> <20140530063228.GD43976@funkthat.com> <5388ABF1.3030200@rice.edu> <20140601081153.GU43976@funkthat.com> <53935755.70908@rice.edu> <20140608003944.GK31367@funkthat.com> <53949D96.3060409@rice.edu> <20140608235611.GP31367@funkthat.com> <53950BB9.3090808@rice.edu> <20140609042206.GQ31367@funkthat.com>
This is a multi-part message in MIME format.
--------------030200060505090800030104
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 06/08/2014 23:22, John-Mark Gurney wrote:
> Alan Cox wrote this message on Sun, Jun 08, 2014 at 20:19 -0500:
>> On 06/08/2014 18:56, John-Mark Gurney wrote:
>>> Alan Cox wrote this message on Sun, Jun 08, 2014 at 12:29 -0500:
>>>> On 06/07/2014 19:39, John-Mark Gurney wrote:
>>>>> Alan Cox wrote this message on Sat, Jun 07, 2014 at 13:17 -0500:
>>>>>> On 06/01/2014 03:11, John-Mark Gurney wrote:
>>>>>>> Alan Cox wrote this message on Fri, May 30, 2014 at 11:04 -0500:
>>>>>>>> On 05/30/2014 01:32, John-Mark Gurney wrote:
>>>>>>>>> Olivier Houchard wrote this message on Thu, May 29, 2014 at 19:38 +0200:
>>>>>>>>>> On Thu, May 29, 2014 at 10:19:18AM -0700, Adrian Chadd wrote:
>>>>>>>>>>> On 29 May 2014 10:16, Olivier Houchard <cognet@ci0.org> wrote:
>>>>>>>>>>>> On Thu, May 29, 2014 at 10:14:53AM -0700, Adrian Chadd wrote:
>>>>>>>>>>>>> Have you tested this on xscale hardware?
>>>>>>>>>>>> Yeah, my two last commits were an attempt to get the AVILA kernel to boot
>>>>>>>>>>>> again.
>>>>>>>>>>> Woo! What can I provide to help you do this? :-)
>>>>>>>>>>>
>>>>>>>>>>> (Drinks? Food? Donations?)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Drinks and food are always appreciated ;)
>>>>>>>>>> It almost boots for me now, except a few userland programs gets SIGSEGV or
>>>>>>>>>> SIGILL along the way, trying to figure out why.
>>>>>>>>> Thanks for fixing ddb... I'm getting panic messages again... bad
>>>>>>>>> news is that my panic is still around:
>>>>>>>>> panic: vm_page_alloc: page 0xc07e73b0 is wired
>>>>>>>>>
>>>>>>>>> Though, interestingly, it looks like sparc64 has a similar panic:
>>>>>>>>> https://www.freebsd.org/cgi/query-pr.cgi?pr=187080
>>>>>>>>>
>>>>>>>>> kib, Alan, any clue to why this is happening? Any suggestions as to
>>>>>>>>> help track it down?
>>>>>>>> I'm afraid not. The dump below shows a perfectly normal, in-use page.
>>>>>>>> If this page had actually been free prior to the vm_page_alloc() call,
>>>>>>>> then other fields, like dirty, would have been different. In other
>>>>>>>> words, this isn't just a problem with the wire count.
>>>>>>>>
>>>>>>>> What object is vm_page_alloc() being performed on?
>>>>>>> Is this enough? Or do you need more?
>>>>>>>
>>>>>>> panic: vm_page_alloc: page 0xc07e73b0 is wired, obj: 0xc1500b40
>>>>>>> KDB: enter: panic
>>>>>>> [ thread pid 781 tid 100051 ]
>>>>>>> Stopped at kdb_enter+0x40: ldrb r15, [r15, r15, ror r15]!
>>>>>>> db> show object/f 0xc1500b40
>>>>>>> Object 0xc1500b40: type=2, size=0xa, res=9, ref=0, flags=0x0 ruid -1 charge 0
>>>>>>> sref=0, backing_object(0)=(0)+0x0
>>>>>>> memory:=(off=0x0,page=0x8f0000),(off=0x1,page=0x8f1000),(off=0x2,page=0x8ee000),(off=0x3,page=0x8ef000),(off=0x4,page=0x8f3000),(off=0x5,page=0x8f4000)
>>>>>>> ...(off=0x6,page=0x8fa000),(off=0x7,page=0x8fb000),(off=0x8,page=0x8fc000)
>>>>>>>
>>>>>>> If you need more, let me know what/how to get it, and I will...
>>>>>>>
>>>>>> Anyone who has seen the "wired page" panic, please try the attached
>>>>>> patch. It introduces some new KASSERT()s that may help me to narrow
>>>>>> down the problem. I haven't been able to trigger these KASSERT()s on
>>>>>> amd64, but the symptoms that you guys are reporting are consistent with
>>>>>> a bug that would trigger these KASSERT()s.
>>>>> Ok, it triggered the xxx one:
>>>>> Starting sendmail_msp_queue.
>>>>> panic: vm_phys_free_contig: xxx
>>>>> KDB: enter: panic
>>>>> [ thread pid 782 tid 100051 ]
>>>>> Stopped at kdb_enter+0x40: ldrb r15, [r15, r15, ror r15]!
>>>>> db> bt
>>>>> Tracing pid 782 tid 100051 td 0xc1470000
>>>>> db_trace_self() at db_trace_self
>>>>>     pc = 0xc0566ec8 lr = 0xc0566f54 (db_trace_thread+0x50)
>>>>>     sp = 0xcd830850 fp = 0xc03db694
>>>>> db_trace_thread() at db_trace_thread+0x50
>>>>>     pc = 0xc0566f54 lr = 0xc022cd14 (db_command_init+0x620)
>>>>>     sp = 0xcd8308b0 fp = 0xc03db694
>>>>> db_command_init() at db_command_init+0x620
>>>>>     pc = 0xc022cd14 lr = 0xc022c3ec (db_skip_to_eol+0x480)
>>>>>     sp = 0xcd8308c8 fp = 0xc03db694
>>>>>     r4 = 0xc0683c30 r5 = 0x00000000
>>>>> db_skip_to_eol() at db_skip_to_eol+0x480
>>>>>     pc = 0xc022c3ec lr = 0xc022c554 (db_command_loop+0x5c)
>>>>>     sp = 0xcd830968 fp = 0xc03db694
>>>>>     r4 = 0xcd83097c r5 = 0xc0683efc
>>>>>     r6 = 0x00000000 r7 = 0x00000000
>>>>>     r8 = 0x00000001 r10 = 0x600000d3
>>>>> db_command_loop() at db_command_loop+0x5c
>>>>>     pc = 0xc022c554 lr = 0xc022e99c (X_db_sym_numargs+0xec)
>>>>>     sp = 0xcd830970 fp = 0xc03db694
>>>>> X_db_sym_numargs() at X_db_sym_numargs+0xec
>>>>>     pc = 0xc022e99c lr = 0xc03db8c4 (kdb_trap+0x94)
>>>>>     sp = 0xcd830a88 fp = 0xc03db694
>>>>>     r4 = 0x00000000
>>>>> kdb_trap() at kdb_trap+0x94
>>>>>     pc = 0xc03db8c4 lr = 0xc0578eb0 (undefinedinstruction+0x2c8)
>>>>>     sp = 0xcd830aa8 fp = 0xc03db694
>>>>>     r4 = 0x00000000 r5 = 0x00000000
>>>>>     r6 = 0x00000000 r7 = 0xcd830b20
>>>>>     r8 = 0xe7ffffff r10 = 0xe7ffffff
>>>>> undefinedinstruction() at undefinedinstruction+0x2c8
>>>>>     pc = 0xc0578eb0 lr = 0xc0568a0c (exception_exit)
>>>>>     sp = 0xcd830b20 fp = 0xc0613e70
>>>>>     r4 = 0xffffffff r5 = 0xffff1004
>>>>>     r6 = 0xc06d0ebc r7 = 0xcd830ba4
>>>>>     r8 = 0xc1470000 r9 = 0x00000013
>>>>>     r10 = 0x00000010
>>>>> exception_exit() at exception_exit
>>>>>     pc = 0xc0568a0c lr = 0xc03db68c (kdb_enter+0x38)
>>>>>     sp = 0xcd830b70 fp = 0xc0613e70
>>>>>     r0 = 0x00000012 r1 = 0x60000013
>>>>>     r2 = 0xc06df2ac r3 = 0xc06d0ee8
>>>>>     r4 = 0xc05e5258 r5 = 0xc06155e8
>>>>>     r6 = 0xc06d0ebc r7 = 0xcd830ba4
>>>>>     r8 = 0xc1470000 r9 = 0x00000013
>>>>>     r10 = 0x00000010 r12 = 0xc05e2518
>>>>> kdb_enter() at kdb_enter+0x44
>>>>>     pc = 0xc03db698 lr = 0xc03aa094 (kern_reboot+0x948)
>>>>>     sp = 0xcd830b78 fp = 0xc0613e70
>>>>>     r4 = 0x00000100
>>>>> kern_reboot() at kern_reboot+0x948
>>>>>     pc = 0xc03aa094 lr = 0xc03aa164 (kassert_panic+0x68)
>>>>>     sp = 0xcd830b90 fp = 0xc0613e70
>>>>>     r4 = 0xc06155e8 r5 = 0xc07e74a0
>>>>>     r6 = 0xc07e6fa0 r7 = 0x00000004
>>>>>     r8 = 0x00000010
>>>>> kassert_panic() at kassert_panic+0x68
>>>>>     pc = 0xc03aa164 lr = 0xc055a0a8 (vm_phys_free_contig+0x8c)
>>>>>     sp = 0xcd830bb0 fp = 0xc0613e70
>>>>>     r0 = 0xc06155e8 r1 = 0xc07e6d20
>>>>>     r2 = 0xc06e6a70 r3 = 0x00000000
>>>>>     r4 = 0xc07e73b0
>>>>> vm_phys_free_contig() at vm_phys_free_contig+0x8c
>>>>>     pc = 0xc055a0a8 lr = 0xc055ca70 (vm_reserv_startup+0x4bc)
>>>>>     sp = 0xcd830bd0 fp = 0xc0613e70
>>>>>     r4 = 0xc08fb2cc r5 = 0x00000008
>>>>>     r6 = 0x000000e8 r7 = 0xc08fb280
>>>>>     r8 = 0x00000005 r10 = 0x00000001
>>>>> vm_reserv_startup() at vm_reserv_startup+0x4bc
>>>>>     pc = 0xc055ca70 lr = 0xc055cb40 (vm_reserv_startup+0x58c)
>>>>>     sp = 0xcd830be8 fp = 0xc0613e70
>>>>>     r4 = 0xc08fb280 r5 = 0x00000000
>>>>>     r6 = 0xc14b7280 r7 = 0x00000040
>>>>>     r8 = 0x00000000
>>>>> vm_reserv_startup() at vm_reserv_startup+0x58c
>>>>>     pc = 0xc055cb40 lr = 0xc055ce08 (vm_reserv_reclaim_inactive+0x34)
>>>>>     sp = 0xcd830bf0 fp = 0xc0613e70
>>>>>     r4 = 0xc06e6550
>>>>> vm_reserv_reclaim_inactive() at vm_reserv_reclaim_inactive+0x34
>>>>>     pc = 0xc055ce08 lr = 0xc0554cb8 (vm_page_alloc+0x280)
>>>>>     sp = 0xcd830bf8 fp = 0xc0613e70
>>>>> vm_page_alloc() at vm_page_alloc+0x280
>>>>>     pc = 0xc0554cb8 lr = 0xc0540eb0 (vm_fault_hold+0x60c)
>>>>>     sp = 0xcd830c30 fp = 0xcd830dac
>>>>>     r4 = 0xc14b7280 r5 = 0xc0618d00
>>>>>     r6 = 0xcd830eb0 r7 = 0xc1470000
>>>>>     r8 = 0xcd830e60 r9 = 0x00000000
>>>>>     r10 = 0x00000000
>>>>> vm_fault_hold() at vm_fault_hold+0x60c
>>>>>     pc = 0xc0540eb0 lr = 0xc05426b8 (vm_fault+0x44)
>>>>>     sp = 0xcd830db0 fp = 0x00000002
>>>>>     r4 = 0xc14c8a0c r5 = 0xc0618d00
>>>>>     r6 = 0xcd830eb0 r7 = 0xc1470000
>>>>>     r8 = 0xcd830e60 r9 = 0x00000001
>>>>>     r10 = 0x00000000
>>>>> vm_fault() at vm_fault+0x44
>>>>>     pc = 0xc05426b8 lr = 0xc05782d0 (data_abort_handler+0x35c)
>>>>>     sp = 0xcd830dc0 fp = 0x00000002
>>>>> data_abort_handler() at data_abort_handler+0x35c
>>>>>     pc = 0xc05782d0 lr = 0xc0568a0c (exception_exit)
>>>>>     sp = 0xcd830dc0 fp = 0x00000002
>>>>> data_abort_handler() at data_abort_handler+0x35c
>>>>>     pc = 0xc05782d0 lr = 0xc0568a0c (exception_exit)
>>>>>     sp = 0xcd830e60 fp = 0x20c43000
>>>>>     r4 = 0xffffffff r5 = 0xffff1004
>>>>>     r6 = 0x00000000 r7 = 0x20443740
>>>>>     r8 = 0x0009b8e4 r9 = 0x00000001
>>>>>     r10 = 0x00000004
>>>>> exception_exit() at exception_exit
>>>>>     pc = 0xc0568a0c lr = 0x204140d0 (0x204140d0)
>>>>>     sp = 0xcd830eb0 fp = 0x20c43000
>>>>>     r0 = 0x00000000 r1 = 0x20c4302c
>>>>>     r2 = 0x00000004 r3 = 0x00000000
>>>>>     r4 = 0x20446190 r5 = 0x20c4302c
>>>>>     r6 = 0x00000000 r7 = 0x20443740
>>>>>     r8 = 0x0009b8e4 r9 = 0x00000001
>>>>>     r10 = 0x00000004 r12 = 0x00000001
>>>>> Unable to unwind into user mode
>>>>>
>>>>> Hope this helps, let me know if you need anything else...
>>>>>
>>>> Please try the attached patch. It adds another KASSERT() loop.
>>>>
>>>> Depending on which KASSERT() fires, that will tell us whether to look
>>>> deeper at this function or its caller for the source of the problem.
>>> Ok, that panic is:
>>> panic: vm_phys_free_contig: start 0xc07e6d20 21 24
>>>
>>> Let me know if you need any more info... oh, btw, the last %u needed
>>> to be %lu since it was a u_long, not an unsigned...
>>>
>> Ok. Here is the next debug patch.
> so, it's crashing in the same place:
> panic: vm_phys_free_contig: start 0xc07e6d20 21 24
>
> so, I commented out this KASSERT, and now it panics with:
> panic: vm_phys_free_contig: xxx 0xc07e6fa0 13 16
>
> so I commented out this KASSERT too, and it panics back w/ the original
> panic.. So it didn't hit the new KASSERT in vm_reserv_break...
>

Next patch...It should panic in vm_reserv_break this time and tell me if
the reservation being broken belongs to the same object as the inuse
page that is being inappropriately freed.

--------------030200060505090800030104
Content-Type: text/plain; charset=ISO-8859-15; name="arm_debug4.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="arm_debug4.patch"

Index: vm/vm_phys.c
===================================================================
--- vm/vm_phys.c	(revision 267209)
+++ vm/vm_phys.c	(working copy)
@@ -693,9 +693,16 @@ vm_phys_free_pages(vm_page_t m, int order)
 void
 vm_phys_free_contig(vm_page_t m, u_long npages)
 {
+	vm_page_t m_tmp;
 	u_int n;
 	int order;
 
+	for (m_tmp = m; m_tmp < &m[npages]; m_tmp++)
+		KASSERT(m_tmp->object == NULL ||
+		    (m_tmp->flags & PG_CACHED) != 0,
+		    ("vm_phys_free_contig: start %p %td %lu",
+		    m, m_tmp - m, npages));
+
 	/*
 	 * Avoid unnecessary coalescing by freeing the pages in the largest
 	 * possible power-of-two-sized subsets.
@@ -714,6 +721,11 @@ vm_phys_free_contig(vm_page_t m, u_long npages)
 		n = 1 << order;
 		if (npages < n)
 			break;
+		for (m_tmp = m; m_tmp < &m[n]; m_tmp++)
+			KASSERT(m_tmp->object == NULL ||
+			    (m_tmp->flags & PG_CACHED) != 0,
+			    ("vm_phys_free_contig: xxx %p %td %u",
+			    m, m_tmp - m, n));
 		vm_phys_free_pages(m, order);
 		m += n;
 	}
@@ -721,6 +733,11 @@ vm_phys_free_contig(vm_page_t m, u_long npages)
 	for (; npages > 0; npages -= n) {
 		order = flsl(npages) - 1;
 		n = 1 << order;
+		for (m_tmp = m; m_tmp < &m[n]; m_tmp++)
+			KASSERT(m_tmp->object == NULL ||
+			    (m_tmp->flags & PG_CACHED) != 0,
+			    ("vm_phys_free_contig: yyy %p %td %u",
+			    m, m_tmp - m, n));
 		vm_phys_free_pages(m, order);
 		m += n;
 	}
Index: vm/vm_reserv.c
===================================================================
--- vm/vm_reserv.c	(revision 267213)
+++ vm/vm_reserv.c	(working copy)
@@ -646,7 +646,8 @@ found:
 static void
 vm_reserv_break(vm_reserv_t rv, vm_page_t m)
 {
-	int begin_zeroes, hi, i, lo;
+	int begin_zeroes, hi, i, lo, x;
+	vm_object_t saved_object;
 
 	mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
 	KASSERT(rv->object != NULL,
@@ -653,6 +654,7 @@ vm_reserv_break(vm_reserv_t rv, vm_page_t m)
 	    ("vm_reserv_break: reserv %p is free", rv));
 	KASSERT(!rv->inpartpopq,
 	    ("vm_reserv_break: reserv %p's inpartpopq is TRUE", rv));
+	saved_object = rv->object;
 	LIST_REMOVE(rv, objq);
 	rv->object = NULL;
 	if (m != NULL) {
@@ -703,6 +705,19 @@ vm_reserv_break(vm_reserv_t rv, vm_page_t m)
 		if (i != NPOPMAP)
 			/* Convert from ffsl() to ordinary bit numbering. */
 			hi--;
+		for (x = begin_zeroes; x < NBPOPMAP * i + hi - begin_zeroes;
+		    x++) {
+			KASSERT(isclr(rv->popmap, x),
+			    ("vm_reserv_break: reserv %p popmap[%d]", rv, x));
+		}
+		for (x = begin_zeroes; x < NBPOPMAP * i + hi - begin_zeroes;
+		    x++) {
+			vm_page_t m_tmp = &rv->pages[x];
+			KASSERT(m_tmp->object == NULL ||
+			    (m_tmp->flags & PG_CACHED) != 0,
+			    ("vm_reservs_break: saved_object=%p x=%d m_tmp->object=%p (%d)",
+			    saved_object, x, m_tmp->object, m_tmp->object == kmem_object));
+		}
 		vm_phys_free_contig(&rv->pages[begin_zeroes], NBPOPMAP * i +
 		    hi - begin_zeroes);
 	} while (i < NPOPMAP);

--------------030200060505090800030104--
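For readers following the thread without a kernel tree at hand, the check that each of the debug loops in the patch performs can be modelled in userland roughly as below. This is only a sketch under stated assumptions: struct page, the PG_CACHED value, first_inuse_page() and page_belongs_to() are hypothetical stand-ins invented for illustration, not the kernel's vm_page definitions or any existing function. The idea is the same as in the patch: a page in a range about to be freed must either have no owning object or be marked cached, and the index of the first offender is what the panic message reports.

```c
#include <stddef.h>

/* Hypothetical flag value, chosen for this sketch only. */
#define PG_CACHED 0x0001u

/* Simplified stand-in for the kernel's page structure (assumption). */
struct page {
	const void *object;	/* owning object, or NULL when the page is free */
	unsigned int flags;	/* PG_CACHED set when the page is cached */
};

/*
 * Scan the npages pages starting at m.  Return the index of the first
 * page that is neither free (object == NULL) nor cached, or -1 if every
 * page in the range may legitimately be handed back to the free lists.
 */
static long
first_inuse_page(const struct page *m, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (m[i].object != NULL && (m[i].flags & PG_CACHED) == 0)
			return ((long)i);
	}
	return (-1);
}

/*
 * Answer the question the new message string is meant to settle: does
 * the offending page belong to the object that owned the reservation?
 */
static int
page_belongs_to(const struct page *m, const void *saved_object)
{
	return (m->object == saved_object);
}
```

Called on a range in which page 2 still has an owning object and is not cached, first_inuse_page() returns 2, which corresponds to the offset printed by the "%td" conversion in the KASSERT() messages above.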