Date: Tue, 05 Jul 2011 13:12:48 -0500
From: Alan Cox <alc@rice.edu>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: Peter Jeremy <peter.jeremy@alcatel-lucent.com>, "alc@freebsd.org" <alc@freebsd.org>, freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <4E135420.4080201@rice.edu>
In-Reply-To: <20110705160709.GA77843@alchemy.franken.de>
References: <20110619220033.GA61397@server.vk2pj.dyndns.org>
  <20110622100524.GO14797@alchemy.franken.de>
  <20110629025433.GA48145@server.vk2pj.dyndns.org>
  <20110629175444.GH14797@alchemy.franken.de>
  <20110629220010.GA53017@pjdesk.au.alcatel-lucent.com>
  <20110629223008.GL14797@alchemy.franken.de>
  <20110630221752.GG65891@pjdesk.au.alcatel-lucent.com>
  <20110702002325.GS14797@alchemy.franken.de>
  <4E0F6B8D.8000500@rice.edu>
  <20110704214158.GX14797@alchemy.franken.de>
  <20110705160709.GA77843@alchemy.franken.de>
On 07/05/2011 11:07, Marius Strobl wrote:
> On Mon, Jul 04, 2011 at 11:41:58PM +0200, Marius Strobl wrote:
>> On Sat, Jul 02, 2011 at 02:03:41PM -0500, Alan Cox wrote:
>>> On 07/01/2011 19:23, Marius Strobl wrote:
>>>> On Fri, Jul 01, 2011 at 08:17:52AM +1000, Peter Jeremy wrote:
>>>>> [Moving back on-list]
>>>>>
>>>>> On 2011-Jun-30 06:30:08 +0800, Marius Strobl <marius@alchemy.franken.de>
>>>>> wrote:
>>>>>> On Thu, Jun 30, 2011 at 08:00:10AM +1000, Peter Jeremy wrote:
>>>>>>> On 2011-Jun-29 19:54:44 +0200, Marius Strobl <marius@alchemy.franken.de>
>>>>>>> wrote:
>>>>>>>> On Wed, Jun 29, 2011 at 12:54:33PM +1000, Peter Jeremy wrote:
>>>>>>>>> My V890 has been running "make -j32 buildworld" in a loop for a
>>>>>>>>> week now without problems so I think that was the problem.
>>>>>>> OTOH, a V440 that has been running similar load for a similar period
>>>>>>> died overnight with:
>>>>>>>
>>>>>>> panic: uma_small_alloc: free page still has mappings!
>>>>>>> VNASSERT failed
>>>>>>> cpuid = 3
>>>>>>> 0xfffff800079643c0: KDB: enter: panic
>>>>> ...
>>>>>>> I'm fairly sure that is the same kernel but will double-check and
>>>>>>> investigate that panic further.
>>>>> FWIW, that kernel didn't have the latest patchset (adding Zeus support).
>>>> That shouldn't make a difference; the later version only adds the
>>>> SPARC64 bits as you already noticed and adjusts the boot loader to
>>>> compile again. I made no changes to the existing parts apart from
>>>> fixing a comment. Besides I see no connection between fixing the
>>>> gross user TLB flushing and the below problem so far.
>>>>
>>>>>> Ok, this appears to be an unrelated problem though. Alan, do you
>>>>>> have an idea what could be causing this?
>>>>> I managed to get the same panic (though different traceback) on the
>>>>> V890 after about an hour of pho@'s stress test with INCARNATIONS=150:
>>>>>
>>>>> panic: uma_small_alloc: free page still has mappings!
>>>>> cpuid = 1
>>>>> KDB: enter: panic
>>>>> [ thread pid 142 tid 100196 ]
>>>>> Stopped at kdb_enter+0x80: ta %xcc, 1
>>>>> db> where
>>>>> Tracing pid 142 tid 100196 td 0xfffff8a016ace880
>>>>> panic() at panic+0x20c
>>>>> uma_small_alloc() at uma_small_alloc+0xe8
>>>>> keg_alloc_slab() at keg_alloc_slab+0xc8
>>>>> keg_fetch_slab() at keg_fetch_slab+0x218
>>>>> zone_fetch_slab() at zone_fetch_slab+0x44
>>>>> uma_zalloc_arg() at uma_zalloc_arg+0x60c
>>>>> m_getm2() at m_getm2+0x134
>>>>> m_uiotombuf() at m_uiotombuf+0x4c
>>>>> sosend_generic() at sosend_generic+0x420
>>>>> sosend() at sosend+0x2c
>>>>> soo_write() at soo_write+0x3c
>>>>> dofilewrite() at dofilewrite+0x7c
>>>>> kern_writev() at kern_writev+0x38
>>>>> write() at write+0x4c
>>>>> syscallenter() at syscallenter+0x270
>>>>> syscall() at syscall+0x74
>>>>> -- syscall (4, FreeBSD ELF64, write) %o7=0x101db4 --
>>>>> userland() at 0x405936c8
>>>>> user trace: trap %o7=0x101db4
>>>>> pc 0x405936c8, sp 0x7fdffffd8a1
>>>>> pc 0x101f44, sp 0x7fdffffd9a1
>>>>> pc 0x104604, sp 0x7fdffffda81
>>>>> pc 0x1046f0, sp 0x7fdffffdb51
>>>>> pc 0x104994, sp 0x7fdffffdc21
>>>>> pc 0x104d90, sp 0x7fdffffdd01
>>>>> pc 0x101610, sp 0x7fdffffde41
>>>>> pc 0x4020cff4, sp 0x7fdffffdf01
>>>>> done
>>>>> db>
>>>>>
>>>>> I've got a crashdump on the V440 but discovered that gdb reports
>>>>> "GDB can't read core files on this machine." so it isn't much use.
>>>>> Any suggestions on how to debug this?
>>>> The VM and its interaction with the MD code are beyond me, I hope
>>>> Alan can chime in here. Reading through the code I see a possible
>>>> path which could lead to this though; tsb_tte_enter(), which is
>>>> the only place where TD_PV ever is set and also only in case of
>>>> managed pages, always calls pmap_cache_enter(), which together
>>>> with pmap_cache_remove() does the page color handling. In
>>>> pmap_remove_all() however, pmap_cache_remove() is only called for
>>>> managed pages, so for unmanaged pages we might miss the removal
>>>> of the mapping from the color used. I've no idea though if
>>>> this actually is relevant, i.e. whether the VM ever calls
>>>> pmap_remove_all() for unmanaged pages.
>>> In HEAD, it does not. Other architectures have an assertion forbidding
>>> pmap_remove_all() calls on unmanaged pages. (Btw, I'm happy to add this
>>> assertion to sparc64's pmap if you like.) In older versions, calling
>>> pmap_remove_all() on unmanaged pages is expected to be a harmless NOP
>>> that's just a waste of cycles.
>>>
>>> With unmanaged pages, it is expected that pmap_remove() is used to
>>> destroy mappings before the page is freed.
>>>
>>> For years, vm_page_free{,_toq}() has asserted that the page has no
>>> managed mappings:
>>>
>>>     if ((m->flags & PG_UNMANAGED) == 0) {
>>>         vm_page_lock_assert(m, MA_OWNED);
>>>         KASSERT(!pmap_page_is_mapped(m),
>>>             ("vm_page_free_toq: freeing mapped page %p", m));
>>>     }
>>>
>> Okay, then my theories don't hold.
>>
>>> As a debugging aid, you might want to add an additional check here on
>>> colors.
>> I did that and it turns out to trigger rather quickly:
>> Trying to mount root from nfs: []...
>> NFS ROOT: 192.168.1.40:/usr/data/nfsroot/sparc64
>> dc1: link state changed to UP
>> panic: vm_page_free_toq: free page 0xfffff80047b8a088 still has mappings!
>> cpuid = 0
>> KDB: enter: panic
>> [ thread pid 1 tid 100001 ]
>> Stopped at kdb_enter+0x80: ta %xcc, 1
>> db> bt
>> Tracing pid 1 tid 100001 td 0xfffff80041094000
>> panic() at panic+0x20c
>> vm_page_free_toq() at vm_page_free_toq+0xb4
>> vm_page_free_zero() at vm_page_free_zero+0x10
>> pmap_release() at pmap_release+0x170
>> vmspace_free() at vmspace_free+0x70
>> vmspace_exec() at vmspace_exec+0x48
>> exec_new_vmspace() at exec_new_vmspace+0x240
>> exec_elf64_imgact() at exec_elf64_imgact+0x598
>> kern_execve() at kern_execve+0x398
>> execve() at execve+0x34
>> start_init() at start_init+0x2ec
>> fork_exit() at fork_exit+0x9c
>> fork_trampoline() at fork_trampoline+0x8
>> db>
>>
>> Further debugging shows that the page in question is one of the TSB
>> pages entered by pmap_pinit(). In pmap_release() vm_page_free_zero()
>> is called on these before pmap_qremove(), so there appears to be a
>> race in which these pages can get re-used before their mappings are
>> removed. I suspect that this might be related to your change in
>> r207648, but just reverting that one nowadays this triggers the
>> assertion in vm_page_free_toq() about the page lock not being held.
>> Anyway, I'm not sure what the right fix for this is; should
>> pmap_release() call pmap_qremove() on these pages one-by-one before
>> calling vm_page_free_zero() or maybe just call pmap_qremove() for
>> all of them before looping over them and calling vm_page_free_zero()?
>>
> Well, given that all uses of pmap_qremove() in the kernel except
> the one in the sparc64 pmap_release and two invocations in vfs_bio.c
> remove the pages before they are freed, unwired etc this seems to be
> a safe thing to do. Does the below patch look correct to you?
>

Basically, yes. However, I would suggest adding the KASSERT in pmap.c
as a separate change. The pmap_qremove() changes should be MFCed to
RELENG_8 and RELENG_7, but not the KASSERT change.
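
The ordering problem discussed above can be sketched in user space. This is only a toy model, not kernel code: struct toy_page, toy_qremove(), toy_page_free() and the two release functions are hypothetical stand-ins for vm_page, pmap_qremove(), vm_page_free_zero() and pmap_release(), and the "mappings" counter is an assumption made purely for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a vm_page: it only tracks live mappings. */
struct toy_page {
	int mappings;	/* mappings that still point at this page */
	int freed;	/* non-zero once handed back to the allocator */
};

/* Stand-in for pmap_qremove(): tear down the mappings of count pages. */
static void
toy_qremove(struct toy_page *pages, int count)
{
	assert(count >= 0);
	for (int i = 0; i < count; i++)
		pages[i].mappings = 0;
}

/*
 * Stand-in for vm_page_free_zero() plus a debugging check:
 * returns 1 if the page was freed while it was still mapped.
 */
static int
toy_page_free(struct toy_page *m)
{
	int still_mapped = (m->mappings != 0);

	m->freed = 1;
	return (still_mapped);
}

/* Old order: free first, unmap afterwards. Returns the violation count. */
static int
release_free_then_unmap(struct toy_page *pages, int count)
{
	int violations = 0;

	for (int i = 0; i < count; i++)
		violations += toy_page_free(&pages[i]);
	toy_qremove(pages, count);
	return (violations);
}

/* Proposed order: unmap everything first, then free. */
static int
release_unmap_then_free(struct toy_page *pages, int count)
{
	int violations = 0;

	toy_qremove(pages, count);
	for (int i = 0; i < count; i++)
		violations += toy_page_free(&pages[i]);
	return (violations);
}
```

With two mapped pages, the first variant reports two pages freed while still mapped and the second reports none. The real race additionally involves another CPU re-using the page in the window between the free and the unmap, which this single-threaded sketch does not attempt to model.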

> Index: kern/vfs_bio.c
> ===================================================================
> --- kern/vfs_bio.c (revision 223705)
> +++ kern/vfs_bio.c (working copy)
> @@ -1625,6 +1625,7 @@ vfs_vmio_release(struct buf *bp)
>      int i;
>      vm_page_t m;
>
> +    pmap_qremove(trunc_page((vm_offset_t) bp->b_data), bp->b_npages);

While you're here, please also remove the non-style(9) compliant space
after the cast.

>      VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
>      for (i = 0; i < bp->b_npages; i++) {
>          m = bp->b_pages[i];
> @@ -1658,7 +1659,6 @@ vfs_vmio_release(struct buf *bp)
>          vm_page_unlock(m);
>      }
>      VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
> -    pmap_qremove(trunc_page((vm_offset_t) bp->b_data), bp->b_npages);
>
>      if (bp->b_bufsize) {
>          bufspacewakeup();
> @@ -3012,6 +3012,10 @@ allocbuf(struct buf *bp, int size)
>          if (desiredpages < bp->b_npages) {
>              vm_page_t m;
>
> +            pmap_qremove((vm_offset_t)trunc_page(
> +                (vm_offset_t)bp->b_data) +
> +                (desiredpages << PAGE_SHIFT),
> +                (bp->b_npages - desiredpages));
>              VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
>              for (i = desiredpages; i < bp->b_npages; i++) {
>                  /*
> @@ -3032,8 +3036,6 @@ allocbuf(struct buf *bp, int size)
>                  vm_page_unlock(m);
>              }
>              VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
> -            pmap_qremove((vm_offset_t) trunc_page((vm_offset_t)bp->b_data) +
> -                (desiredpages << PAGE_SHIFT), (bp->b_npages - desiredpages));
>              bp->b_npages = desiredpages;
>          }
>      } else if (size > bp->b_bcount) {
> Index: sparc64/sparc64/pmap.c
> ===================================================================
> --- sparc64/sparc64/pmap.c (revision 223705)
> +++ sparc64/sparc64/pmap.c (working copy)
> @@ -1286,6 +1289,7 @@ pmap_release(pmap_t pm)
>      pc->pc_pmap = NULL;
>      mtx_unlock_spin(&sched_lock);
>
> +    pmap_qremove((vm_offset_t)pm->pm_tsb, TSB_PAGES);
>      obj = pm->pm_tsb_obj;
>      VM_OBJECT_LOCK(obj);
>      KASSERT(obj->ref_count == 1, ("pmap_release: tsbobj ref count != 1"));
> @@ -1297,7 +1301,6 @@ pmap_release(pmap_t pm)
>          vm_page_free_zero(m);
>      }
>      VM_OBJECT_UNLOCK(obj);
> -    pmap_qremove((vm_offset_t)pm->pm_tsb, TSB_PAGES);
>      PMAP_LOCK_DESTROY(pm);
>  }
>
> @@ -1379,6 +1382,8 @@ pmap_remove_all(vm_page_t m)
>      struct tte *tp;
>      vm_offset_t va;
>
> +    KASSERT((m->flags & (PG_FICTITIOUS | PG_UNMANAGED)) == 0,
> +        ("pmap_remove_all: page %p is not managed", m));
>      vm_page_lock_queues();
>      for (tp = TAILQ_FIRST(&m->md.tte_list); tp != NULL; tp = tpn) {
>          tpn = TAILQ_NEXT(tp, tte_link);
>
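
For the allocbuf() hunk, the range handed to pmap_qremove() when a buffer shrinks can be checked with a small worked example. This is a sketch under stated assumptions: TOY_PAGE_SHIFT is set to 13 (8 KB pages) as an assumption for illustration, and toy_trunc_page(), struct toy_range and toy_shrink_range() are hypothetical helpers that only mirror the arithmetic in the patch, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 8 KB pages, i.e. a page shift of 13. */
#define TOY_PAGE_SHIFT	13
#define TOY_PAGE_SIZE	((uintptr_t)1 << TOY_PAGE_SHIFT)

/* Round an address down to the start of its page, like trunc_page(). */
static uintptr_t
toy_trunc_page(uintptr_t va)
{
	return (va & ~(TOY_PAGE_SIZE - 1));
}

/* Hypothetical result type: first address to unmap and number of pages. */
struct toy_range {
	uintptr_t start;
	int count;
};

/*
 * Mirror of the argument computation in the allocbuf() hunk:
 *   start = trunc_page(b_data) + (desiredpages << PAGE_SHIFT)
 *   count = b_npages - desiredpages
 */
static struct toy_range
toy_shrink_range(uintptr_t b_data, int npages, int desiredpages)
{
	struct toy_range r;

	assert(desiredpages >= 0 && desiredpages < npages);
	r.start = toy_trunc_page(b_data) +
	    ((uintptr_t)desiredpages << TOY_PAGE_SHIFT);
	r.count = npages - desiredpages;
	return (r);
}
```

For example, a buffer whose data starts at 0x10000123 and shrinks from 4 pages to 1 yields a start address of 0x10002000 and a count of 3, so only the trailing pages that are about to be released lose their mappings.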