From owner-freebsd-sparc64@FreeBSD.ORG  Tue Jul  5 18:12:51 2011
Return-Path: <owner-freebsd-sparc64@FreeBSD.ORG>
Delivered-To: freebsd-sparc64@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6FA65106566B;
	Tue,  5 Jul 2011 18:12:51 +0000 (UTC) (envelope-from alc@rice.edu)
Received: from mh5.mail.rice.edu (mh5.mail.rice.edu [128.42.199.32])
	by mx1.freebsd.org (Postfix) with ESMTP id 356608FC12;
	Tue,  5 Jul 2011 18:12:51 +0000 (UTC)
Received: from mh5.mail.rice.edu (localhost.localdomain [127.0.0.1])
	by mh5.mail.rice.edu (Postfix) with ESMTP id 67FD92900EB;
	Tue,  5 Jul 2011 13:12:50 -0500 (CDT)
X-Virus-Scanned: by amavis-2.6.4 at mh5.mail.rice.edu, auth channel
Received: from mh5.mail.rice.edu ([127.0.0.1])
	by mh5.mail.rice.edu (mh5.mail.rice.edu [127.0.0.1]) (amavis,
	port 10026)
	with ESMTP id 8PlT4woJldUC; Tue,  5 Jul 2011 13:12:50 -0500 (CDT)
Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net
	(adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18])
	(using TLSv1 with cipher RC4-MD5 (128/128 bits))
	(No client certificate requested) (Authenticated sender: alc)
	by mh5.mail.rice.edu (Postfix) with ESMTPSA id A7D952900B4;
	Tue,  5 Jul 2011 13:12:49 -0500 (CDT)
Message-ID: <4E135420.4080201@rice.edu>
Date: Tue, 05 Jul 2011 13:12:48 -0500
From: Alan Cox <alc@rice.edu>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US;
	rv:1.9.2.17) Gecko/20110620 Thunderbird/3.1.10
MIME-Version: 1.0
To: Marius Strobl <marius@alchemy.franken.de>
References: <20110619220033.GA61397@server.vk2pj.dyndns.org>
	<20110622100524.GO14797@alchemy.franken.de>
	<20110629025433.GA48145@server.vk2pj.dyndns.org>
	<20110629175444.GH14797@alchemy.franken.de>
	<20110629220010.GA53017@pjdesk.au.alcatel-lucent.com>
	<20110629223008.GL14797@alchemy.franken.de>
	<20110630221752.GG65891@pjdesk.au.alcatel-lucent.com>
	<20110702002325.GS14797@alchemy.franken.de>
	<4E0F6B8D.8000500@rice.edu>
	<20110704214158.GX14797@alchemy.franken.de>
	<20110705160709.GA77843@alchemy.franken.de>
In-Reply-To: <20110705160709.GA77843@alchemy.franken.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Peter Jeremy <peter.jeremy@alcatel-lucent.com>,
	"alc@freebsd.org" <alc@freebsd.org>, freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the Sparc <freebsd-sparc64.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-sparc64>, 
	<mailto:freebsd-sparc64-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-sparc64>
List-Post: <mailto:freebsd-sparc64@freebsd.org>
List-Help: <mailto:freebsd-sparc64-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-sparc64>,
	<mailto:freebsd-sparc64-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2011 18:12:51 -0000

On 07/05/2011 11:07, Marius Strobl wrote:
> On Mon, Jul 04, 2011 at 11:41:58PM +0200, Marius Strobl wrote:
>> On Sat, Jul 02, 2011 at 02:03:41PM -0500, Alan Cox wrote:
>>> On 07/01/2011 19:23, Marius Strobl wrote:
>>>> On Fri, Jul 01, 2011 at 08:17:52AM +1000, Peter Jeremy wrote:
>>>>> [Moving back on-list]
>>>>>
>>>>> On 2011-Jun-30 06:30:08 +0800, Marius Strobl <marius@alchemy.franken.de>
>>>>> wrote:
>>>>>> On Thu, Jun 30, 2011 at 08:00:10AM +1000, Peter Jeremy wrote:
>>>>>>> On 2011-Jun-29 19:54:44 +0200, Marius Strobl <marius@alchemy.franken.de>
>>>>>>> wrote:
>>>>>>>> On Wed, Jun 29, 2011 at 12:54:33PM +1000, Peter Jeremy wrote:
>>>>>>>>> My V890 has been running "make -j32 buildworld" in a loop for a
>>>>>>>>> week now without problems so I think that was the problem.
>>>>>>> OTOH, a V440 that has been running similar load for a similar period
>>>>>>> died overnight with:
>>>>>>>
>>>>>>> panic: uma_small_alloc: free page still has mappings!
>>>>>>> VNASSERT failed
>>>>>>> cpuid = 3
>>>>>>> 0xfffff800079643c0: KDB: enter: panic
>>>>> ...
>>>>>>> I'm fairly sure that is the same kernel but will double-check and
>>>>>>> investigate that panic further.
>>>>> FWIW, that kernel didn't have the latest patchset (adding Zeus support).
>>>> That shouldn't make a difference; the later version only adds the
>>>> SPARC64 bits as you already noticed and adjusts the boot loader to
>>>> compile again. I made no changes to the existing parts apart from
>>>> fixing a comment. Besides I see no connection between fixing the
>>>> gross user TLB flushing and the below problem so far.
>>>>
>>>>>> Ok, this appears to be an unrelated problem though. Alan, do you
>>>>>> have an idea what could be causing this?
>>>>> I managed to get the same panic (though different traceback) on the
>>>>> V890 after about an hour of pho@'s stress test with INCARNATIONS=150:
>>>>>
>>>>> panic: uma_small_alloc: free page still has mappings!
>>>>> cpuid = 1
>>>>> KDB: enter: panic
>>>>> [ thread pid 142 tid 100196 ]
>>>>> Stopped at      kdb_enter+0x80: ta              %xcc, 1
>>>>> db>   where
>>>>> Tracing pid 142 tid 100196 td 0xfffff8a016ace880
>>>>> panic() at panic+0x20c
>>>>> uma_small_alloc() at uma_small_alloc+0xe8
>>>>> keg_alloc_slab() at keg_alloc_slab+0xc8
>>>>> keg_fetch_slab() at keg_fetch_slab+0x218
>>>>> zone_fetch_slab() at zone_fetch_slab+0x44
>>>>> uma_zalloc_arg() at uma_zalloc_arg+0x60c
>>>>> m_getm2() at m_getm2+0x134
>>>>> m_uiotombuf() at m_uiotombuf+0x4c
>>>>> sosend_generic() at sosend_generic+0x420
>>>>> sosend() at sosend+0x2c
>>>>> soo_write() at soo_write+0x3c
>>>>> dofilewrite() at dofilewrite+0x7c
>>>>> kern_writev() at kern_writev+0x38
>>>>> write() at write+0x4c
>>>>> syscallenter() at syscallenter+0x270
>>>>> syscall() at syscall+0x74
>>>>> -- syscall (4, FreeBSD ELF64, write) %o7=0x101db4 --
>>>>> userland() at 0x405936c8
>>>>> user trace: trap %o7=0x101db4
>>>>> pc 0x405936c8, sp 0x7fdffffd8a1
>>>>> pc 0x101f44, sp 0x7fdffffd9a1
>>>>> pc 0x104604, sp 0x7fdffffda81
>>>>> pc 0x1046f0, sp 0x7fdffffdb51
>>>>> pc 0x104994, sp 0x7fdffffdc21
>>>>> pc 0x104d90, sp 0x7fdffffdd01
>>>>> pc 0x101610, sp 0x7fdffffde41
>>>>> pc 0x4020cff4, sp 0x7fdffffdf01
>>>>> done
>>>>> db>
>>>>>
>>>>> I've got a crashdump on the V440 but discovered that gdb reports
>>>>> "GDB can't read core files on this machine." so it isn't much use.
>>>>> Any suggestions on how to debug this?
>>>> The VM and its interaction with the MD code are beyond me, I hope
>>>> Alan can chime in here. Reading through the code I see a possible
>>>> path which could lead to this though; tsb_tte_enter(), which is
>>>> the only place where TD_PV ever is set and also only in case of
>>>> managed pages, always calls pmap_cache_enter(), which together
>>>> with pmap_cache_remove() does the page color handling. In
>>>> pmap_remove_all() however, pmap_cache_remove() is only called for
>>>> managed pages, so for unmanaged pages we might miss the removal
>>>> of the mapping from the color used. I've no idea though if
>>>> this actually is relevant, i.e. whether the VM ever calls
>>>> pmap_remove_all() for unmanaged pages.
>>> In HEAD, it does not.  Other architectures have an assertion forbidding
>>> pmap_remove_all() calls on unmanaged pages.  (Btw, I'm happy to add this
>>> assertion to sparc64's pmap if you like.)  In older versions, calling
>>> pmap_remove_all() on unmanaged pages is expected to be a harmless NOP
>>> that's just a waste of cycles.
>>>
>>> With unmanaged pages, it is expected that pmap_remove() is used to
>>> destroy mappings before the page is freed.
>>>
>>> For years, vm_page_free{,_toq}() has asserted that the page has no
>>> managed mappings:
>>>
>>>          if ((m->flags & PG_UNMANAGED) == 0) {
>>>                  vm_page_lock_assert(m, MA_OWNED);
>>>                  KASSERT(!pmap_page_is_mapped(m),
>>>                      ("vm_page_free_toq: freeing mapped page %p", m));
>>>          }
>>>
>> Okay, then my theories don't hold.
>>
>>> As a debugging aid, you might want to add an additional check here on
>>> colors.
>> I did that and it turns out to trigger rather quickly:
>> Trying to mount root from nfs: []...
>> NFS ROOT: 192.168.1.40:/usr/data/nfsroot/sparc64
>> dc1: link state changed to UP
>> panic: vm_page_free_toq: free page 0xfffff80047b8a088 still has mappings!
>> cpuid = 0
>> KDB: enter: panic
>> [ thread pid 1 tid 100001 ]
>> Stopped at      kdb_enter+0x80: ta              %xcc, 1
>> db>  bt
>> Tracing pid 1 tid 100001 td 0xfffff80041094000
>> panic() at panic+0x20c
>> vm_page_free_toq() at vm_page_free_toq+0xb4
>> vm_page_free_zero() at vm_page_free_zero+0x10
>> pmap_release() at pmap_release+0x170
>> vmspace_free() at vmspace_free+0x70
>> vmspace_exec() at vmspace_exec+0x48
>> exec_new_vmspace() at exec_new_vmspace+0x240
>> exec_elf64_imgact() at exec_elf64_imgact+0x598
>> kern_execve() at kern_execve+0x398
>> execve() at execve+0x34
>> start_init() at start_init+0x2ec
>> fork_exit() at fork_exit+0x9c
>> fork_trampoline() at fork_trampoline+0x8
>> db>
>>
>> Further debugging shows that the page in question is one of the TSB
>> pages entered by pmap_pinit(). In pmap_release() vm_page_free_zero()
>> is called on these before pmap_qremove(), so there appears to be a
>> race in which these pages can get re-used before their mappings are
>> removed. I suspect that this might be related to your change in
>> r207648, but nowadays just reverting that one triggers the
>> assertion in vm_page_free_toq() about the page lock not being held.
>> Anyway, I'm not sure what the right fix for this is; should
>> pmap_release() call pmap_qremove() on these pages one-by-one before
>> calling vm_page_free_zero() or maybe just call pmap_qremove() for
>> all of them before looping over them and calling vm_page_free_zero()?
>>
> Well, given that all uses of pmap_qremove() in the kernel except
> the one in the sparc64 pmap_release and two invocations in vfs_bio.c
> remove the pages before they are freed, unwired, etc., this seems to be
> a safe thing to do. Does the below patch look correct to you?
>

Basically, yes.  However, I would suggest adding the KASSERT in pmap.c 
as a separate change.  The pmap_qremove() changes should be MFCed to 
RELENG_8 and RELENG_7, but not the KASSERT change.
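
To illustrate why the ordering matters, here is a minimal userland sketch 
in C.  It is a toy model, not kernel code: struct toy_page, toy_qremove(), 
toy_page_free() and the two toy_release_*() functions are hypothetical 
stand-ins for the page, pmap_qremove(), the freeing routine with its 
"still has mappings" check, and the two possible orderings in a release 
path.  The only point it makes is that freeing first trips the check on 
every page, while unmapping first never does.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Hypothetical toy page: just enough state to model the ordering issue. */
struct toy_page {
	int	mappings;	/* kernel mappings still referring to the page */
	bool	is_free;	/* true once the page was handed back */
};

/* Give every page one mapping and mark it as in use. */
static void
toy_init(struct toy_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		pages[i].mappings = 1;
		pages[i].is_free = false;
	}
}

/* Toy stand-in for pmap_qremove(): drop the mappings of count pages. */
static void
toy_qremove(struct toy_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		pages[i].mappings = 0;
}

/*
 * Toy stand-in for freeing a page.  Where the kernel would panic on a
 * page that is still mapped, this model refuses and returns false.
 */
static bool
toy_page_free(struct toy_page *p)
{
	if (p->mappings != 0)
		return (false);
	p->is_free = true;
	return (true);
}

/* Old ordering: free, then unmap.  Returns the number of violations. */
static size_t
toy_release_free_first(struct toy_page *pages, size_t count)
{
	size_t violations = 0;

	for (size_t i = 0; i < count; i++)
		if (!toy_page_free(&pages[i]))
			violations++;
	toy_qremove(pages, count);
	return (violations);
}

/* Patched ordering: unmap, then free.  Returns the number of violations. */
static size_t
toy_release_unmap_first(struct toy_page *pages, size_t count)
{
	size_t violations = 0;

	toy_qremove(pages, count);
	for (size_t i = 0; i < count; i++)
		if (!toy_page_free(&pages[i]))
			violations++;
	return (violations);
}
```

After toy_init(), toy_release_free_first() reports one violation per 
page, whereas toy_release_unmap_first() reports none and leaves every 
page marked free.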

> Index: kern/vfs_bio.c
> ===================================================================
> --- kern/vfs_bio.c	(revision 223705)
> +++ kern/vfs_bio.c	(working copy)
> @@ -1625,6 +1625,7 @@ vfs_vmio_release(struct buf *bp)
>   	int i;
>   	vm_page_t m;
>
> +	pmap_qremove(trunc_page((vm_offset_t) bp->b_data), bp->b_npages);

While you're here, please also remove the non-style(9) compliant space 
after the cast.

>   	VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
>   	for (i = 0; i < bp->b_npages; i++) {
>   		m = bp->b_pages[i];
> @@ -1658,7 +1659,6 @@ vfs_vmio_release(struct buf *bp)
>   		vm_page_unlock(m);
>   	}
>   	VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
> -	pmap_qremove(trunc_page((vm_offset_t) bp->b_data), bp->b_npages);
>   	
>   	if (bp->b_bufsize) {
>   		bufspacewakeup();
> @@ -3012,6 +3012,10 @@ allocbuf(struct buf *bp, int size)
>   			if (desiredpages < bp->b_npages) {
>   				vm_page_t m;
>
> +				pmap_qremove((vm_offset_t)trunc_page(
> +				    (vm_offset_t)bp->b_data) +
> +				    (desiredpages << PAGE_SHIFT),
> +				    (bp->b_npages - desiredpages));
>   				VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
>   				for (i = desiredpages; i < bp->b_npages; i++) {
>   					/*
> @@ -3032,8 +3036,6 @@ allocbuf(struct buf *bp, int size)
>   					vm_page_unlock(m);
>   				}
>   				VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
> -				pmap_qremove((vm_offset_t) trunc_page((vm_offset_t)bp->b_data) +
> -				    (desiredpages << PAGE_SHIFT), (bp->b_npages - desiredpages));
>   				bp->b_npages = desiredpages;
>   			}
>   		} else if (size > bp->b_bcount) {
> Index: sparc64/sparc64/pmap.c
> ===================================================================
> --- sparc64/sparc64/pmap.c	(revision 223705)
> +++ sparc64/sparc64/pmap.c	(working copy)
> @@ -1286,6 +1289,7 @@ pmap_release(pmap_t pm)
>   			pc->pc_pmap = NULL;
>   	mtx_unlock_spin(&sched_lock);
>
> +	pmap_qremove((vm_offset_t)pm->pm_tsb, TSB_PAGES);
>   	obj = pm->pm_tsb_obj;
>   	VM_OBJECT_LOCK(obj);
>   	KASSERT(obj->ref_count == 1, ("pmap_release: tsbobj ref count != 1"));
> @@ -1297,7 +1301,6 @@ pmap_release(pmap_t pm)
>   		vm_page_free_zero(m);
>   	}
>   	VM_OBJECT_UNLOCK(obj);
> -	pmap_qremove((vm_offset_t)pm->pm_tsb, TSB_PAGES);
>   	PMAP_LOCK_DESTROY(pm);
>   }
>
> @@ -1379,6 +1382,8 @@ pmap_remove_all(vm_page_t m)
>   	struct tte *tp;
>   	vm_offset_t va;
>
> +	KASSERT((m->flags & (PG_FICTITIOUS | PG_UNMANAGED)) == 0,
> +	    ("pmap_remove_all: page %p is not managed", m));
>   	vm_page_lock_queues();
>   	for (tp = TAILQ_FIRST(&m->md.tte_list); tp != NULL; tp = tpn) {
>   		tpn = TAILQ_NEXT(tp, tte_link);
>