From owner-freebsd-sparc64@FreeBSD.ORG Wed Jun 15 23:34:50 2011
Date: Thu, 16 Jun 2011 01:34:45 +0200
From: Marius Strobl
To: Peter Jeremy
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20110615233445.GZ7064@alchemy.franken.de>
In-Reply-To: <20110613235144.GA12470@server.vk2pj.dyndns.org>
References: <20110526234728.GA69750@server.vk2pj.dyndns.org>
 <20110527120659.GA78000@alchemy.franken.de>
 <20110601231237.GA5267@server.vk2pj.dyndns.org>
 <20110608224801.GB35494@alchemy.franken.de>
 <20110613235144.GA12470@server.vk2pj.dyndns.org>

On Tue, Jun 14, 2011 at 09:51:44AM +1000, Peter Jeremy wrote:
> On 2011-Jun-09 00:48:01 +0200, Marius Strobl wrote:
> >This might be due to the excessive use of sched_lock by SCHED_4BSD
> >and the MD code, f.e. more CPUs means less TLB contexts per CPU which
> >in turn means more flushes that are protect by sched_lock.
>
> I have noticed that systat reports very high trap & fault counts.

That's basically expected; on USIII and later, FreeBSD just flushes all
unlocked TLB entries when it needs to flush the userland mappings and
accepts TLB misses for the kernel ones, instead of traversing the TLBs
for userland mappings and removing just those. OpenSolaris actually
does the same thing, and IIRC there isn't a way to traverse the large
TLBs. Given that the TLB contexts are divided evenly among the cores,
this means that the more cores there are in the machine, the more
flushes and misses you get. Previously FreeBSD shared the contexts,
which meant TLB shootdown IPIs even for non-shared PMAPs. So the
question is whether there's some point at which that approach actually
costs less performance than accepting TLB misses. This seems unlikely
though, and AFAIK the current approach is actually inspired by Solaris
Internals.

>
> I got a "spinlock held too long" panic that should have gone to DDB
> but the system wouldn't respond to anything other than a RSC reset.
>

You could try whether the patch below sufficiently reduces the lock
coverage to avoid these. For stable/8 you'll probably need to apply
the second chunk by hand.

Marius

Index: pmap.c
===================================================================
--- pmap.c	(revision 223042)
+++ pmap.c	(working copy)
@@ -2217,11 +2217,10 @@ pmap_activate(struct thread *td)
 	struct pmap *pm;
 	int context;
 
+	critical_enter();
 	vm = td->td_proc->p_vmspace;
 	pm = vmspace_pmap(vm);
 
-	mtx_lock_spin(&sched_lock);
-
 	context = PCPU_GET(tlb_ctx);
 	if (context == PCPU_GET(tlb_ctx_max)) {
 		tlb_flush_user();
@@ -2229,17 +2228,18 @@ pmap_activate(struct thread *td)
 	}
 	PCPU_SET(tlb_ctx, context + 1);
 
+	mtx_lock_spin(&sched_lock);
 	pm->pm_context[curcpu] = context;
 	CPU_OR(&pm->pm_active, PCPU_PTR(cpumask));
 	PCPU_SET(pmap, pm);
+	mtx_unlock_spin(&sched_lock);
 
 	stxa(AA_DMMU_TSB, ASI_DMMU, pm->pm_tsb);
 	stxa(AA_IMMU_TSB, ASI_IMMU, pm->pm_tsb);
 	stxa(AA_DMMU_PCXR, ASI_DMMU, (ldxa(AA_DMMU_PCXR, ASI_DMMU) &
 	    TLB_CXR_PGSZ_MASK) | context);
 	flush(KERNBASE);
-
-	mtx_unlock_spin(&sched_lock);
+	critical_exit();
 }
 
 void
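
For illustration only, here is a small userland C sketch of why
dividing the TLB contexts evenly among the cores leads to more flushes
per CPU as the core count grows. It is not kernel code. The names
struct cpu_ctx, ctx_init(), ctx_activate() and count_flushes() are
hypothetical, the figure of 8192 contexts is just an example value,
reserving context 0 for the kernel is an assumption, and resetting to
the per-CPU minimum after a flush is an assumption about the line
that falls between the two hunks of the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU TLB context bookkeeping. */
struct cpu_ctx {
	int ctx_min;		/* first context number owned by this CPU */
	int ctx_max;		/* value that triggers a flush and a reset */
	int ctx_next;		/* next context number to hand out */
	unsigned long flushes;	/* simulated user TLB flushes so far */
};

/*
 * Divide the user contexts evenly among the CPUs.  Context 0 is assumed
 * to be reserved for the kernel, so nctx - 1 contexts are shared out.
 */
static void
ctx_init(struct cpu_ctx *c, int cpu, int ncpus, int nctx)
{
	int per_cpu;

	assert(ncpus > 0 && cpu >= 0 && cpu < ncpus);
	per_cpu = (nctx - 1) / ncpus;
	assert(per_cpu > 0);
	c->ctx_min = 1 + cpu * per_cpu;
	c->ctx_max = c->ctx_min + per_cpu;
	c->ctx_next = c->ctx_min;
	c->flushes = 0;
}

/*
 * Hand out a context for one address space activation.  Once the CPU has
 * used up its range, flush (here: just count) and start over.
 */
static int
ctx_activate(struct cpu_ctx *c)
{
	int context;

	context = c->ctx_next;
	if (context == c->ctx_max) {
		c->flushes++;		/* stands in for a user TLB flush */
		context = c->ctx_min;
	}
	c->ctx_next = context + 1;
	return (context);
}

/* Count the flushes CPU 0 sees for a given number of activations. */
static unsigned long
count_flushes(int ncpus, int nctx, long activations)
{
	struct cpu_ctx c;
	long i;

	ctx_init(&c, 0, ncpus, nctx);
	for (i = 0; i < activations; i++)
		(void)ctx_activate(&c);
	return (c.flushes);
}
```

Calling count_flushes() with the same number of activations but a
larger ncpus shows the flush count on a single CPU going up, since
each CPU runs through its smaller share of contexts sooner.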