Date: Wed, 17 Aug 2011 11:45:41 +0200
From: Marius Strobl <marius@alchemy.franken.de>
To: Peter Jeremy <peterjeremy@acm.org>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20110817094541.GJ48988@alchemy.franken.de>
In-Reply-To: <20110816214820.GA35017@server.vk2pj.dyndns.org>
References: <20110526234728.GA69750@server.vk2pj.dyndns.org> <20110527120659.GA78000@alchemy.franken.de> <20110601231237.GA5267@server.vk2pj.dyndns.org> <20110608224801.GB35494@alchemy.franken.de> <20110613235144.GA12470@server.vk2pj.dyndns.org> <20110813143807.GY48988@alchemy.franken.de> <20110816214820.GA35017@server.vk2pj.dyndns.org>

On Wed, Aug 17, 2011 at 07:48:20AM +1000, Peter Jeremy wrote:
> On 2011-Aug-13 16:38:07 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
> >Could you please give the following patch with SCHED_4BSD (cpu_switch()
> >still is missing support for SCHED_ULE) with something like -j128
> >buildworlds a try on your V890?
> >http://people.freebsd.org/~marius/sparc64_replace_sched_lock_w_atomic.diff
>
> Getting better but still not perfect. It survived a couple of -j128
> buildworlds with another six -j16 buildworlds running in parallel.

Thanks!

>
> But it still has the same issue pho's stress test - a thr1 process is
> blocked in urdlck. The improvement is that there's only one stuck
> process and it took 7? hrs at INCARNATIONS=150 instead of 1-2 hours.
> (And it runs out of witness locks).
>

Well, the sole purpose of that patch is to get rid of the MD sched_lock
usage in order to be able to add support for SCHED_ULE in a next step.
It's not obvious why this should have an impact on the problem with
userland mutex code. In fact using sched_lock provided more protection
than solving this via atomic operations, which should still be sufficient
for what we need to guarantee though. If at all I'd expect the patch to
create problems in case I've overlooked something, not to solve any :)

If it indeed has a positive impact on the userland mutex problem then
my best guess is that this is a side-effect of the memory barriers the
patch adds to the context switching. That would indicate that the cause
of the problem in fact are missing memory barriers in the userland mutex
code, which IMO is one of the suspicious things regarding that code.

Marius
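
To illustrate the memory-barrier hypothesis in the message above, here is a
minimal sketch in C11 of where acquire and release barriers sit in a
userland lock. This is not the FreeBSD libthr/umtx code and not the patch
under discussion; toy_lock, demo_state, demo_worker and run_demo are
hypothetical names made up for this example. The idea it shows: the lock
operation needs acquire ordering so that accesses inside the critical
section cannot be observed before the lock is held, and the unlock
operation needs release ordering so that they cannot be observed after the
lock is dropped. If either ordering were missing on a weakly ordered
machine, another CPU could in principle see stale protected data.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy spin lock; NOT the FreeBSD umtx implementation. */
struct toy_lock {
	atomic_int held;		/* 0 = free, 1 = held */
};

static void
toy_lock_acquire(struct toy_lock *l)
{
	int expected = 0;

	/* Acquire ordering: later loads/stores stay after the lock is taken. */
	while (!atomic_compare_exchange_weak_explicit(&l->held, &expected, 1,
	    memory_order_acquire, memory_order_relaxed))
		expected = 0;	/* a failed CAS overwrote expected; reset it */
}

static void
toy_lock_release(struct toy_lock *l)
{
	/* Release ordering: earlier loads/stores stay before the unlock. */
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct demo_state {
	struct toy_lock lock;
	long counter;		/* plain variable, protected by lock */
	int iters;
};

static void *
demo_worker(void *arg)
{
	struct demo_state *s = arg;

	for (int i = 0; i < s->iters; i++) {
		toy_lock_acquire(&s->lock);
		s->counter++;
		toy_lock_release(&s->lock);
	}
	return (NULL);
}

/* Run up to 64 threads; return the final counter value. */
long
run_demo(int nthreads, int iters)
{
	struct demo_state s = { .counter = 0, .iters = iters };
	pthread_t tids[64];
	int started = 0;

	atomic_init(&s.lock.held, 0);
	if (nthreads > 64)
		nthreads = 64;
	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[started], NULL, demo_worker, &s) != 0)
			break;	/* only join threads that actually started */
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return (s.counter);
}
```

With the barriers in place, run_demo(n, iters) should return n * iters when
all n threads start. A real mutex would sleep in the kernel instead of
spinning; the sketch only shows where the ordering constraints belong.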