Date: Thu, 16 Jun 2011 01:12:26 +0200
From: Marius Strobl <marius@alchemy.franken.de>
To: Peter Jeremy <peterjeremy@acm.org>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20110615231226.GY7064@alchemy.franken.de>
In-Reply-To: <20110614214959.GB91014@server.vk2pj.dyndns.org>
References: <20110526234728.GA69750@server.vk2pj.dyndns.org>
 <20110527120659.GA78000@alchemy.franken.de>
 <20110601231237.GA5267@server.vk2pj.dyndns.org>
 <20110608224801.GB35494@alchemy.franken.de>
 <20110613235144.GA12470@server.vk2pj.dyndns.org>
 <20110614214959.GB91014@server.vk2pj.dyndns.org>
On Wed, Jun 15, 2011 at 07:49:59AM +1000, Peter Jeremy wrote:
> On 2011-Jun-14 09:51:44 +1000, Peter Jeremy <peter@server.vk2pj.dyndns.org> wrote:
> >I'm building r223035 with DDB & KDB and will see how that goes.
>
> I had another try with WITNESS & INVARIANTS and got a different panic:
> panic: blockable sleep lock (sleep mutex) system map @ /usr/src/sys/vm/vm_map.c:3651
> cpuid = 13
> KDB: stack backtrace:
> panic() at panic+0x1c8
> witness_checkorder() at witness_checkorder+0x108
> _mtx_lock_flags() at _mtx_lock_flags+0x110
> _vm_map_lock_read() at _vm_map_lock_read+0x1c
> vm_map_lookup() at vm_map_lookup+0x4c
> vm_fault_hold() at vm_fault_hold+0x94
> vm_fault() at vm_fault+0x14
> trap_pfault() at trap_pfault+0x338
> trap() at trap+0x3a8
> -- fast data access mmu miss tar=0x2000 %o7=0xc055e038 --
> intr_vector_stray() at intr_vector_stray+0x8
> sched_switch() at sched_switch+0x290
> mi_switch() at mi_switch+0x2a8
> sleepq_switch() at sleepq_switch+0x1cc
> sleepq_catch_signals() at sleepq_catch_signals+0x130
> sleepq_timedwait_sig() at sleepq_timedwait_sig+0x8
> _cv_timedwait_sig() at _cv_timedwait_sig+0x344
> seltdwait() at seltdwait+0x74
> kern_select() at kern_select+0x618
> select() at select+0x44
> syscallenter() at syscallenter+0x270
> syscall() at syscall+0x74
> -- syscall (93, FreeBSD ELF64, select) %o7=0x1099dc --
> userland() at 0x14bde8
> user trace: trap %o7=0x1099dc
> pc 0x14bde8, sp 0x7fdffffc8d1
> pc 0x26c800, sp 0x26c800
> done
> KDB: enter: panic
>
> Unfortunately, still no DDB - just a hang

This backtrace shows two things that just shouldn't happen
hardware-wise:
a) The CPU issues a stray interrupt vector. This would explain the
   SIRs you were seeing without the patch which tries to make these
   non-fatal.
b) The CPU faults on an address which is covered by a locked TLB
   slot.
The funny thing is that the CPU then actually still manages to panic;
if something like b) occurs I'd expect it to be in a totally unusable
state.
I'm not sure what to do about these, as it still looks like broken
hardware or a silicon bug to me, but at least the public errata don't
mention anything like that and the OpenSolaris source doesn't seem to
work around anything like this in an obvious way either.
The only thing I can think of is to test whether just ignoring the
stray interrupt vectors with the patch below avoids any further
issues. You'll need to revert sparc64_intr_vector_stray.diff for that,
or at least its exception.S part.

Marius

Index: exception.S
===================================================================
--- exception.S (revision 223042)
+++ exception.S (working copy)
@@ -578,7 +578,7 @@
  andcc %g1, IRSR_BUSY, %g0
  bnz,a,pt %xcc, intr_vector
  nop
- sir
+ retry
  .align 32
  .endm
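
For readers who don't read SPARC assembly, the effect of the one-line
change can be sketched as a small C model. This is only an
illustration, not kernel code: classify_vector(), the two enums, and
the bit position used for IRSR_BUSY are assumptions made up for this
sketch. The idea is that a vector trap with the BUSY bit set is
dispatched as usual, while one without it (a stray vector) either
resets the machine (the old "sir") or is silently dropped (the new
"retry").

```c
#include <stdint.h>

/* Assumed bit position of BUSY in the interrupt receive status word. */
#define IRSR_BUSY (UINT64_C(1) << 5)

/* Hypothetical names: what to do with a vector that is not BUSY. */
enum stray_policy { STRAY_RESET, STRAY_IGNORE };

/* Hypothetical names for the three possible outcomes of the macro. */
enum vector_action { ACT_DISPATCH, ACT_RESET, ACT_RETRY };

/*
 * Model of the patched trap-table macro: branch to the real handler
 * when BUSY is set, otherwise fall through to "sir" (STRAY_RESET) or
 * "retry" (STRAY_IGNORE).
 */
static enum vector_action
classify_vector(uint64_t irsr, enum stray_policy policy)
{
	if ((irsr & IRSR_BUSY) != 0)
		return (ACT_DISPATCH);
	return (policy == STRAY_RESET ? ACT_RESET : ACT_RETRY);
}
```

With STRAY_IGNORE a stray vector simply resumes the interrupted
instruction, which is what the patch above tests; it does not explain
why the vector was raised in the first place.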