Date: Thu, 2 Jun 2011 09:12:37 +1000
From: Peter Jeremy <peterjeremy@acm.org>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20110601231237.GA5267@server.vk2pj.dyndns.org>
In-Reply-To: <20110527120659.GA78000@alchemy.franken.de>
References: <20110526234728.GA69750@server.vk2pj.dyndns.org> <20110527120659.GA78000@alchemy.franken.de>

On 2011-May-27 14:06:59 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
>On Fri, May 27, 2011 at 09:47:28AM +1000, Peter Jeremy wrote:
>> I tried a "make -j16 universe" using a recent 8-stable on a 16-CPU
>> V890 and after about 11 minutes, I got the following.  This box

>It probably doesn't hurt to check the hardware with SunVTS though.

I ran a comprehensive SunVTS test for just under 24 hours without any
errors and then checked that an unmodified kernel still SIR'd (which
it did).

>ignores a certain amount of these). You could try whether the following
>patch makes any difference to the SIR you're seeing:
>http://people.freebsd.org/~marius/sparc64_intr_vector_stray.diff

Running -current, I didn't see any stray interrupts but I did get some
"spin lock held too long" panics.  Unfortunately, I don't seem to get
crashdumps - with DDB/KDB in the kernel, I get a hard hang - RSC break
doesn't work and the only way out is a RSC reset.

spin lock 0xc07914d8 (sched lock) held by 0xfffff8a29749d980 (tid 100443) too long
panic: spin lock held too long
cpuid = 14
KDB: stack backtrace:
panic() at panic+0x1c8
_mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x50
_mtx_lock_spin() at _mtx_lock_spin+0xac
sched_add() at sched_add+0x9c
sched_wakeup() at sched_wakeup+0x44
setrunnable() at setrunnable+0xb8
sleepq_resume_thread() at sleepq_resume_thread+0xfc
sleepq_broadcast() at sleepq_broadcast+0xbc
wakeup() at wakeup+0x18
exit1() at exit1+0x10b0
sys_exit() at sys_exit+0x10
syscallenter() at syscallenter+0x2f4
syscall() at syscall+0x30
-- syscall (1, FreeBSD ELF64, sys_exit) %o7=0x128ad0 --
userland() at 0x14b748
user trace: trap %o7=0x128ad0
pc 0x14b748, sp 0x7fdffffdb81
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
panic: spin lock held too long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
cpuid = 3
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
KDB: stack backtrace:
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
pspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
aspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
nspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
ispin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
cspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
(spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) too long
) at panic+0x1c8
_mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x50
_thread_lock_flags() at _thread_lock_flags+0xf8
statclock() at statclock+0x144
handleevents() at handleevents+0x120
timercb() at timercb+0x4c0
tick_intr() at tick_intr+0x1fc
-- interrupt level=0xe pil=0 %o7=0x5faa28 --
userland() at 0x5faa40
user trace: trap %o7=0x5faa28
pc 0x5faa40, sp 0x7fdffffb5f1
pc 0x5feab0, sp 0x7fdffffbf01
pc 0x600310, sp 0x7fdffffc011
pc 0x600628, sp 0x7fdffffc141
pc 0x6035f4, sp 0x7fdffffc251
pc 0x603b1c, sp 0x7fdffffc361
pc 0x605b14, sp 0x7fdffffc471
pc 0x5e6aa8, sp 0x7fdffffc551
pc 0x19c594, sp 0x7fdffffc611
pc 0x5a4aa0, sp 0x7fdffffc6d1
pc 0x454e08, sp 0x7fdffffc791
pc 0x4a65c0, sp 0x7fdffffc8c1
pc 0x4a0558, sp 0x7fdffffc981
pc 0x4a06c8, sp 0x7fdffffca41
pc 0x1924e0, sp 0x7fdffffcb01
pc 0x192684, sp 0x7fdffffcbc1
pc 0x1926a4, sp 0x7fdffffcc81
pc 0x1926a4, sp 0x7fdffffcd41
pc 0x54c710, sp 0x7fdffffce01
pc 0x144080, sp 0x7fdffffcec1
pc 0x17d698, sp 0x7fdffffcf81
pc 0x17e280, sp 0x7fdffffd041
pc 0x148710, sp 0x7fdffffd101
pc 0x1b9264, sp 0x7fdffffd1d1
pc 0x1001d0, sp 0x7fdffffd291
pc 0, sp 0x7fdffffd351
done
KDB: enter: panic

On -stable, I didn't see any stray interrupts but had the system hang
several times, needing a RSC reset to recover.

>and once with:
> val &= ~DCR_SI;

I tried adding this and the system survived a "make -j30 universe" on
-stable (BTW "make universe" seems to have issues cross-building x86
derivatives).  I'm now trying that on -current.  I'm not sure what
the implications of the above change are.

-- 
Peter Jeremy