FreeBSD Mail Archives

Date:      Mon, 29 Sep 2014 00:00:28 -0400
From:      Chris Ross <cross+freebsd@distal.com>
To:        freebsd-sparc64@freebsd.org
Subject:   Re: FreeBSD 10-STABLE/sparc64 panic
Message-ID:  <456226AE-0712-4510-AEF5-2053F36F2181@distal.com>
In-Reply-To: <AF5EA0E6-860B-47DF-AC5E-6A45317C6092@distal.com>
References:  <20140518083413.GK24043@gradx.cs.jhu.edu> <751F7778-95CE-40FC-857F-222FB37737C0@distal.com> <20140518235853.GM24043@gradx.cs.jhu.edu> <20140519145222.GN24043@gradx.cs.jhu.edu> <A092DFEB-D5CF-473E-88BD-81B005C26C57@distal.com> <20140519193529.GO24043@gradx.cs.jhu.edu> <20140519205047.GP24043@gradx.cs.jhu.edu> <CA75738D-066D-4EDC-9018-89936EE861C6@distal.com> <AB5649B5-BBFB-4284-9CFF-4784D28A18F3@distal.com> <A9D37635-CA61-401B-BEAE-14C4F370BFD6@distal.com> <BC35853D-DA5E-4799-947C-4C64A0BC7D36@distal.com> <D9350E94-1F01-4FFD-A51E-AD8761F5C9CF@distal.com> <E48E7175-310B-4449-B3E1-2058F9E681D0@distal.com> <323A3936-DE55-459A-B8AA-CFF463922F22@distal.com> <7DD7D2DC-A265-40D6-9995-16ABAF79C1FB@distal.com> <AF5EA0E6-860B-47DF-AC5E-6A45317C6092@distal.com>

On Jun 30, 2014, at 10:40 , Chris Ross <cross+freebsd@distal.com> wrote:
> tl;dr : I've finished my testing and have a result, but see other things I
> don't understand.  Could use more help.

  Old thread, problem still exists.  Noticed in head around:

http://lists.freebsd.org/pipermail/freebsd-sparc64/2014-March/009261.html

  And in stable/10 as of revision 263676 (likely earlier).  Like numerous
others, I have tried to narrow it down to a commit, or a small number of
commits, but the failure is sporadic.  I think looking at the current code,
which is still failing, may be most useful.

  I am right now seeing this on stable/10 code updated today, 10.1-BETA3,
r272264.  As noted earlier in these threads, I am running a Sun Fire v240.  At
least one or two other folks with v240's have seen this, as has, I think,
someone with a variant of SunBlade that also has bge's on it.

  Multiuser boot panics at:

Setting hostname: hostname.distal.com.
bge0: link state changed to DOWN
spin lock 0xc0c95330 (smp rendezvous) held by 0xfffff8000560a490 (tid 100347) too long
timeout stopping cpus
panic: spin lock held too long
cpuid = 1
KDB: stack backtrace:
#0 0xc054a0d0 at _mtx_lock_spin_failed+0x50
#1 0xc054a198 at _mtx_lock_spin_cookie+0xb8
#2 0xc08b989c at tick_get_timecount_mp+0xdc
#3 0xc056c33c at binuptime+0x3c
#4 0xc08857ac at timercb+0x6c
#5 0xc08b9c00 at tick_intr+0x220
Uptime: 20s
Automatic reboot in 15 seconds - press a key on the console to abort

  With past kernels more recent than March 2014, it will sometimes boot
[to multiuser] on the first try, but usually it crashes a few times before
eventually coming all the way up.  Given 30-40 minutes, it will usually
recover to multiuser, and is stable forever (in past testing) at that point.
This evening, it was rebooting for about 40 minutes (11 panic and
reboot sequences), but then came up.

  I would be happy to dig into this further, but will need some advice and
instruction.  I fear I may not even have built the kernel with full debugging,
but can do so.  I'll look into that now that the machine is up again.

  Please let me know what I can do to help.  Thanks.

                                      - Chris

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?456226AE-0712-4510-AEF5-2053F36F2181>
