From: Chris Ross
To: freebsd-sparc64@freebsd.org
Subject: Re: FreeBSD 10-STABLE/sparc64 panic
Date: Mon, 29 Sep 2014 00:00:28 -0400
Message-Id: <456226AE-0712-4510-AEF5-2053F36F2181@distal.com>

On Jun 30, 2014, at 10:40, Chris Ross wrote:
> tl;dr: I've finished my testing and have a result, but see other things I
> don't understand. Could use more help.

Old thread, problem still exists. Noticed in head around:

http://lists.freebsd.org/pipermail/freebsd-sparc64/2014-March/009261.html

And in stable/10 as of revision 263676 (likely earlier). Like numerous
other people, I have tried to narrow it down to a commit, or a small
number of commits, but the failure is sporadic. I think looking at the
current code, which is still failing, may be most useful.

I am right now seeing this on stable/10 code updated today, 10.1-BETA3,
r272264. As noted earlier in these threads, I am running a Sun Fire v240.
At least one or two other folks with v240s have seen this, as has, I
think, someone with a variant of SunBlade that also has bge's on it.

Multiuser boot panics at:

Setting hostname: hostname.distal.com.
bge0: link state changed to DOWN
spin lock 0xc0c95330 (smp rendezvous) held by 0xfffff8000560a490 (tid 100347) too long
timeout stopping cpus
panic: spin lock held too long
cpuid = 1
KDB: stack backtrace:
#0 0xc054a0d0 at _mtx_lock_spin_failed+0x50
#1 0xc054a198 at _mtx_lock_spin_cookie+0xb8
#2 0xc08b989c at tick_get_timecount_mp+0xdc
#3 0xc056c33c at binuptime+0x3c
#4 0xc08857ac at timercb+0x6c
#5 0xc08b9c00 at tick_intr+0x220
Uptime: 20s
Automatic reboot in 15 seconds - press a key on the console to abort

With past kernels more recent than March 2014, it will sometimes boot
[to multiuser] on the first try, but usually it crashes a few times
before eventually coming all the way up. Given 30-40 minutes, it will
usually recover to multiuser, and is stable forever (in past testing) at
that point. This evening, it was rebooting for about 40 minutes (11 panic
and reboot sequences), but then came up.

I would be happy to dig into this further, but will need some advice and
instruction. I fear I may not even have built the kernel with full
debugging, but can do so. I'll look into that now that the machine is up
again.

Please let me know what I can do to help. Thanks.

- Chris