From: Dan Naumov
To: Attilio Rao
Cc: FreeBSD-STABLE Mailing List
Date: Wed, 8 Jul 2009 03:57:29 +0300
Subject: Re: 7.2-release/amd64: panic, spin lock held too long
In-Reply-To: <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com>
References: <3bbf2fe10907061818v245abd0cgc3ca5073cb93aea4@mail.gmail.com> <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com>

On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote:
> 2009/7/7 Dan Naumov :
>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote:
>>> 2009/7/7 Dan Naumov :
>>>> I just got a panic followed by a reboot a few seconds after running
>>>> "portsnap update", /var/log/messages shows the following:
>>>>
>>>> Jul  7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>> Jul  7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock 1) held by 0xffffff00017d8370 (tid 100054) too long
>>>> Jul  7 03:49:38 atom kernel: panic: spin lock held too long
>>>
>>> That's a known bug, affecting -CURRENT as well.
>>> The cpustop IPI is handled through an NMI, which means it could
>>> interrupt a CPU at any moment, even while holding a spinlock,
>>> violating one well known FreeBSD rule.
>>> That means that the cpu can stop itself while the thread was holding
>>> the sched lock spinlock and not releasing it (there is no way, modulo
>>> highly hackish, to fix that).
>>> In the meantime, hardclock() wants to schedule something else to run
>>> and gets stuck on the thread lock.
>>>
>>> Ideal fix would involve not using a NMI for serving the cpustop while
>>> having a cheap way (not making the common path too hard) to tell
>>> hardclock() to avoid scheduling while cpustop is in flight.
>>>
>>> Thanks,
>>> Attilio
>>
>> Any idea if a fix is being worked on and how unlucky must one be to
>> run into this issue, should I expect it to happen again? Is it
>> basically completely random?
>
> I'd like to work on that issue before BETA3 (and backport to
> STABLE_7), I'm just time-constrained right now.
> It is completely random.
>
> Thanks,
> Attilio

OK, this is getting pretty bad. 23 hours later, I got the same kind of
panic; the only difference is that instead of "portsnap update", this
was triggered by "portsnap cron", which I have running between 3 and 4
am every day:

Jul 8 03:03:49 atom kernel: ssppiinn lloocckk 00xxffffffffffffffff8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d 10100006070)5 )t otoo ol olnogng
Jul 8 03:03:49 atom kernel: p
Jul 8 03:03:49 atom kernel: anic: spin lock held too long
Jul 8 03:03:49 atom kernel: cpuid = 0
Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s

- Sincerely,
Dan Naumov