From: Nate Lawson
To: Glen
Cc: ACPI mailing list
Date: Fri, 02 Nov 2007 12:50:44 -0800
Subject: Re: SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)
Message-ID: <472B8DA4.1000308@root.org>
In-Reply-To: <472A53B2.6030901@nokia.com>

Glen wrote:
> Hi,
>
> I have been seeing intermittent hangs in the ACPI shutdown code on an
> Intel 2.4GHz 8-CPU system. I am running with a FreeBSD 6.1 code base
> but cannot see a reason why this can't happen in other FreeBSD versions.
> The hang is very irregular; I am recreating it using an expect script
> that repeatedly reboots the system. Sometimes I can do up to 200
> reboots before observing the hang; sometimes it happens after 5-20
> reboots.
>
> It has been difficult to pin down the hang because the system does not
> respond to NMI events, but using breakpoints I believe the hang is in
> acpi_cpu.c:acpi_cpu_shutdown, in the call to smp_rendezvous.
>
> My theory is that one of the CPUs does not respond to ipi_all_but_self
> and that all the other CPUs are waiting for it in smp_rendezvous_action.
> The smp_rv_waiters[0] < mp_ncpus condition is never met and the system
> hangs. This may happen due to other activity (or a deadlock?) on that
> CPU.

Fix committed. I'll do my best to get it into 7.0 and 6.3.

--
Nate
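To make the failure mode in Glen's theory concrete, here is a minimal userspace sketch (not FreeBSD kernel code, and not the committed fix, which the thread does not describe) of an entry rendezvous: each simulated CPU increments a shared counter and spins until the counter reaches the CPU count. The names rv_waiters, ncpus, cpu_rendezvous and simulate_rendezvous are hypothetical stand-ins for smp_rv_waiters[0], mp_ncpus and the kernel routines. Threads play the role of CPUs, and a CPU that never receives the IPI is modelled by never starting its thread. The 0.5 s timeout is an assumption added only so the demo terminates; the loop described in the report has no such bound.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_CPUS 16
#define TIMEOUT_NS 500000000LL /* hypothetical 0.5 s bound, demo only */

/* Hypothetical stand-ins for smp_rv_waiters[0] and mp_ncpus. */
static atomic_int rv_waiters;
static int ncpus;

static long long elapsed_ns(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long long)(now.tv_sec - start->tv_sec) * 1000000000LL
         + (now.tv_nsec - start->tv_nsec);
}

/* One simulated CPU entering the rendezvous: check in, then spin until
 * every CPU has checked in. Sets *timed_out if the others never arrive. */
static void *cpu_rendezvous(void *p)
{
    bool *timed_out = p;
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);

    atomic_fetch_add(&rv_waiters, 1);
    while (atomic_load(&rv_waiters) < ncpus) {
        if (elapsed_ns(&start) > TIMEOUT_NS) {
            *timed_out = true; /* an unbounded loop would spin forever here */
            return NULL;
        }
        sched_yield(); /* lets other threads run; loosely like cpu_spinwait() */
    }
    *timed_out = false;
    return NULL;
}

/* Start `responding` of `total` simulated CPUs. CPUs that "missed the IPI"
 * are never started. Returns how many started CPUs were left stuck. */
static int simulate_rendezvous(int total, int responding)
{
    pthread_t tids[MAX_CPUS];
    bool timed_out[MAX_CPUS];

    assert(responding <= total && total <= MAX_CPUS);
    atomic_store(&rv_waiters, 0);
    ncpus = total;

    for (int i = 0; i < responding; i++) {
        timed_out[i] = false;
        int rc = pthread_create(&tids[i], NULL, cpu_rendezvous, &timed_out[i]);
        assert(rc == 0);
        (void)rc;
    }

    int stuck = 0;
    for (int i = 0; i < responding; i++) {
        pthread_join(tids[i], NULL);
        if (timed_out[i])
            stuck++;
    }
    return stuck;
}
```

With all eight simulated CPUs started, simulate_rendezvous(8, 8) should return 0. Leaving one out, as in simulate_rendezvous(8, 7), leaves the other seven waiting on the counter condition until the artificial timeout, which mirrors the hang Glen describes on his 8-CPU system.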