From: John Baldwin
To: freebsd-acpi@freebsd.org
Cc: Glen
Subject: Re: SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)
Date: Fri, 2 Nov 2007 12:10:10 -0400
Message-Id: <200711021210.11259.jhb@freebsd.org>
In-Reply-To: <472AA11F.3080302@root.org>
References: <472A53B2.6030901@nokia.com> <472AA11F.3080302@root.org>

On Friday 02 November 2007 12:01:35 am Nate Lawson wrote:
> Glen wrote:
> > Hi,
> >
> > I have been seeing intermittent hangs in the acpi shutdown code on an
> > Intel 2.4GHz 8 CPU system. I am running with a FreeBSD 6.1 code base
> > but cannot see a reason why this can't happen in other FreeBSD versions.
> > The hang is very irregular. I am recreating it using an expect script
> > that repeatedly reboots the system. Sometimes I can do up to 200
> > reboots before observing the hang; sometimes it happens after 5-20
> > reboots.
> >
> > It has been difficult to pin down the hang as the system is not
> > responding to NMI events, but using breakpoints I believe the hang is in
> > acpi_cpu.c:acpi_cpu_shutdown with the call to smp_rendezvous.
> >
> > My theory is that one of the CPUs does not respond to ipi_all_but_self
> > and that all the other CPUs are waiting for it in smp_rendezvous_action.
> > The smp_rv_waiters[0] < mp_ncpus condition never stops being true, so
> > the system hangs. This may happen due to other activity (or a deadlock?)
> > on that CPU.
> >
> > I noticed a few threads relating to this and have already tried things
> > like changing kern.sched.ipiwakeup.enabled & machdep.cpu_idle_hlt.
> > Neither had any effect.
> >
> > 1) I tried removing the call to smp_rendezvous in acpi_cpu_shutdown and
> > this stops the hang from happening. Does anyone know the purpose of this
> > call in the shutdown code or if I might suffer some consequence by
> > removing it?
>
> I have one more thing I needed to consider. There's a race where a
> thread could be entering acpi_cpu_idle() to read a C2-3 register but
> that register state gets destroyed with the softc before the read. In
> that case, I thought there could be a panic, hence why I originally put
> in the smp_rendezvous(). However, I don't think device_shutdown() frees
> softcs (need to look in the newbus code to be sure). So I still should
> be able to remove this code after checking more closely.

It does not. Only detach does.

-- 
John Baldwin
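
For anyone following the hang theory quoted above, here is a minimal sketch of the entry barrier it describes. It is a single-threaded user-space model, not the kernel's smp_rendezvous code. The function name, the arrival[] array, the NEVER marker and the max_spins bound are all hypothetical and exist only for illustration. Only the spin condition (waiters < ncpus) is taken from the report.

```c
#include <assert.h>

#define NEVER (-1)	/* hypothetical marker: this CPU never answers the IPI */

/*
 * Model of the rendezvous entry barrier.  arrival[cpu] is the spin
 * iteration at which that CPU checks in, or NEVER.  Every CPU that has
 * checked in spins while (waiters < ncpus), as in the reported
 * smp_rv_waiters[0] < mp_ncpus condition.
 *
 * Returns the iteration at which the barrier releases, or -1 if it has
 * not released after max_spins iterations.  The real loop has no such
 * bound, which is why a missing CPU shows up as a hard hang.
 */
int
rendezvous_release_step(const int arrival[], int ncpus, int max_spins)
{
	assert(ncpus > 0);

	for (int step = 0; step < max_spins; step++) {
		int waiters = 0;

		/* Count the CPUs that have checked in by this iteration. */
		for (int cpu = 0; cpu < ncpus; cpu++)
			if (arrival[cpu] != NEVER && arrival[cpu] <= step)
				waiters++;

		/* The spin ends once waiters < ncpus stops holding. */
		if (waiters >= ncpus)
			return (step);
	}
	return (-1);
}
```

With eight CPUs that all check in, the model releases at the iteration where the slowest one arrives. Mark any single entry NEVER and it returns -1 for any bound, which matches the symptom of seven CPUs spinning on one that never shows up.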