From owner-freebsd-acpi@FreeBSD.ORG Sat Jan 5 19:07:33 2008
Message-ID: <09EE88FF8B644C90AAE0158ACB4AB595@kamino>
From: "Yousif Hassan"
To: "Frederic Chardon"
Date: Sat, 5 Jan 2008 13:41:30 -0500
Subject: Re: solved ?] i386/79080: acpi thermal changes freezes HP nx6110
List-Id: ACPI and power management development

Hi Frederic, Nate, list members:

I recently tried 7.0-RC1 on an nx6110. The thermal freeze problems are
definitely still there, and appear worse. I tried all of the workarounds
below and nothing helped - I suspect this issue is not interrupt storm
related any more, but rather, a mutex race condition of some sort...
please see below...

> Hello,
> I found a workaround to avoid freeze while change _ACx state on
> nx6110.
> In kernel, use
> options SCHED_ULE
> device apic
> options AUTO_EOI_1
> options AUTO_EOI_2

I tried this. With exactly the options above, my root boot device became
unfindable, and no amount of tweaking at the loader prompt would get it to
boot. When I removed AUTO_EOI_2 and tried again, the root filesystem
mounted and the system booted, but the freeze problems remained. I also
tried the out-of-the-box GENERIC kernel, of course; the freezes occur
there too.

> ULE and apic allow the freeze to last only a few second (without it,
> I never waited more than 10 minutes but I supposed it can be long...).
> AUTO_EOI_1 and AUTO_EOI_2 have no impact without ULE and apic.
> Separately they don't have noticeable effect.

In my case, the mutex problem causes the freeze to last forever,
regardless of the scheduler used.

(From a previous email by Frederic):
> Pavel Rydvan stated in the pr that if the temperature doesn't change
> there is no problem. In fact, it is not completely true: problem
> arises when ACx _increase_. When it decreases if there is a freeze it
> is unnoticable.

I agree with this observation. I only get the freezes if the temperature
INCREASES.

> If I manually set hw.acpi.thermal.tz0.active then there is no more
> problem (apart from the thermal function of ACPI becomes useless).

I tried this and it didn't work for me. The "active" number remained at 1
regardless of the value I passed - I tried -1, 0, 1, 2, 3, 4, 5, and 6.
I don't know how you get this number to change, but sysctl kept it at 1.
(ex:
#root# sysctl hw.acpi.thermal.tz0.active=-1
hw.acpi.thermal.tz0.active: 1 -> 1
)

> Pavel Rydvan said that it is due to IRQ storm, I can't dig deeper this
> because I don't know how to do.

It seems mutex-related to me. I placed as much of the debug info as I
could into the PR. I'll also include it below. Thanks to anyone for
reading this.

--BEGIN PR 79080 INFO--

The problem is still found in the most recent 7.0 RC code as well.
It has something to do with a mutex lock/unlock problem when the thermal
zone change occurs - it doesn't appear to be an interrupt storm any
longer.

It is assuredly ACPI-related, because disabling ACPI makes the freezes go
away. However, this laptop does not function well without ACPI, so that
is not a good workaround: USB devices and some other hardware do not work
without ACPI.

I tried several suggested workarounds, none of which resolved the issue.
These included building the kernel with apic, disabling apic, manually
changing the hw.acpi.thermal.tz0.active number (my nx6110 seems to want
to keep it at 1 no matter what), and using the ULE scheduler rather than
4BSD. Again, none of the above workarounds, in any combination, solved
the issue.

INFORMATION
-----------
With debugging turned on, the following appears right before the lock,
as soon as the temperature rises enough to trigger a change in the zone:

acpi_tz0: _AC3: temperature 68.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 68.0 >= setpoint 55.0
acpi_tz0: _AC3: temperature 67.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 67.0 >= setpoint 55.0
...etc...
and then:

ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ1_._TMP] (Node 0xc321b9c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ2_._TMP] (Node 0xc321b8c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]

(the errors continue to repeat ad infinitum, and each TZ reports problems)

As a result, you will eventually see:

acpi_tz0: error fetching current temperature -- AE_TIME
acpi_tz1: error fetching current temperature -- AE_TIME
(..etc...)

The interesting thing is that THIS PROBLEM DOES NOT APPEAR in FreeBSD
6.2-RELEASE nor in any of the 6.3-RC variants. It's unique to FreeBSD 7,
and it involves some of the new ACPI mutex code.

This is definitely a regression for this particular laptop since it
worked well in 6.x - so as such, maybe it would be worthwhile to
investigate this bug. It seems general enough that it could affect other
laptop ASLs as well.
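
For anyone comparing their own console output against the above, here is a minimal Python sketch that summarizes a captured log. It pulls out the "temperature >= setpoint" lines and counts the ACPI error/exception messages by source tag (e.g. utmutex-0376). The function name `summarize`, the regexes, and the idea of saving the console output as text are my own assumptions for illustration; the sample lines are copied from the debug output above. It is not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Sample lines copied from the debug output in this message.
LOG = r"""acpi_tz0: _AC3: temperature 68.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 68.0 >= setpoint 55.0
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ1_._TMP] (Node 0xc321b9c0), AE_TIME
acpi_tz0: error fetching current temperature -- AE_TIME
"""

# Hypothetical patterns, written to match the line shapes shown above.
SETPOINT_RE = re.compile(
    r"^(acpi_tz\d+): (_AC\d+): temperature ([\d.]+) >= setpoint ([\d.]+)$"
)
ACPI_ERR_RE = re.compile(r"^ACPI (?:Error|Exception) \((\w+)-(\d+)\)")


def summarize(log_text):
    """Return (trips, errors) for a captured console log.

    trips  -- list of (zone, level, temperature, setpoint) tuples
    errors -- Counter of ACPI messages keyed by "module-line" tag
    """
    trips = []
    errors = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        m = SETPOINT_RE.match(line)
        if m:
            zone, level, temp, setpoint = m.groups()
            trips.append((zone, level, float(temp), float(setpoint)))
            continue
        m = ACPI_ERR_RE.match(line)
        if m:
            errors[f"{m.group(1)}-{m.group(2)}"] += 1
    return trips, errors


if __name__ == "__main__":
    trips, errors = summarize(LOG)
    for zone, level, temp, setpoint in trips:
        print(f"{zone} {level}: {temp} >= {setpoint}")
    for tag, count in errors.most_common():
        print(f"{tag}: {count}")
```

Run against a full capture, a count of utmutex/exutils messages that keeps growing after the first _ACx trip would match the mutex failure I describe here, as opposed to an interrupt storm.
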

The ASL dump and a sysctl dump can be found at:
http://www.far-far-away.com/~yousif/freebsd/

Please let me know if more information is needed.

--Yousif