Date: Sat, 5 Jan 2008 13:41:30 -0500
From: "Yousif Hassan" <yousif@alumni.jmu.edu>
To: "\"Frederic Chardon\"" <chardon.frederic@gmail.com>, <freebsd-acpi@freebsd.org>
Subject: Re: solved ?] i386/79080: acpi thermal changes freezes HP nx6110
Message-ID: <09EE88FF8B644C90AAE0158ACB4AB595@kamino>

Hi Frederic, Nate, list members:

I recently tried 7.0-RC1 on an nx6110. The thermal freeze problems are
definitely still there, and appear worse. I tried all of the workarounds
below and nothing helped. I suspect this issue is no longer related to an
interrupt storm, but rather to a mutex race condition of some sort...
please see below...

> Hello,
> I found a workaround to avoid freeze while change _ACx state on
> nx6110. In kernel, use
> options SCHED_ULE
> device apic
> options AUTO_EOI_1
> options AUTO_EOI_2

I tried this. With the exact options above, my root boot device became
unfindable, and no amount of tweaking at the loader prompt would get it to
boot. When I removed AUTO_EOI_2 and tried again, the root filesystem
mounted and the system booted, but the freeze problems remained.

I also tried the out-of-the-box GENERIC kernel, of course; the freeze
problems occur there too.

> ULE and apic allow the freeze to last only a few second (without it,
> I never waited more than 10 minutes but I supposed it can be long...).
> AUTO_EOI_1 and AUTO_EOI_2 have no impact without ULE and apic.
> Separately they don't have noticeable effect.

In my case, the mutex problem causes the freeze to last forever,
regardless of the scheduler used.

(From a previous email by Frederic):
> Pavel Rydvan stated in the pr that if the temperature doesn't change
> there is no problem. In fact, it is not completely true: problem
> arises when ACx _increase_. When it decreases if there is a freeze it
> is unnoticable.

I agree with this observation. I only get the freezes if the temperature
INCREASES.

> If I manually set hw.acpi.thermal.tz0.active then there is no more
> problem (apart from the thermal function of ACPI becomes useless).

I tried this and it didn't work for me. The "active" number remained at 1
regardless of the argument I passed it: I tried -1, 0, 1, 2, 3, 4, 5,
and 6. I don't know how you get this number to change, but sysctl kept it
at 1.
(ex:
#root# sysctl hw.acpi.thermal.tz0.active=-1
hw.acpi.thermal.tz0.active: 1 -> 1
)

> Pavel Rydvan said that it is due to IRQ storm, I can't dig deeper this
> because I don't know how to do.

It seems mutex-related to me. I placed as much of the debug info as I
could into the PR. I'll also include it below.

Thanks to anyone for reading this.

--BEGIN PR 79080 INFO--

The problem is still found in the most recent 7.0 RC code as well. It has
something to do with a mutex lock/unlock problem when the thermal zone
change occurs; it doesn't appear to be an interrupt storm any longer.

It is assuredly ACPI-related, because disabling ACPI makes the freezes go
away. However, this laptop does not function well without ACPI, so that is
not a good workaround. USB devices do not work w/o ACPI, nor does some
other hardware.

There are several suggested workarounds I tried, none of which resolved
the issue. These included building the kernel with apic, disabling apic,
manually changing the hw.acpi.thermal.tz0.active number (my nx6110 seems
to want to keep it at 1 no matter what), and using the ULE scheduler
rather than 4BSD. Again, none of the above workarounds, in any
combination, solved the issue.

INFORMATION
-----------
With debugging turned on, the following appears right before the lockup,
as soon as the temperature rises enough to trigger a change in the zone:

acpi_tz0: _AC3: temperature 68.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 68.0 >= setpoint 55.0
acpi_tz0: _AC3: temperature 67.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 67.0 >= setpoint 55.0
...etc...

and then:

ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ1_._TMP] (Node 0xc321b9c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ2_._TMP] (Node 0xc321b8c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]

(the errors continue to repeat ad infinitum, and each TZ reports problems)

As a result, you will eventually see:

acpi_tz0: error fetching current temperature -- AE_TIME
acpi_tz1: error fetching current temperature -- AE_TIME
(..etc...)

The interesting thing is that THIS PROBLEM DOES NOT APPEAR in FreeBSD
6.2-RELEASE nor in any of the 6.3-RC variants. It's unique to FreeBSD 7,
and it involves some of the new ACPI mutex code.

This is definitely a regression for this particular laptop since it worked
well in 6.x - so as such, maybe it would be worthwhile to investigate this
bug. It seems general enough that it could affect other laptop ASLs as
well.

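For anyone comparing a similar capture against the output quoted above, here is a minimal Python sketch that tallies the ACPI errors by source location and by failing AML method. It is not part of the PR. The function name `summarize` and the regular expressions are illustrative assumptions based only on the line shapes shown in this message; other ACPI-CA versions may format these lines differently.

```python
import re
from collections import Counter

# Sample lines copied from the debug output quoted in this message.
SAMPLE = r"""
acpi_tz0: _AC3: temperature 68.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 68.0 >= setpoint 55.0
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ1_._TMP] (Node 0xc321b9c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
""".strip()

# Patterns are assumptions derived from the sample lines only.
TRIP_RE = re.compile(
    r"^(acpi_tz\d+): (_AC\d+): temperature ([\d.]+) >= setpoint ([\d.]+)"
)
SOURCE_RE = re.compile(r"^ACPI (?:Error|Exception) \(([a-z]+-\d+)\)")
METHOD_RE = re.compile(r"Method parse/execution failed \[([^\]]+)\]")


def summarize(text):
    """Return (trips, sources, methods) for a block of console output.

    trips   -- list of (zone, trip point, temperature, setpoint) tuples
    sources -- Counter of ACPI error source tags such as 'utmutex-0376'
    methods -- Counter of AML method paths that failed to execute
    """
    trips = []
    sources = Counter()
    methods = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        trip = TRIP_RE.match(line)
        if trip:
            zone, point, temp, setpoint = trip.groups()
            trips.append((zone, point, float(temp), float(setpoint)))
            continue
        source = SOURCE_RE.match(line)
        if source:
            sources[source.group(1)] += 1
            method = METHOD_RE.search(line)
            if method:
                methods[method.group(1)] += 1
    return trips, sources, methods


if __name__ == "__main__":
    trips, sources, methods = summarize(SAMPLE)
    for tag, count in sources.most_common():
        print(tag, count)
    for path, count in methods.most_common():
        print(path, count)
```

Feeding it a full console capture instead of `SAMPLE` should show at a glance whether the mutex acquire failures (`utmutex-0376`) dominate and which `_TMP` methods are timing out.
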
The ASL dump AND a sysctl dump can be found at:
http://www.far-far-away.com/~yousif/freebsd/

Please let me know if more information is needed.

--Yousif