Date:      Sat, 5 Jan 2008 13:41:30 -0500
From:      "Yousif Hassan" <yousif@alumni.jmu.edu>
To:        "\"Frederic Chardon\"" <chardon.frederic@gmail.com>, <freebsd-acpi@freebsd.org>
Subject:   Re: solved ?] i386/79080: acpi thermal changes freezes HP nx6110
Message-ID:  <09EE88FF8B644C90AAE0158ACB4AB595@kamino>

Hi Frederic, Nate, list members:

I recently tried 7.0-RC1 on an nx6110.  The thermal freeze
problems are definitely still there, and appear worse.  I
tried all of the workarounds below and none of them helped.
I suspect this issue is no longer interrupt-storm related,
but rather a mutex race condition of some sort.
Please see below.

> Hello,

> I found a workaround to avoid freeze while change _ACx state on
> nx6110. In kernel, use
> options         SCHED_ULE
> device          apic
> options         AUTO_EOI_1
> options         AUTO_EOI_2

I tried this.  With the exact options above, my root boot device
became unfindable, and no amount of tweaking at the loader prompt
would get it to boot.  When I removed AUTO_EOI_2 and tried again,
the root filesystem mounted but the freeze problems remained.

I also tried the out-of-the-box GENERIC kernel, of course;
freeze problems occur.

> ULE and apic allow the freeze to last only a few second (without it,
> I never waited more than 10 minutes but I supposed it can be long...).
> AUTO_EOI_1 and AUTO_EOI_2 have no impact without ULE and apic.
> Separately they don't have noticeable effect.

In my case, the mutex problem causes the freeze to last forever,
regardless of the scheduler used.

(From a previous email by Frederic):
> Pavel Rydvan stated in the pr that if the temperature doesn't change
> there is no problem. In fact, it is not completely true: problem
> arises when ACx _increase_. When it decreases if there is a freeze it
> is unnoticable.

I agree with this observation.  I only get the freezes if the temperature
INCREASES.

> If I manually set hw.acpi.thermal.tz0.active then there is no more
> problem (apart from the thermal function of ACPI becomes useless).

This I tried and it didn't work for me.  The "active" number remained
at 1 regardless of the arguments I passed it - I tried -1, 0, 1, 2, 3, 4,
5, and 6.  I don't know how you get this number to change but sysctl
kept it at 1.
(ex: #root# sysctl hw.acpi.thermal.tz0.active=-1
     hw.acpi.thermal.tz0.active: 1 -> 1 )
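
For anyone repeating this test across several values, here is a rough
sketch of how the sysctl output could be checked by script.  It assumes
the "name: old -> new" output format shown in my example above; the
function name sysctl_took_effect is made up for illustration.

```python
import re

# Matches the "name: old -> new" line printed after a sysctl set,
# as in the example above. The format is assumed from that example only.
_SET_LINE = re.compile(
    r"^\s*(?P<name>\S+):\s*(?P<old>-?\d+)\s*->\s*(?P<new>-?\d+)\s*$"
)

def sysctl_took_effect(line, requested):
    """Return True if the value reported after '->' equals the requested value."""
    m = _SET_LINE.match(line)
    if m is None:
        raise ValueError("unrecognized sysctl output: %r" % line)
    return int(m.group("new")) == requested

if __name__ == "__main__":
    # The line from my attempt to set the value to -1.
    line = "hw.acpi.thermal.tz0.active: 1 -> 1"
    print(sysctl_took_effect(line, -1))
```

On the line from my attempt, the check reports that the requested value
was not applied.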

> Pavel Rydvan said that it is due to IRQ storm, I can't dig deeper this
> because I don't know how to do.

It seems mutex-related to me.  I placed as much of the debug info as I
could into the PR.  I'll also include it below.  Thanks to anyone for
reading this.

--BEGIN PR 79080 INFO--

The problem is still present in the most recent 7.0 RC code as well.
It appears to involve a mutex lock/unlock problem when the thermal
zone change occurs; it no longer appears to be an interrupt storm.

It is assuredly ACPI-related, because disabling ACPI makes the freezes
go away.  However, this laptop does not function well without ACPI, so
that is not a good workaround: USB devices and some other hardware do
not work without it.

I tried several suggested workarounds, none of which resolved the
issue.  These included building the kernel with apic, disabling apic,
manually changing the hw.acpi.thermal.tz0.active number (my nx6110
seems to want to keep it at 1 no matter what), and using the ULE
scheduler rather than 4BSD.  Again, none of the above workarounds,
in any combination, solved the issue.

INFORMATION
-----------
With debugging turned on, the following appears right before the freeze,
as soon as the temperature rises enough to trigger a change in the zone:

acpi_tz0: _AC3: temperature 68.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 68.0 >= setpoint 55.0
acpi_tz0: _AC3: temperature 67.0 >= setpoint 45.0
acpi_tz0: _AC2: temperature 67.0 >= setpoint 55.0
...etc...
and then:
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]
ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ1_._TMP] (Node 0xc321b9c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.C242] (Node 0xc321c220), AE_TIME
ACPI Error (psparse-0626): Method parse/execution failed [\_TZ_.TZ2_._TMP] (Node 0xc321b8c0), AE_TIME
ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]
ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]

(the errors continue to repeat ad infinitum, and each TZ reports problems)

As a result, you will eventually see:

acpi_tz0: error fetching current temperature -- AE_TIME
acpi_tz1: error fetching current temperature -- AE_TIME
(..etc...)
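
Since the errors repeat indefinitely, a tally by the tag in parentheses
(which looks like a source module name and line number) can make a long
capture easier to read.  Below is a minimal sketch; the function name
tally_acpi_errors and the idea of feeding it saved console output are
my own assumptions, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches the "(module-line)" tag at the start of lines such as
# "ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release".
_TAG = re.compile(r"^ACPI (?:Error|Exception) \((?P<tag>[a-z]+-\d+)\)")

def tally_acpi_errors(lines):
    """Count ACPI error/exception lines per tag; other lines are skipped."""
    counts = Counter()
    for line in lines:
        m = _TAG.match(line)
        if m:
            counts[m.group("tag")] += 1
    return counts

# A few lines copied from the output above.
SAMPLE = [
    "ACPI Exception (utmutex-0376): AE_TIME, Thread 28 could not acquire Mutex [0] [20070320]",
    "ACPI Error (exutils-0180): Could not acquire AML Interpreter mutex [20070320]",
    "ACPI Error (utmutex-0421): Mutex [0] is not acquired, cannot release [20070320]",
    "ACPI Error (exutils-0250): Could not release AML Interpreter mutex [20070320]",
    "acpi_tz0: error fetching current temperature -- AE_TIME",
]

if __name__ == "__main__":
    for tag, n in sorted(tally_acpi_errors(SAMPLE).items()):
        print(tag, n)
```

Lines that do not start with an ACPI Error or ACPI Exception tag, such
as the acpi_tz0 messages, are simply ignored by the tally.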

The interesting thing is that THIS PROBLEM DOES NOT APPEAR in FreeBSD
6.2-RELEASE nor in any of the 6.3-RC variants.  It's unique to FreeBSD
7, and it involves some of the new ACPI mutex code.

This is definitely a regression for this particular laptop, since it
worked well in 6.x, so it may be worthwhile to investigate this bug.
It seems general enough that it could affect other laptops' ASL as well.

The ASL dump and a sysctl dump can be found at:
http://www.far-far-away.com/~yousif/freebsd/

Please let me know if more information is needed.

--Yousif



