Date:      Fri, 25 May 2012 13:11:21 -0400
From:      Jung-uk Kim <jkim@FreeBSD.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-stable@freebsd.org, Yamagi Burmeister <lists@yamagi.org>, seanbru@yahoo-inc.com
Subject:   Re: [stable 9] broken hwpstate calls
Message-ID:  <4FBFBD39.7000105@FreeBSD.org>
In-Reply-To: <4FBFA9A9.7020806@FreeBSD.org>
References:  <1337319129.2915.4.camel@powernoodle-l7> <4FB6765A.2050307@FreeBSD.org> <1337710214.2916.8.camel@powernoodle-l7.corp.yahoo.com> <20120525163653.b61a08e2.lists@yamagi.org> <4FBFA9A9.7020806@FreeBSD.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-05-25 11:47:53 -0400, Andriy Gapon wrote:

> on 25/05/2012 17:36 Yamagi Burmeister said the following:
>> Hello, a user at BSDForen.de had the same problem and I helped
>> him to track it down. While I was unable to find a solution I
>> found the cause of the problems. The problem is (all files and
>> line numbers relative to FreeBSD 9.0-RELEASE):
>> 
>> 1. When a new p-state is requested (by powerd or by changing the
>>    sysctl), the function cf_set_method() in kern/kern_cpu.c is
>>    invoked.
>> 2. In line 335 the driver-dependent function is called. For newer
>>    AMD CPUs it's hwpstate_set() in x86/cpufreq/hwpstate.c.
>> 3. In x86/cpufreq/hwpstate.c line 227 hwpstate_goto_pstate() is
>>    called, which does the actual magic.
>> 4. In line 199 "if (msr != id)" triggers and returns ENXIO. The
>>    error is sent back to cf_set_method(), which prints
>>    "hwpstate0: set freq failed". powerd translates the error to
>>    "Device not configured".
>> 
>> After some further investigation and looking at the Linux driver
>> [0] I changed the loop at x86/cpufreq/hwpstate.c 188ff from 100
>> iterations to 250. It lessened the problem but didn't solve it.
>> The next step was to rewrite the logic between lines 183 and 203
>> and add a lot of debug printfs. The patch (not style(9)
>> compliant) is attached. With the new logic the new p-state is set
>> again every 100 usec until it's accepted. After 100 tries ENXIO
>> is returned.
>> 
>> This lessened the problem even more and showed that:
>> - On an old Phenom II X4 940 (K10 / Deneb) the new p-state is
>>   always accepted at the first try. In a test run of about 3 hours
>>   there was not a single failure.
>> - On the Bulldozer CPU the new p-state is accepted at the first
>>   try about 9 times in 10. Most other times it is accepted after
>>   1 to 10 tries. And there is a ~0.25% chance that the new
>>   p-state is never accepted, leading to "hwpstate0: set freq
>>   failed". At the next call to cf_set_method() (about 500ms to 1s
>>   later) the new p-state is most likely set successfully.
>>
>> This can be seen in the log (full log attached):
>> 
>> # First call, failed after 102 iterations
>> hwpstate0: MSR: 0 ID: 1
>> hwpstate0: Setting failed!
>> hwpstate0: Iterations: 102
>>
>> # Second call, successful
>> hwpstate0: setting P1-state on cpu1
>> hwpstate0: MSR: 1 ID: 1
>> hwpstate0: Iterations: 1
>> 
>> So the big question is: Why is the new p-state sometimes not
>> accepted? And why does this only happen on Bulldozer CPUs and not
>> on the old K10 (Barcelona, Deneb), etc.? Reading the "BIOS and
>> kernel developer guide" for Bulldozer didn't show anything, but I
>> may have overlooked it. One solution may be to change hwpstate
>> not to set p-states but "Frequency IDs" (FID) and "Voltage IDs"
>> (VID) like the Linux driver does.
> 
> I think that you misread their code a little bit.  The vid/fid
> transition is used for K8 processors, for newer ones they do the
> same P-state transitioning. The secret of their success seems to be
> that they just write the MSR without any post-write checks.
> 
> As to your questions about hardware behavior - yes, they are quite
> interesting, but perhaps irrelevant.  The BKDG never specifies the
> OS P-state transition command sequence and timings, it just says
> "write the MSR" (exactly what Linux does).  There could be
> different reasons why a core could be in a different P-state
> (mostly I suspect interactions between cores in a Bulldozer compute
> unit). When the BKDG does specify the P-state transitioning sequence
> (for BIOS) it suggests a different one from what we do - first write
> the MSR on all processors, only then check the result (and the way
> of checking the result is also a bit different - using MSR xxx71).

I just looked through the BKDG and I think you should definitely check
MSRC001_0071[18:16].  MSRC001_0063[2:0] is "SharedC" but
MSRC001_0062[2:0] and MSRC001_0071[18:16] are "Not-same-for-all".  I
think this means writing a P-state to MSRC001_0062[2:0] will be
reflected in MSRC001_0070[18:16] first, then MSRC001_0071[18:16] gets
updated when the P-state transition is complete.  MSRC001_0063[2:0]
will only change when all cores in a compute unit are in sync, which
may be too late.
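
To make the bit ranges above concrete, here is a minimal userland
sketch of pulling those fields out of a raw 64-bit MSR value.  The
macro and function names are made up for illustration and are not the
ones used in hwpstate.c; only the MSR numbers and bit ranges come
from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the numbers are the MSRs discussed above. */
#define MSR_PSTATE_CMD		0xC0010062u	/* [2:0]: requested P-state */
#define MSR_PSTATE_STATUS	0xC0010063u	/* [2:0]: "SharedC" */
#define MSR_COFVID_STATUS	0xC0010071u	/* [18:16]: per-core */

/* Return bits [hi:lo] (inclusive) of a 64-bit MSR value. */
static inline uint64_t
msr_bits(uint64_t val, unsigned hi, unsigned lo)
{
	unsigned width;
	uint64_t mask;

	assert(hi >= lo && hi < 64);
	width = hi - lo + 1;
	mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return ((val >> lo) & mask);
}

/* P-state field of a raw MSRC001_0071 value, bits [18:16]. */
static inline unsigned
cofvid_status_pstate(uint64_t msr71)
{
	return ((unsigned)msr_bits(msr71, 18, 16));
}

/* P-state field of a raw MSRC001_0063 value, bits [2:0]. */
static inline unsigned
pstate_status_pstate(uint64_t msr63)
{
	return ((unsigned)msr_bits(msr63, 2, 0));
}
```

In the kernel the raw value would come from rdmsr() on the target
core; here it is just a function argument so the helpers can be
exercised without hardware.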

> But as I've said, these details are probably not really useful in
> practice.  We should just quit worrying and double-checking the
> hardware :-)

I think we should check.
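
Roughly what I have in mind, as a userland sketch and not a patch
against hwpstate.c: write the requested P-state to MSRC001_0062 once,
then poll MSRC001_0071[18:16] until it matches or we give up.  The
100 usec delay and the ENXIO return mirror what was described earlier
in the thread; struct msr_ops, struct fake_cpu and all function names
are assumptions made up so the loop can run against a fake CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indirection so the loop can be tested without MSRs. */
struct msr_ops {
	void	 (*write)(void *ctx, uint32_t msr, uint64_t val);
	uint64_t (*read)(void *ctx, uint32_t msr);
	void	 (*delay_us)(void *ctx, unsigned us);
	void	 *ctx;
};

/* Request P-state 'id', then poll MSRC001_0071[18:16] for it. */
static int
pstate_set_and_wait(const struct msr_ops *ops, unsigned id,
    unsigned max_tries)
{
	unsigned i;

	assert(ops != NULL && id <= 7 && max_tries > 0);
	ops->write(ops->ctx, 0xC0010062u, id);
	for (i = 0; i < max_tries; i++) {
		uint64_t st = ops->read(ops->ctx, 0xC0010071u);

		if (((st >> 16) & 0x7) == id)
			return (0);
		ops->delay_us(ops->ctx, 100);
	}
	return (ENXIO);
}

/* Fake core: reports the new P-state only after some status reads. */
struct fake_cpu {
	unsigned requested;
	unsigned current;
	unsigned reads_until_done;
};

static void
fake_write(void *ctx, uint32_t msr, uint64_t val)
{
	struct fake_cpu *f = ctx;

	if (msr == 0xC0010062u)
		f->requested = (unsigned)(val & 0x7);
}

static uint64_t
fake_read(void *ctx, uint32_t msr)
{
	struct fake_cpu *f = ctx;

	if (msr != 0xC0010071u)
		return (0);
	if (f->reads_until_done > 0)
		f->reads_until_done--;
	else
		f->current = f->requested;
	return ((uint64_t)f->current << 16);
}

static void
fake_delay(void *ctx, unsigned us)
{
	(void)ctx;
	(void)us;	/* no real waiting in the simulation */
}
```

With a fake_cpu whose reads_until_done is 3, the call succeeds on the
fourth status read; with a max_tries smaller than that it returns
ENXIO, which is the case that currently shows up as "set freq failed".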

Jung-uk Kim

>> [0] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=drivers/cpufreq/powernow-k8.c;h=c0e816468e300f242735f4825d09b9d291a9b522;hb=HEAD
>> [1] http://support.amd.com/us/Processor_TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk+/vTkACgkQmlay1b9qnVPY9QCgqcEvUHKKQ3U0Rec5Kzdlrw3L
kSkAnj6ofOf8PVkEHlxNrgGZAHJ2so1p
=GQw9
-----END PGP SIGNATURE-----


