Date:      Sat, 16 Mar 2013 12:56:03 -0400
From:      Jim Ohlstein <jim@ohlste.in>
To:        freebsd-stable@freebsd.org
Cc:        Jeremy Chadwick <jdc@koitsu.org>, Rui Paulo <rpaulo@FreeBSD.org>, Norikatsu SHIGEMURA <nork@FreeBSD.org>, Jung-uk KIM <jkim@FreeBSD.org>, Zoran Kolic <zkolic@sbb.rs>
Subject:   Re: amdtemp does not find my CPU.
Message-ID:  <5144A423.2060007@ohlste.in>
In-Reply-To: <20130316062013.GA35674@icarus.home.lan>
References:  <20130315161508.GA963@mycenae.sbb.rs> <51437383.2080705@ohlste.in> <20130316062013.GA35674@icarus.home.lan>

On 3/16/13 2:20 AM, Jeremy Chadwick wrote:
> On Fri, Mar 15, 2013 at 03:16:19PM -0400, Jim Ohlstein wrote:
>> On 3/15/13 12:15 PM, Zoran Kolic wrote:
>>> After I installed 9.1 amd64 on node with amd 8120,
>>> I was not able to read temperatures out of the box.
>>> I fetched source for head module and compiled. And
>>> loaded module. Still nothing. I assume my cpu is
>>> a bit different.
>>> Best regards
>>
>> The module from head "works" for me with an 8120 on 9.1 stable (r247893)
>> though the results are inconsistent. I am not certain of how useful they
>> are.
>>
>> # sysctl hw.model
>> hw.model: AMD FX(tm)-8120 Eight-Core Processor
>>
>> # kldstat | grep amd
>>  5    1 0xffffffff8183e000 1043     amdtemp.ko
>>
>> # sysctl -a | grep dev.amdtemp
>> dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
>> dev.amdtemp.0.%driver: amdtemp
>> dev.amdtemp.0.%parent: hostb4
>> dev.amdtemp.0.sensor_offset: 0
>> dev.amdtemp.0.core0.sensor0: 47.7C
>>
>> Here are results taken at 0.1 second intervals using a shell script:
>>
>> dev.amdtemp.0.core0.sensor0: 42.1C
>> dev.amdtemp.0.core0.sensor0: 42.2C
>> dev.amdtemp.0.core0.sensor0: 42.0C
>> dev.amdtemp.0.core0.sensor0: 42.1C
>> dev.amdtemp.0.core0.sensor0: 41.8C
>> dev.amdtemp.0.core0.sensor0: 41.7C
>> dev.amdtemp.0.core0.sensor0: 51.1C
>> dev.amdtemp.0.core0.sensor0: 51.0C
>> dev.amdtemp.0.core0.sensor0: 50.7C
>> dev.amdtemp.0.core0.sensor0: 50.5C
>> dev.amdtemp.0.core0.sensor0: 50.1C
>> dev.amdtemp.0.core0.sensor0: 49.8C
>> dev.amdtemp.0.core0.sensor0: 49.5C
>> dev.amdtemp.0.core0.sensor0: 49.2C
>> dev.amdtemp.0.core0.sensor0: 49.2C
>>
>>
>> and again:
>>
>> dev.amdtemp.0.core0.sensor0: 41.5C
>> dev.amdtemp.0.core0.sensor0: 41.2C
>> dev.amdtemp.0.core0.sensor0: 40.8C
>> dev.amdtemp.0.core0.sensor0: 40.8C
>> dev.amdtemp.0.core0.sensor0: 41.0C
>> dev.amdtemp.0.core0.sensor0: 41.3C
>> dev.amdtemp.0.core0.sensor0: 41.6C
>> dev.amdtemp.0.core0.sensor0: 41.3C
>> dev.amdtemp.0.core0.sensor0: 54.0C
>> dev.amdtemp.0.core0.sensor0: 53.7C
>> dev.amdtemp.0.core0.sensor0: 53.3C
>> dev.amdtemp.0.core0.sensor0: 53.1C
>> dev.amdtemp.0.core0.sensor0: 52.7C
>> dev.amdtemp.0.core0.sensor0: 52.3C
>> dev.amdtemp.0.core0.sensor0: 52.1C
>> dev.amdtemp.0.core0.sensor0: 51.7C
>> dev.amdtemp.0.core0.sensor0: 51.5C
>>
>> You can see during each series there are sudden increases of over 9C and
>> almost 13C respectively.
>>
>> The same effect is seen if I track any of the individual cores with
>> "dev.cpu.[0-7].temperature". Here's an example with a 9C jump in 0.1 second.
>>
>> dev.cpu.3.temperature: 41.5C
>> dev.cpu.3.temperature: 41.5C
>> dev.cpu.3.temperature: 41.7C
>> dev.cpu.3.temperature: 41.7C
>> dev.cpu.3.temperature: 41.3C
>> dev.cpu.3.temperature: 41.0C
>> dev.cpu.3.temperature: 40.7C
>> dev.cpu.3.temperature: 49.8C
>> dev.cpu.3.temperature: 49.5C
>> dev.cpu.3.temperature: 49.2C
>> dev.cpu.3.temperature: 48.8C
>> dev.cpu.3.temperature: 48.6C
>> dev.cpu.3.temperature: 48.2C
>> dev.cpu.3.temperature: 48.0C
>>
>> I don't have hands on access to this box as it's in a datacenter 1000
>> miles from me, but the techs there had a look and all "seems to be OK".
> 
> 1. While it's certainly possible the DTS reading routines and/or the
> calculation formulas may be wrong in amdtemp(4), possibly for your model
> of CPU, it is also certainly possible that what you're seeing is normal
> and fully justified.  This is especially the case for the
> dev.cpu.X.temperature nodes on the K8 family.
> 
> Respectfully, not combatively nor dismissively: you've not provided a
> comparison base to prove there's an issue.  You would need to provide
> data from Linux (I forget what daemon/tool they have to get this) or
> Windows (Core Temp).

Respectfully, not combatively nor dismissively: I hadn't attempted to
"prove" anything. I said: "I am not certain of how useful they [the
readings] are." I had merely provided some observational data as an
aside to the fact that yes, indeed, the module provides readings for me
on the 8120. This was in direct response to Zoran's issue with this
module and that processor model.

This started, for me, when I looked at a graph of average core
temperatures taken at 30-second intervals on two different machines
using Zabbix. The fluctuations were visibly (I know that's not
scientific "proof") wilder on this server than on another using
the amdtemp module from 9 stable.

I don't have access to another server with this model CPU on any other
OS, or even on this OS, so I cannot provide the data to "prove" this is
an issue according to your criteria. However, I will provide comparative
data from the other machine with the module from stable and with the
module from head.


Full data taken now:

# sysctl hw.model
hw.model: AMD FX(tm)-8120 Eight-Core Processor

Using the module from head:

http://pastebin.com/wqQ0FLq3

Note the big change between lines 34 and 35.
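For anyone who wants to locate such a jump mechanically rather than by eye, here is a minimal sketch. It assumes a run was saved with one "oid: 41.5C" line per reading, as repeat.sh prints; the function name max_jump is hypothetical and not part of amdtemp(4) or the script below.

```sh
#!/bin/sh
# max_jump: hypothetical helper, not part of amdtemp(4) or repeat.sh.
# Reads "oid: 41.5C" lines on stdin and reports the largest change
# between two consecutive readings. Lines whose last field is not a
# temperature (e.g. the date stamp printed by repeat.sh) are skipped,
# but NR still counts them, so reported line numbers match the file.
max_jump() {
	awk '
	$NF ~ /^-?[0-9]+(\.[0-9]+)?C$/ {
		t = substr($NF, 1, length($NF) - 1) + 0	# strip the trailing "C"
		if (have_prev) {
			d = t - prev
			ad = (d < 0) ? -d : d
			if (!found || ad > best) {
				best = ad; step = d
				from_nr = prev_nr; to_nr = NR
				found = 1
			}
		}
		prev = t; prev_nr = NR; have_prev = 1
	}
	END {
		if (!found) {
			print "fewer than two readings"
			exit 1
		}
		printf "largest step: %.1fC between lines %d and %d\n", step, from_nr, to_nr
	}'
}
```

Fed the first series quoted earlier in this thread, it should report the 41.7C to 51.1C step (9.4C); a saved file (name hypothetical) would be checked with "max_jump < run.txt".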


# sysctl hw.model
hw.model: AMD Phenom(tm) II X6 1055T Processor

Using the module from stable:

http://pastebin.com/2jzEWZxf


Using the module from head:

http://pastebin.com/RXsbvM20


These data, which I reproduced multiple times, would *seem* to suggest
that the variation seen is perhaps related to the hardware in the first
box as the other box produces consistent results with both versions of
the kernel module. This could be, among other things, the processor
itself (inherent to this processor line, or unique to my copy) or, I
suppose, some other hardware issue such as an improperly installed heat
sink. I am interested to know whether anyone else has seen this kind of
result with the 8120.

However, another interesting point came up, and I do believe this to be
an "issue". As can be seen, on the second box the data are consistent
over a 10 second period in each run, but the readings from the two
modules differ by about 10-12C. I did many runs, switching back and
forth between the two modules with only a short delay, and this was a
consistent finding. The module from stable always produced substantially
higher readings than the module from head. The two posted runs from that
box were taken about half a minute apart.
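To put a single number on that gap, a sketch along these lines would average each saved run and print the difference. It assumes each run was redirected to its own file (the names stable.txt and head.txt, and the helpers mean_temp and compare_runs, are hypothetical), and it compares the rounded means, so treat the result as approximate.

```sh
#!/bin/sh
# mean_temp: hypothetical helper. Averages every "oid: 41.5C" reading on
# stdin, skipping lines such as the date stamp, and prints the mean to
# one decimal place. Exits non-zero if there were no readings.
mean_temp() {
	awk '
	$NF ~ /^-?[0-9]+(\.[0-9]+)?C$/ {
		sum += substr($NF, 1, length($NF) - 1)	# value without the "C"
		n++
	}
	END {
		if (n == 0)
			exit 1
		printf "%.1f\n", sum / n
	}'
}

# compare_runs: hypothetical helper. Given two saved runs, prints each
# run's mean and how much higher the first reads than the second.
# Note: a and b are ordinary (global) shell variables in POSIX sh.
compare_runs() {
	a=$(mean_temp < "$1") || return 1
	b=$(mean_temp < "$2") || return 1
	printf '%s: mean %sC\n%s: mean %sC\n' "$1" "$a" "$2" "$b"
	awk -v a="$a" -v b="$b" 'BEGIN { printf "difference: %.1fC\n", a - b }'
}
```

Usage would be something like "compare_runs stable.txt head.txt" after saving one run with each module loaded.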


> 
> 2. I have a gut feeling I know what may be causing what you're seeing,
> but I need you to provide verbatim the shell script you're using.

I said it was simple. Here it is, only modified to add a date stamp:

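#!/bin/sh
# repeat.sh: run a command a fixed number of times with a pause between
# runs.  Usage: repeat.sh "command" interval count
#   $1  command to run (deliberately unquoted so it splits into words)
#   $2  seconds to sleep between runs (0.1 works with FreeBSD's sleep)
#   $3  number of runs

echo `date`

for i in `seq 1 $3`
do
	$1 ; sleep $2 ;
done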

The exact command used for each run reported here:

# ./repeat.sh "sysctl dev.cpu.3.temperature" 0.1 100

The results for "dev.amdtemp.0.core0.sensor0" were equivalent to the
data provided, although the OID was renamed from
"dev.amdtemp.0.sensor0.core0" to "dev.amdtemp.0.core0.sensor0" between
the two versions of the module.
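Because of that rename, anything that polls a fixed OID (a script like the one above, or a monitoring item) only works with one of the two builds. A minimal sketch of a reader that tries both names follows; the function name read_core0 is hypothetical, and it relies on sysctl -n printing only the value and on FreeBSD's sysctl exiting non-zero for an unknown OID.

```sh
#!/bin/sh
# read_core0: hypothetical helper. Prints the core 0 / sensor 0 value
# under whichever OID name the loaded amdtemp module exposes: first the
# head name (dev.amdtemp.0.core0.sensor0), then the older stable name
# (dev.amdtemp.0.sensor0.core0). sysctl -n prints only the value, and
# an unknown OID makes sysctl exit non-zero, which is what lets the
# fallback kick in. Returns non-zero if neither name exists.
read_core0() {
	sysctl -n dev.amdtemp.0.core0.sensor0 2>/dev/null ||
	    sysctl -n dev.amdtemp.0.sensor0.core0 2>/dev/null
}
```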


To improve readability, I did not provide all lines of data in the
previous email, but all data from fresh runs are included in the linked
results.

> 
> 3. Why has no one CC'd the driver maintainers nor individuals who have
> committed/touched this driver?  Those people are:
> 
> Jung-uk KIM <jkim@FreeBSD.org>
> Rui Paulo <rpaulo@FreeBSD.org>
> Norikatsu SHIGEMURA <nork@FreeBSD.org>
> 

I don't know. But since you evidently didn't either, I did.


-- 
Jim Ohlstein


