Message-ID: <5144A423.2060007@ohlste.in>
Date: Sat, 16 Mar 2013 12:56:03 -0400
From: Jim Ohlstein
To: freebsd-stable@freebsd.org
Cc: Jeremy Chadwick, Rui Paulo, Norikatsu SHIGEMURA, Jung-uk KIM, Zoran Kolic
Subject: Re: amdtemp does not find my CPU.
References: <20130315161508.GA963@mycenae.sbb.rs> <51437383.2080705@ohlste.in> <20130316062013.GA35674@icarus.home.lan>
In-Reply-To: <20130316062013.GA35674@icarus.home.lan>

On 3/16/13 2:20 AM, Jeremy Chadwick wrote:
> On Fri, Mar 15, 2013 at 03:16:19PM -0400, Jim Ohlstein wrote:
>> On 3/15/13 12:15 PM, Zoran Kolic wrote:
>>> After I installed 9.1 amd64 on node with amd 8120,
>>> I was not able to read temperatures out of the box.
>>> I fetched source for head module and compiled. And
>>> loaded module. Still nothing. I assume my cpu is
>>> a bit different.
>>> Best regards
>>
>> The module from head "works" for me with an 8120 on 9.1 stable (r247893)
>> though the results are inconsistent. I am not certain of how useful they
>> are.
>>
>> # sysctl hw.model
>> hw.model: AMD FX(tm)-8120 Eight-Core Processor
>>
>> # kldstat | grep amd
>> 5 1 0xffffffff8183e000 1043 amdtemp.ko
>>
>> # sysctl -a | grep dev.amdtemp
>> dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
>> dev.amdtemp.0.%driver: amdtemp
>> dev.amdtemp.0.%parent: hostb4
>> dev.amdtemp.0.sensor_offset: 0
>> dev.amdtemp.0.core0.sensor0: 47.7C
>>
>> Here are results taken at 0.1 second intervals using a shell script:
>>
>> dev.amdtemp.0.core0.sensor0: 42.1C
>> dev.amdtemp.0.core0.sensor0: 42.2C
>> dev.amdtemp.0.core0.sensor0: 42.0C
>> dev.amdtemp.0.core0.sensor0: 42.1C
>> dev.amdtemp.0.core0.sensor0: 41.8C
>> dev.amdtemp.0.core0.sensor0: 41.7C
>> dev.amdtemp.0.core0.sensor0: 51.1C
>> dev.amdtemp.0.core0.sensor0: 51.0C
>> dev.amdtemp.0.core0.sensor0: 50.7C
>> dev.amdtemp.0.core0.sensor0: 50.5C
>> dev.amdtemp.0.core0.sensor0: 50.1C
>> dev.amdtemp.0.core0.sensor0: 49.8C
>> dev.amdtemp.0.core0.sensor0: 49.5C
>> dev.amdtemp.0.core0.sensor0: 49.2C
>> dev.amdtemp.0.core0.sensor0: 49.2C
>>
>>
>> and again:
>>
>> dev.amdtemp.0.core0.sensor0: 41.5C
>> dev.amdtemp.0.core0.sensor0: 41.2C
>> dev.amdtemp.0.core0.sensor0: 40.8C
>> dev.amdtemp.0.core0.sensor0: 40.8C
>> dev.amdtemp.0.core0.sensor0: 41.0C
>> dev.amdtemp.0.core0.sensor0: 41.3C
>> dev.amdtemp.0.core0.sensor0: 41.6C
>> dev.amdtemp.0.core0.sensor0: 41.3C
>> dev.amdtemp.0.core0.sensor0: 54.0C
>> dev.amdtemp.0.core0.sensor0: 53.7C
>> dev.amdtemp.0.core0.sensor0: 53.3C
>> dev.amdtemp.0.core0.sensor0: 53.1C
>> dev.amdtemp.0.core0.sensor0: 52.7C
>> dev.amdtemp.0.core0.sensor0: 52.3C
>> dev.amdtemp.0.core0.sensor0: 52.1C
>> dev.amdtemp.0.core0.sensor0: 51.7C
>> dev.amdtemp.0.core0.sensor0: 51.5C
>>
>> You can see during each series there are sudden increases of over 9C and
>> almost 13C respectively.
>>
>> The same effect is seen if I track any of the individual cores with
>> "dev.cpu.[0-7].temperature". Here's an example with a 9C jump in 0.1 second.
>>
>> dev.cpu.3.temperature: 41.5C
>> dev.cpu.3.temperature: 41.5C
>> dev.cpu.3.temperature: 41.7C
>> dev.cpu.3.temperature: 41.7C
>> dev.cpu.3.temperature: 41.3C
>> dev.cpu.3.temperature: 41.0C
>> dev.cpu.3.temperature: 40.7C
>> dev.cpu.3.temperature: 49.8C
>> dev.cpu.3.temperature: 49.5C
>> dev.cpu.3.temperature: 49.2C
>> dev.cpu.3.temperature: 48.8C
>> dev.cpu.3.temperature: 48.6C
>> dev.cpu.3.temperature: 48.2C
>> dev.cpu.3.temperature: 48.0C
>>
>> I don't have hands on access to this box as it's in a datacenter 1000
>> miles from me, but the techs there had a look and all "seems to be OK".
>
> 1. While it's certainly possible the DTS reading routines and/or the
> calculation formulas may be wrong in amdtemp(4), possibly for your model
> of CPU, it is also certainly possible that what you're seeing is normal
> and fully justified. This is especially the case for the
> dev.cpu.X.temperature nodes on the K8 family.
>
> Respectfully, not combatively nor dismissively: you've not provided a
> comparison base to prove there's an issue. You would need to provide
> data from Linux (I forget what daemon/tool they have to get this) or
> Windows (Core Temp).

Respectfully, not combatively nor dismissively: I hadn't attempted to
"prove" anything. I said: "I am not certain of how useful they [the
readings] are." I had merely provided some observational data as an
aside to the fact that yes, indeed, the module provides readings for me
on the 8120. This was in direct response to Zoran's issue with this
module and that processor model.

This started, for me, when I looked at a graph of average core
temperatures taken at 30 second intervals on two different machines
using Zabbix. The fluctuations were visibly (I know that's not
scientific "proof") wilder on this server than on another using the
amdtemp module from 9 stable.

I don't have access to another server with this model CPU on any other
OS, or even on this OS, so I cannot provide the data to "prove" this is
an issue according to your criteria. However, I will provide comparative
data from the other machine with the module from stable and with the
module from head.

Full data taken now:

# sysctl hw.model
hw.model: AMD FX(tm)-8120 Eight-Core Processor

Using the module from head: http://pastebin.com/wqQ0FLq3

Note the big change between lines 34 and 35.

# sysctl hw.model
hw.model: AMD Phenom(tm) II X6 1055T Processor

Using the module from stable: http://pastebin.com/2jzEWZxf

Using the module from head: http://pastebin.com/RXsbvM20

These data, which I reproduced multiple times, would *seem* to suggest
that the variation seen is perhaps related to the hardware in the first
box, as the other box produces consistent results with both versions of
the kernel module. This could be, among other things, the processor
itself (inherent to this processor line, or unique to my copy) or, I
suppose, some other hardware issue such as an improperly installed heat
sink. I am interested to see if anyone else has seen this type of result
with the 8120.

However, another interesting point came up, and I do believe this to be
an "issue". As can be seen, in the second box, the data are consistent
over a 10 second period in each run, but the data from the two modules
differ by about 10-12C. I did many runs back and forth with the two
modules with only a short delay, and this was a consistent finding. The
module from stable always produced substantially higher readings than
the module from head. The posted data from that box are from about half
a minute apart.

>
> 2. I have a gut feeling I know what may be causing what you're seeing,
> but I need you to provide verbatim the shell script you're using.

I said it was simple.
Here it is, only modified to add a date stamp:

#!/bin/sh
echo `date`
for i in `seq 1 $3`
do
$1 ; sleep $2 ;
done

The exact command used for each run reported here:

# ./repeat.sh "sysctl dev.cpu.3.temperature" 0.1 100

The results for "dev.amdtemp.0.core0.sensor0" were equivalent to the
data provided, although the OID was renamed from
"dev.amdtemp.0.sensor0.core0" to "dev.amdtemp.0.core0.sensor0" between
the two versions of the module.

To improve readability, I did not provide all lines of data in the
previous email, but all data from fresh runs are included in the linked
results.

>
> 3. Why has no one CC'd the driver maintainers nor individuals who have
> committed/touched this driver? Those people are:
>
> Jung-uk KIM
> Rui Paulo
> Norikatsu SHIGEMURA
>

I don't know. But since you evidently didn't either, I did.

-- 
Jim Ohlstein
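
For anyone who wants to put a number on the jumps discussed above rather
than eyeball them, the following is a minimal sketch, not part of the
original script. It assumes the samples arrive on stdin in the
"name: 41.5C" form that sysctl prints for these OIDs, and it reports the
largest rise between two consecutive readings. The function name
max_jump is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# max_jump (hypothetical helper): read sysctl-style temperature lines
# such as "dev.cpu.3.temperature: 41.5C" on stdin and print the largest
# increase between two consecutive samples, in degrees C.
max_jump() {
    awk '
        BEGIN { max = 0 }
        # Only consider lines ending in ": <number>C"; this skips the
        # date stamp that repeat.sh prints first.
        /: *-?[0-9.]+C$/ {
            t = $NF              # last field, e.g. "41.5C"
            sub(/C$/, "", t)     # drop the unit suffix
            t += 0               # force numeric interpretation
            if (seen && t - prev > max)
                max = t - prev
            prev = t
            seen = 1
        }
        END { printf "%.1f\n", max }
    '
}
```

With the function defined in the current shell, a run could be piped
through it like this (same command line as above, assuming repeat.sh is
in the current directory):

./repeat.sh "sysctl dev.cpu.3.temperature" 0.1 100 | max_jump

It only measures upward steps between adjacent samples, so it says
nothing about whether the sensor or the driver is at fault; it just
makes runs from different modules or machines easier to compare.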