From owner-freebsd-ppc@FreeBSD.ORG  Thu Jan 17 20:59:30 2013
MIME-Version: 1.0
X-Mailer: Webmail
Message-ID: <7700.1358456352@hexaneinc.com>
To: <freebsd-ppc@freebsd.org>
Content-Type: text/plain; charset="utf-8"
X-Origin: 81.90.254.28
Date: Thu, 17 Jan 2013 21:59:12 +0100
Subject: PowerMac G5 spurious sensor readings
From: Matthew Rezny <mrezny@hexaneinc.com>
Content-Transfer-Encoding: quoted-printable

I have a G5 of the first model (PowerMac7,2) on which I've been using
FreeBSD/ppc64 for over a year. Today, it suddenly rebooted. Not the
first time by any means, but this is the first time I found the
following log message:
Jan 17 17:32:19 powermac kernel: WARNING: Current temperature (MLB MAX6690 AMB:127.8 C) exceeds critical temperature (80.0 C)! Shutting down!

This is the first time I have seen such a message. After reboot, that
sensor shows a temperature near 30C, which seems appropriate. The
reading of 127.8C looks suspiciously like a max value. My only guess is
there was a bad read that resulted in the sensor value going over the
threshold. That raises a question in my mind as to whether there is any
filtering or sanity checking of the data. Could a single bad read cause
the threshold to be exceeded and trigger shutdown immediately, or would
the excessive value have to be returned from that sensor multiple times
for it to be believed and acted upon?
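
To make the question concrete, here is a minimal sketch of the kind of
filtering I have in mind. It is not taken from the FreeBSD thermal code
and I do not know whether the driver does anything similar. The names
(temp_filter, temp_filter_update) and the count of 3 consecutive
readings are hypothetical; only the 80.0 C threshold comes from the log
message above. Temperatures are in tenths of a degree C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CRITICAL_DECI_C      800  /* 80.0 C, from the log message */
#define REQUIRED_CONSECUTIVE 3    /* hypothetical debounce count */

/* Hypothetical per-sensor state: how many samples in a row were too hot. */
struct temp_filter {
	int over_count;
};

/*
 * Feed one reading (tenths of a degree C) into the filter.
 * Returns true only after the critical threshold has been exceeded on
 * REQUIRED_CONSECUTIVE successive samples, so a single bad read is
 * ignored. Any in-range reading resets the counter.
 */
bool
temp_filter_update(struct temp_filter *f, int reading_deci_c)
{
	assert(f != NULL);

	if (reading_deci_c > CRITICAL_DECI_C)
		f->over_count++;
	else
		f->over_count = 0;

	return (f->over_count >= REQUIRED_CONSECUTIVE);
}
```

With this scheme, a sequence such as 30.1, 127.8, 30.0 would never trip
the shutdown, while three over-threshold samples in a row would. The
cost is a delay of a few polling intervals before a real overheat is
acted upon.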

$ uname -a
FreeBSD powermac 9.1-RC1 FreeBSD 9.1-RC1 #0: Thu Aug 16 00:43:39 UTC 2012     root@anacreon.physics.wisc.edu:/usr/obj/usr/src/sys/GENERIC64  powerpc

The build is a bit old, though I wouldn't expect too much change to the
code in question since then. I will update to 9.1-RELEASE or -STABLE in
the next few days, but as this is a problem that has happened once in
over a year, I wouldn't call it resolved just by a quick failure to
reproduce after updating.

I was already planning to do an update after the box has completed its
current task. I noticed a problem with excessive output causing the
console to hang. A couple days ago I found the machine apparently hung
in that the keyboard and mouse were not responsive, but I found it was
still alive on the network and I could ssh in to reboot. The only clues
were no buffer space for dmesg to output anything before reboot, and a
rather full /var/log/messages file which had exhausted the drive.

Under the same workload (and after freeing some drive space), the
problem reoccurred in a matter of hours, but this time with me watching.
While running ddrescue against a drive with some bad sectors, read
errors flood the console in spurts. When some dozens of read errors are
displayed at once, the console scrolls whole pages by in a fraction of a
second, and then goes dead. Messages that should go to console are not
shown on screen but are in the log. Attempts to switch virtual console
or to reboot are not successful, but ssh access continues to work and
the box is clearly still processing other workloads. The only signs of
life from the console are the messages about flushing buffers just
before completion of the reboot commanded via ssh.