From: "Grant Peel"
To: "Derek Ragona"
Date: Fri, 17 Mar 2006 13:58:55 -0500
Organization: The Net Now Internet
Subject: Re: More Server Crash Saga

Hi Derek,

I got this data using ipmitool from the server's BMC just after a crash this afternoon (about 3 minutes after the reboot).

I will be heading to the NOC this afternoon to copy the hard drive to another machine I have been using for about a year and a half.

Anyway, here is the sensor data:

Temp             | 38 degrees C | ok
Temp             | 50 degrees C | ok
Ambient Temp     | 30 degrees C | ok
Planar Temp      | 35 degrees C | ok
Riser Temp       | 34 degrees C | ok
Temp             | 40 degrees C | ok
Temp             | 40 degrees C | ok
CMOS Battery     | 3.15 Volts   | ok
ROMB Battery     | Not Readable | ns
VCORE            | 0x01         | ok
VCORE            | Not Readable | ns
PROC VTT         | 0x01         | ok
1.5V PG          | 0x01         | ok
1.8V PG          | 0x01         | ok
3.3V PG          | 0x01         | ok
5V PG            | 0x01         | ok
5V Riser PG      | 0x01         | ok
Riser PG         | 0x01         | ok
PFault Fail Safe | Not Readable | ns
Presence         | 0x01         | ok
Presence         | 0x02         | ok
Presence         | 0x01         | ok
Presence         | 0x02         | ok
ROMB Presence    | 0x02         | ok
FAN 1A RPM       | 9600 RPM     | ok
FAN 1B RPM       | 6900 RPM     | ok
FAN 2A RPM       | 9900 RPM     | ok
FAN 2B RPM       | 6825 RPM     | ok
FAN 3A RPM       | 9825 RPM     | ok
FAN 3B RPM       | 6825 RPM     | ok
FAN 4A RPM       | 10200 RPM    | ok
FAN 4B RPM       | 6675 RPM     | ok
Status           | 0x80         | ok
Status           | Not Readable | ns
Status           | 0x01         | ok
Status           | Not Readable | ns
VRM              | 0x01         | ok
VRM              | 0x01         | ok
OS Watchdog      | 0x00         | ok
SEL              | Not Readable | ns
Intrusion        | 0x00         | ok
PS Redundancy    | Not Readable | ns
Fan Redundancy   | 0x01         | ok
SCSI Connector A | Not Readable | ns
Drive            | 0xc0         | ok
ECC Corr Err     | 0xc0         | ok
ECC Uncorr Err   | Not Readable | ns
I/O Channel Chk  | 0xc0         | ok
PCI Parity Err   | 0xc0         | ok
PCI System Err   | 0xc0         | ok
SBE Log Disabled | Not Readable | ns
Logging Disabled | Not Readable | ns
Unknown          | Not Readable | ns
PROC Protocol    | Not Readable | ns
PROC Bus PERR    | Not Readable | ns
PROC Init Err    | Not Readable | ns
PROC Machine Chk | Not Readable | ns
Memory Spared    | Not Readable | ns
Memory Mirrored  | 0x01         | ok
Memory RAID      | Not Readable | ns
Memory Added     | 0x01         | ok
Memory Removed   | 0x01         | ok
PCIE Fatal Err   | 0x01         | ok
Chipset Err      | 0x01         | ok
Err Reg Pointer  | 0x01         | ok
root on s1#

----- Original Message -----
From: Derek Ragona
To: Grant Peel; freebsd-questions@freebsd.org
Sent: Thursday, March 16, 2006 5:45 PM
Subject: Re: More Server Crash Saga

Grant,

That is a one unit rack mount server, which makes it prone to have
heat problems, particularly under any load. You might want to check the ambient heat and the internal heat sensors as well.

That server uses an Intel chipset (and probably an Intel motherboard), which should allow "out-of-band" monitoring. You should see what you can use to monitor the system and see what the system is reporting prior to a lockup.

It may be time to just call Dell and have them send a replacement MB or entire unit.

-Derek

At 03:47 PM 3/16/2006, Grant Peel wrote:

Hi all,

Still getting crashes today ...

FreeBSD 6.0
PE 1850

Does the output of vmstat -i over five seconds show a problem? An interrupt storm?

I have been searching, trying to find out what the 'rate' means and what it should be.

interrupt                   total   rate
irq0: clk                 3277223    999
irq5: em1                    8877      2
irq6: ehci0 atapci0            85      0
irq7: mpt0 uhci2            56401     17
irq8: rtc                  419429    127
irq11: em0 uhci0            85684     26
irq13: npx0                     1      0
irq14: ata0                    48      0
Total                     3847748   1173

root on s1# vmstat -i
interrupt                   total   rate
irq0: clk                 3278793    999
irq5: em1                    8883      2
irq6: ehci0 atapci0            85      0
irq7: mpt0 uhci2            56408     17
irq8: rtc                  419630    127
irq11: em0 uhci0            85752     26
irq13: npx0                     1      0
irq14: ata0                    48      0
Total                     3849600   1174

root on s1# vmstat -i
interrupt                   total   rate
irq0: clk                 3280691    999
irq5: em1                    8889      2
irq6: ehci0 atapci0            85      0
irq7: mpt0 uhci2            56408     17
irq8: rtc                  419873    127
irq11: em0 uhci0            85843     26
irq13: npx0                     1      0
irq14: ata0                    48      0
Total                     3851838   1173

root on s1# vmstat -i
interrupt                   total   rate
irq0: clk                 3282850    999
irq5: em1                    8891      2
irq6: ehci0 atapci0            85      0
irq7: mpt0 uhci2            56408     17
irq8: rtc                  420149    127
irq11: em0 uhci0            86153     26
irq13: npx0                     1      0
irq14: ata0                    48      0
Total                     3854585   1174
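
On the 'rate' question: in vmstat -i the rate column is the total count divided by the seconds since boot, so it is a since-boot average and will not move much during a short burst. To see what is happening right now, compare the totals of two snapshots instead. Below is a minimal Python sketch of that idea, using the first and last snapshots quoted above. The time between the two samples was not recorded, so the sketch assumes the clock interrupt (irq0: clk) fires at about 1000 Hz, as its rate of 999 suggests, and uses it as a stopwatch. The names parse_vmstat_i, interval_rates and ASSUMED_HZ are made up for this illustration.

```python
# Sketch: per-second interrupt rates between two `vmstat -i` snapshots.
# Assumption: irq0 (clk) ticks at roughly 1000 Hz, so its delta gives
# the elapsed time between samples. Adjust ASSUMED_HZ if your kernel differs.

CLOCK = "irq0: clk"
ASSUMED_HZ = 1000

FIRST = """\
interrupt total rate
irq0: clk 3277223 999
irq5: em1 8877 2
irq6: ehci0 atapci0 85 0
irq7: mpt0 uhci2 56401 17
irq8: rtc 419429 127
irq11: em0 uhci0 85684 26
irq13: npx0 1 0
irq14: ata0 48 0
Total 3847748 1173
"""

LAST = """\
interrupt total rate
irq0: clk 3282850 999
irq5: em1 8891 2
irq6: ehci0 atapci0 85 0
irq7: mpt0 uhci2 56408 17
irq8: rtc 420149 127
irq11: em0 uhci0 86153 26
irq13: npx0 1 0
irq14: ata0 48 0
Total 3854585 1174
"""


def parse_vmstat_i(text):
    """Return {interrupt name: total count}, skipping header and Total."""
    counts = {}
    for line in text.strip().splitlines():
        # The name may contain spaces, so split off the last two columns.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header line or anything unexpected
        name = parts[0].strip()
        if name == "Total":
            continue
        counts[name] = int(parts[1])
    return counts


def interval_rates(before, after, hz=ASSUMED_HZ):
    """Interrupts per second for each source between two snapshots."""
    elapsed = (after[CLOCK] - before[CLOCK]) / hz
    if elapsed <= 0:
        raise ValueError("snapshots are out of order or identical")
    return {name: (after[name] - before.get(name, 0)) / elapsed
            for name in after}


rates = interval_rates(parse_vmstat_i(FIRST), parse_vmstat_i(LAST))
for name, per_sec in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {per_sec:8.1f}/s")
```

By this estimate the two snapshots are about 5.6 seconds apart, and the busiest device (em0) comes out at roughly 83 interrupts per second. That is higher than its since-boot average of 26, but nowhere near the sustained thousands per second one would expect from an interrupt storm.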