Date:      Fri, 17 Mar 2006 13:58:55 -0500
From:      "Grant Peel" <gpeel@thenetnow.com>
To:        <freebsd-questions@freebsd.org>, "Derek Ragona" <derek@computinginnovations.com>
Subject:   Re: More Server Crash Saga
Message-ID:  <002801c649f4$db4d4540$6501a8c0@GRANT>
References:  <005501c64943$37f4e2b0$6501a8c0@GRANT> <6.0.0.22.2.20060316164205.028bae80@mail.computinginnovations.com>

Hi Derek,

I got this data using ipmitool from the server's BMC just after a crash
this afternoon (about 3 minutes after the reboot).

I will be heading to the NOC this afternoon to copy the hard drive to
another machine I have been using for about a year and a half.

Anyways, here is the sensor data ....

Temp             | 38 degrees C      | ok
Temp             | 50 degrees C      | ok
Ambient Temp     | 30 degrees C      | ok
Planar Temp      | 35 degrees C      | ok
Riser Temp       | 34 degrees C      | ok
Temp             | 40 degrees C      | ok
Temp             | 40 degrees C      | ok
CMOS Battery     | 3.15 Volts        | ok
ROMB Battery     | Not Readable      | ns
VCORE            | 0x01              | ok
VCORE            | Not Readable      | ns
PROC VTT         | 0x01              | ok
1.5V PG          | 0x01              | ok
1.8V PG          | 0x01              | ok
3.3V PG          | 0x01              | ok
5V PG            | 0x01              | ok
5V Riser PG      | 0x01              | ok
Riser PG         | 0x01              | ok
PFault Fail Safe | Not Readable      | ns
Presence         | 0x01              | ok
Presence         | 0x02              | ok
Presence         | 0x01              | ok
Presence         | 0x02              | ok
ROMB Presence    | 0x02              | ok
FAN 1A RPM       | 9600 RPM          | ok
FAN 1B RPM       | 6900 RPM          | ok
FAN 2A RPM       | 9900 RPM          | ok
FAN 2B RPM       | 6825 RPM          | ok
FAN 3A RPM       | 9825 RPM          | ok
FAN 3B RPM       | 6825 RPM          | ok
FAN 4A RPM       | 10200 RPM         | ok
FAN 4B RPM       | 6675 RPM          | ok
Status           | 0x80              | ok
Status           | Not Readable      | ns
Status           | 0x01              | ok
Status           | Not Readable      | ns
VRM              | 0x01              | ok
VRM              | 0x01              | ok
OS Watchdog      | 0x00              | ok
SEL              | Not Readable      | ns
Intrusion        | 0x00              | ok
PS Redundancy    | Not Readable      | ns
Fan Redundancy   | 0x01              | ok
SCSI Connector A | Not Readable      | ns
Drive            | 0xc0              | ok
ECC Corr Err     | 0xc0              | ok
ECC Uncorr Err   | Not Readable      | ns
I/O Channel Chk  | 0xc0              | ok
PCI Parity Err   | 0xc0              | ok
PCI System Err   | 0xc0              | ok
SBE Log Disabled | Not Readable      | ns
Logging Disabled | Not Readable      | ns
Unknown          | Not Readable      | ns
PROC Protocol    | Not Readable      | ns
PROC Bus PERR    | Not Readable      | ns
PROC Init Err    | Not Readable      | ns
PROC Machine Chk | Not Readable      | ns
Memory Spared    | Not Readable      | ns
Memory Mirrored  | 0x01              | ok
Memory RAID      | Not Readable      | ns
Memory Added     | 0x01              | ok
Memory Removed   | 0x01              | ok
PCIE Fatal Err   | 0x01              | ok
Chipset Err      | 0x01              | ok
Err Reg Pointer  | 0x01              | ok
root on s1#
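
For anyone wanting to scan a dump like this quickly, here is a rough sketch that splits the three-column rows (name | reading | status), pulls out the temperatures, and lists the sensors marked "ns". The sample rows are copied from the output above; the function names are made up for illustration, and the parsing assumes every row keeps this exact pipe-separated layout.

```python
# A few rows copied from the sensor dump above (name | reading | status).
SENSOR_DUMP = """\
Temp             | 38 degrees C      | ok
Temp             | 50 degrees C      | ok
Ambient Temp     | 30 degrees C      | ok
Planar Temp      | 35 degrees C      | ok
Riser Temp       | 34 degrees C      | ok
CMOS Battery     | 3.15 Volts        | ok
ROMB Battery     | Not Readable      | ns
FAN 1A RPM       | 9600 RPM          | ok
ECC Uncorr Err   | Not Readable      | ns
"""


def parse_rows(text):
    """Split each 'name | reading | status' line into a 3-tuple."""
    rows = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3:  # skip prompts, blank lines, anything else
            rows.append(tuple(parts))
    return rows


def temperatures(rows):
    """Return (name, degrees C) for every row whose reading is a temperature."""
    return [
        (name, float(reading.split()[0]))
        for name, reading, _status in rows
        if reading.endswith("degrees C")
    ]


def not_ok(rows):
    """Return the names of sensors whose status column is anything but 'ok'."""
    return [name for name, _reading, status in rows if status != "ok"]


rows = parse_rows(SENSOR_DUMP)
hottest = max(temperatures(rows), key=lambda pair: pair[1])
print("hottest sensor:", hottest)
print("status not ok:", not_ok(rows))
```

Several sensors share the name "Temp", so the sketch keeps rows in a list rather than a dictionary keyed by name.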
  ----- Original Message -----
  From: Derek Ragona
  To: Grant Peel ; freebsd-questions@freebsd.org
  Sent: Thursday, March 16, 2006 5:45 PM
  Subject: Re: More Server Crash Saga


  Grant,

  That is a one-unit rack-mount server, which makes it prone to heat
  problems, particularly under any load.  You might want to check the
  ambient heat and the internal heat sensors as well.

  That server uses an Intel chipset (and probably an Intel motherboard),
  which should allow "out-of-band" monitoring.  You should see what you
  can use to monitor the system and see what the system is reporting
  prior to a lockup.
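
One way to capture what the system reports before a lockup is to append timestamped sensor snapshots to a file at a fixed interval, so the last entry before a crash survives. The sketch below is only an illustration: it assumes `ipmitool sdr` works locally on the box, and the function names, log path and interval are hypothetical.

```python
import os
import subprocess
import time
from datetime import datetime, timezone


def read_sensors():
    """Run `ipmitool sdr` locally and return its text output.

    ASSUMPTION: ipmitool is installed and can reach the BMC from this host.
    """
    result = subprocess.run(
        ["ipmitool", "sdr"], capture_output=True, text=True, check=True
    )
    return result.stdout


def log_snapshot(path, reader=read_sensors):
    """Append one timestamped snapshot and force it to disk."""
    stamp = datetime.now(timezone.utc).isoformat()
    body = reader()
    with open(path, "a") as log:
        log.write(f"=== {stamp} ===\n{body}\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the entry survives a sudden crash
    return stamp


def watch(path, interval=60, count=None, reader=read_sensors):
    """Take a snapshot every `interval` seconds; run forever if count is None."""
    taken = 0
    while count is None or taken < count:
        log_snapshot(path, reader)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)


# Example call (hypothetical path and interval):
# watch("/var/log/bmc-sensors.log", interval=30)
```

The `reader` argument is there so the sensor command can be swapped, for example for a remote query against the BMC from another machine, which would keep working while the server itself is hung.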

  It may be time to just call Dell and have them send a replacement MB
  or entire unit.

          -Derek


  At 03:47 PM 3/16/2006, Grant Peel wrote:

    Hi all,

    Still getting crashes today ... FreeBSD 6.0 on a PE 1850

    Does the output of vmstat -i over five seconds show a problem?
    An interrupt storm?

    I have been searching, trying to find out what the 'rate' means and
    what it should be.

    interrupt                          total       rate
    irq0: clk                        3277223        999
    irq5: em1                           8877          2
    irq6: ehci0 atapci0                   85          0
    irq7: mpt0 uhci2                   56401         17
    irq8: rtc                         419429        127
    irq11: em0 uhci0                   85684         26
    irq13: npx0                            1          0
    irq14: ata0                           48          0
    Total                            3847748       1173
    root on s1# vmstat -i
    interrupt                          total       rate
    irq0: clk                        3278793        999
    irq5: em1                           8883          2
    irq6: ehci0 atapci0                   85          0
    irq7: mpt0 uhci2                   56408         17
    irq8: rtc                         419630        127
    irq11: em0 uhci0                   85752         26
    irq13: npx0                            1          0
    irq14: ata0                           48          0
    Total                            3849600       1174
    root on s1# vmstat -i
    interrupt                          total       rate
    irq0: clk                        3280691        999
    irq5: em1                           8889          2
    irq6: ehci0 atapci0                   85          0
    irq7: mpt0 uhci2                   56408         17
    irq8: rtc                         419873        127
    irq11: em0 uhci0                   85843         26
    irq13: npx0                            1          0
    irq14: ata0                           48          0
    Total                            3851838       1173
    root on s1# vmstat -i
    interrupt                          total       rate
    irq0: clk                        3282850        999
    irq5: em1                           8891          2
    irq6: ehci0 atapci0                   85          0
    irq7: mpt0 uhci2                   56408         17
    irq8: rtc                         420149        127
    irq11: em0 uhci0                   86153         26
    irq13: npx0                            1          0
    irq14: ata0                           48          0
    Total                            3854585       1174
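
On the 'rate' column: as far as I know, vmstat -i computes it as the total count divided by the uptime in seconds, so it is an average since boot and a short burst barely moves it. Comparing two samples gives a better picture of what is happening right now. The sketch below does that for the first and last samples quoted above; the function names are made up, and the elapsed time is estimated from the clock interrupt, assuming it ticks about 1000 times per second (which the 999 in the rate column suggests).

```python
import re

# First and last of the four `vmstat -i` samples quoted above.
FIRST = """\
irq0: clk                        3277223        999
irq5: em1                           8877          2
irq6: ehci0 atapci0                   85          0
irq7: mpt0 uhci2                   56401         17
irq8: rtc                         419429        127
irq11: em0 uhci0                   85684         26
irq13: npx0                            1          0
irq14: ata0                           48          0
"""
LAST = """\
irq0: clk                        3282850        999
irq5: em1                           8891          2
irq6: ehci0 atapci0                   85          0
irq7: mpt0 uhci2                   56408         17
irq8: rtc                         420149        127
irq11: em0 uhci0                   86153         26
irq13: npx0                            1          0
irq14: ata0                           48          0
"""

ROW = re.compile(r"^\s*(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Map each IRQ label to (devices, total, rate-since-boot)."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and "Total" lines do not match and are skipped
            irq, devices, total, rate = match.groups()
            rows[irq] = (devices, int(total), int(rate))
    return rows


def interval_rates(before, after, clock_irq="irq0", hz=1000):
    """Interrupts per second between two samples.

    ASSUMPTION: the clock interrupt fires `hz` times per second, so its
    delta divided by `hz` approximates the seconds elapsed between samples.
    """
    elapsed = (after[clock_irq][1] - before[clock_irq][1]) / hz
    return {
        irq: (after[irq][1] - before[irq][1]) / elapsed
        for irq in after
        if irq in before
    }


first, last = parse_vmstat_i(FIRST), parse_vmstat_i(LAST)
for irq, rate in interval_rates(first, last).items():
    devices, _total, boot_rate = last[irq]
    print(f"{irq:6} {devices:16} {rate:8.1f}/s now, {boot_rate}/s since boot")
```

A device whose interval rate sits far above its since-boot rate would be the one to look at; in these samples the clock and rtc lines are steady timers and nothing else stands out.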



