From: Don Bowman
To: 'Doug White', Don Bowman
Cc: 'current@freebsd.org'
Date: Mon, 31 May 2004 14:20:32 -0400
Subject: RE: hang with raid, postgresql
List-Id: Discussions about the use of FreeBSD-current

From: Doug White [mailto:dwhite@gumbysoft.com]
> On Sun, 30 May 2004, Don Bowman wrote:
>
> > From: Doug White [mailto:dwhite@gumbysoft.com]
> > > On Sun, 30 May 2004, Don Bowman wrote:
> > >
> > > >
> > > > I have a system with 2x 2.8GHz XEON (P4), intel e7501 chipset,
> > > > 4GB of ram, aac [adaptec 2200s] raid with 4 scsi
> > > > disks. I have also tried asr (adaptec 2015).
> > > > I have tried two different motherboards.
> > > > The only application the machine runs is postgresql,
> > > > with about ~30 databases, about ~250GB of data.
> > > >
> > > > I'm finding the machine locks up solid once a day
> > > > or so (sometimes more, sometimes less, no pattern
> > > > of time of day). I know its not a hardware issue, it
> > > > is reliable with FreeBSD 4.7. I've run through memory
> > > > test, disk test, etc.
> > > >
> > > > There appears to be a correlation between
> > > > disk activity (postgresql vacuum) and the lockup,
> > > > but i can't be sure.
> > >
> > > Temperature?
> > >
> > > What motherboard is it exactly?
> >
> > lmmon shows the mobo temperature @ 28C. It is in
> > an AC-controlled environment (~20C ambient). The system
> > has 6 blower fans, ducted over the CPU's, with the
> > copper heat sinks designed for the 3.2GHz XEON.
>
> alright so its a pretty beefy server chassis, although it could also
> be an underperforming power supply or a scsi terminator.

It has 3 separate power supplies, all of which have been verified.
It's the 3rd piece of hardware I've tried.

>
> > It has 3 power supplies, each with separate AC
> > inlet, fed from a UPS with filtered power.
> > It should have ~150% airflow redundancy, and
> > ~200% power redundancy.
> > This is a supermicro X5DPE motherboard.
>
> Do you happen to have the IPMI option board for this system?

No IPMI.

>
> Still seems hardware-related to me, although I've found hard hangs
> caused by buggy optimization on amd64.

I don't think so. I extensively tested it with FreeBSD 4.7 and
memtest86. The scsi bus was checked with a scope, and was also checked
with an 'ahd' controller so that we could see iuCRC errors, SCB
timeouts, etc. (ahd is excellent @ reporting errors, much better than
any other driver). Two disk tests were run (iozone as a benchmark,
iotest as a test) for several days.

I'm pretty sure this is a garden-variety sw problem. Currently I am
suspicious of the acpi... this machine hangs on boot if acpi is
not enabled, so it's hard to test that theory :) The hang is
in setting up and enumerating pnp isa devices. I guess I could
expend energy to figure that out.

My next step (which I'm not looking forward to) is to try
to solder the TAP connector on and hook up my emulator.
I really, really don't want to do that.

--don