From: "Alastair G. Hogge"
To: freebsd-current@FreeBSD.ORG
cc: Allan Fields
cc: current@FreeBSD.ORG
Date: Thu, 10 Jun 2004 14:28:48 +1000
Subject: Re: Custom kernels causing Promise ATA RAID to go down
Message-Id: <200406101428.49101.agh@tpg.com.au>
In-Reply-To: <20040608055624.GC59752@afields.ca>

On Tuesday 08 June 2004 15:56, Allan Fields wrote:
> On Sun, Jun 06, 2004 at 07:40:15PM +1000, Alastair G. Hogge wrote:
> > For a couple of weeks now I've been having problems with my custom kernel
> > crashing the system. I've re-cvsup'd, nuked /usr/obj and rebuilt
> > world.
> >
> > The problem is that my kernel keeps causing ATA DMA READ/WRITE
> > errors and then eventually causing my RAID array to go down, so it
> > needs to be deleted and re-defined through the BIOS. Plus countless
> > fsck runs.
>
> Yup, it sucks.. basically if your RAID goes bad, with most Promise
> controllers you need to reboot into BIOS and wait a long time for
> it to rebuild. I found the Promise BIOS a little lacking. I'm not
> a fan of oblique menu-based tools, especially when working w/ disks.
>
> Online rebuild is available on some ATA controllers but can also be
> slow.
>
> > I don't know how to capture and store the output, as the system just
> > basically hangs and freezes the keyboard. Most of the time I've been
> > in X, and the only way out is a hard reboot.
>
> Also, just curious, but are you swapping off the RAID?

Well, I'm not sure if there's any swapping going on. I have 1024M of
system memory, and the swap partition is located on the array.

> If your RAID has read/write errors and you use it for swap, it is
> likely that it will cause the system to lock, possibly including
> the console.
>
> Do you have a second machine to use as a serial console?

Unfortunately not. I'm working on getting one set up, though.

> Another thing to try: try pinging the host and see if it responds.

Yes, I can still ping the machine.

> I use a null-modem cable and tip(1): When I was having problems w/
> my Promise controller, I'd typically capture the output using
> script(1) or screen(1).

Ahhh, very handy. Thanks :-)

> > Running a GENERIC kernel (with debugging things removed) is so slow.
> > X/KDE performs so poorly now.
>
> What's interesting is why this only happens w/ your custom kernels.

Actually, I think a GENERIC kernel just lasts longer than a custom one.
I left a GENERIC kernel running for 6+ hours the other day while I went
out; when I came back the system had locked up.

> I've also experienced instability with Promise RAID controllers in
> the past but didn't ever use a GENERIC kernel. I'm interested in
> this issue, but don't know if it's related.
>
> Also: Perhaps your Promise controller or drives are overheating?

I thought about this, but I don't think it is the case.
I've had the 2 HDs for some time now, and they used to run 24/7. I have
3 fans running in my tower case. I've just rebuilt world again and I'm
still getting problems. I need to get that other machine going.