Message-ID: <406205EE.8050506@freebsd.org>
Date: Wed, 24 Mar 2004 15:04:30 -0700
From: Scott Long
To: Don Bowman
cc: "'current@freebsd.org'"
cc: 'Kris Kennaway'
Subject: Re: LOR on current
List-Id: Discussions about the use of FreeBSD-current

Don Bowman wrote:
> From: Scott Long [mailto:scottl@freebsd.org]
>
>>Don Bowman wrote:
>>
>>>From: Kris Kennaway [mailto:kris@obsecurity.org]
>>>
>>>>On Wed, Mar 24, 2004 at 03:23:36PM -0500, Don Bowman wrote:
>>>>
>>>>>>Right, I think that's not the cause of your lockup :)
>>>>>
>>>>>Not being one to believe in coincidences... I'm typing
>>>>>on the serial console. The machine halts, i can no longer type.
>>>>>some seconds pass, out pops that message. This time too it
>>>>>returned. Most times (when i run two postgresql vacuums simultaneously
>>>>>for example), that's the end of it.
>>>>>
>>>>>I will continue to investigate.
>>>>
>>>>Check for disk problems..I have often experienced hangs or lockups on
>>>>machines with faulty disks.
>>>
>>>
>>>6-disk raid 5 behind ASR. All disks report optimal, controller
>>>reports optimal. I know the hangs you mean, from the vm
>>>swapin etc which holds all the locks. I don't think this
>>>is they.
>>>
>>>with ahd i would get scsi sense errors in the log for machines
>>>with problems [CRC errors etc], i don't have a for what asr does
>>>in this case.
>>>
>>>ran a 96 hour memory test (memtest86), with ecc checking, there
>>>were no soft or hard errors. Ran machine to 40 degrees C ambient
>>>in environmental chamber, its all good. Its got 3 power supplies,
>>>all are operational, fed from UPS.
>>>This is a software problem somewhere I think.
>>>
>>>I'm curious, how many people use ASR with current? It seems
>>>like it might be somewhat unloved.
>>>
>>
>>It is unloved. Adaptec provides no official support for it, and I
>>have many more things that are a higher priority. I'm not against
>>working on it, but it's hard to justify it at the moment. Anyways,
>>it wouldn't surprise me if the controller or driver was going out to
>>lunch and stalling the VM, but we probably need to do a lot more
>>investigation to support that. I assume that you have both WITNESS and
>>INVARIANTS turned on?
>
>
> witness and invariants are indeed on.
> can i switch asr to aac without reformatting my disks?
>
>

I believe so, but I'll have to get back to you on the details. There
might be some gotchas.

Scott