Message-ID: <4062040D.70606@freebsd.org>
Date: Wed, 24 Mar 2004 14:56:29 -0700
From: Scott Long
To: Don Bowman
cc: "'current@freebsd.org'"
cc: 'Kris Kennaway'
Subject: Re: LOR on current
List-Id: Discussions about the use of FreeBSD-current

Don Bowman wrote:
> From: Kris Kennaway [mailto:kris@obsecurity.org]
>
>>On Wed, Mar 24, 2004 at 03:23:36PM -0500, Don Bowman wrote:
>>
>>>>Right, I think that's not the cause of your lockup :)
>>>
>>>Not being one to believe in coincidences... I'm typing
>>>on the serial console. The machine halts, i can no longer type.
>>>some seconds pass, out pops that message. This time too it
>>>returned. Most times (when i run two postgresql vacuums
>>>simultaneously for example), that's the end of it.
>>>
>>>I will continue to investigate.
>>
>>Check for disk problems..I have often experienced hangs or lockups on
>>machines with faulty disks.
>
> 6-disk raid 5 behind ASR. All disks report optimal, controller
> reports optimal. I know the hangs you mean, from the vm
> swapin etc which holds all the locks. I don't think this
> is they.
>
> with ahd i would get scsi sense errors in the log for machines
> with problems [CRC errors etc], i don't have a for what asr does
> in this case.
>
> ran a 96 hour memory test (memtest86), with ecc checking, there
> were no soft or hard errors. Ran machine to 40 degrees C ambient
> in environmental chamber, it's all good. It's got 3 power supplies,
> all are operational, fed from UPS.
> This is a software problem somewhere I think.
>
> I'm curious, how many people use ASR with current? It seems
> like it might be somewhat unloved.
>

It is unloved. Adaptec provides no official support for it, and I
have many more things that are a higher priority. I'm not against
working on it, but it's hard to justify it at the moment.

Anyways, it wouldn't surprise me if the controller or driver was going
out to lunch and stalling the VM, but we probably need to do a lot more
investigation to support that. I assume that you have both WITNESS and
INVARIANTS turned on?

Scott