From owner-freebsd-current@FreeBSD.ORG Wed Mar 24 14:02:40 2004
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 325E216A4D2;
	Wed, 24 Mar 2004 14:02:40 -0800 (PST)
Received: from mail.sandvine.com (sandvine.com [199.243.201.138])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AF86043D2F;
	Wed, 24 Mar 2004 14:02:39 -0800 (PST)
	(envelope-from don@sandvine.com)
Received: by mail.sandvine.com with Internet Mail Service (5.5.2657.72)
	id ; Wed, 24 Mar 2004 17:02:38 -0500
From: Don Bowman
To: 'Scott Long', Don Bowman
Date: Wed, 24 Mar 2004 17:02:37 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2657.72)
Content-Type: text/plain; charset="iso-8859-1"
cc: "'current@freebsd.org'"
cc: 'Kris Kennaway'
Subject: RE: LOR on current
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Wed, 24 Mar 2004 22:02:40 -0000

From: Scott Long [mailto:scottl@freebsd.org]
> Don Bowman wrote:
> > From: Kris Kennaway [mailto:kris@obsecurity.org]
> >
> >>On Wed, Mar 24, 2004 at 03:23:36PM -0500, Don Bowman wrote:
> >>
> >>>>Right, I think that's not the cause of your lockup :)
> >>>
> >>>Not being one to believe in coincidences... I'm typing
> >>>on the serial console. The machine halts, I can no longer type.
> >>>Some seconds pass, out pops that message. This time too it
> >>>returned. Most times (when I run two postgresql vacuums
> >>>simultaneously for example), that's the end of it.
> >>>
> >>>I will continue to investigate.
> >>
> >>Check for disk problems. I have often experienced hangs or
> >>lockups on machines with faulty disks.
> >
> > 6-disk raid 5 behind ASR. All disks report optimal, controller
> > reports optimal. I know the hangs you mean, from the vm
> > swapin etc which holds all the locks. I don't think this
> > is they.
> >
> > with ahd i would get scsi sense errors in the log for machines
> > with problems [CRC errors etc], i don't have a for what asr does
> > in this case.
> >
> > ran a 96 hour memory test (memtest86), with ecc checking, there
> > were no soft or hard errors. Ran machine to 40 degrees C ambient
> > in environmental chamber, it's all good. It's got 3 power supplies,
> > all are operational, fed from UPS.
> > This is a software problem somewhere I think.
> >
> > I'm curious, how many people use ASR with current? It seems
> > like it might be somewhat unloved.
> >
>
> It is unloved. Adaptec provides no official support for it, and I
> have many more things that are a higher priority. I'm not against
> working on it, but it's hard to justify it at the moment. Anyways,
> it wouldn't surprise me if the controller or driver was going out to
> lunch and stalling the VM, but we probably need to do a lot more
> investigation to support that. I assume that you have both
> WITNESS and INVARIANTS turned on?

witness and invariants are indeed on.

can i switch asr to aac without reformatting my disks?