Date:      Wed, 24 Mar 2004 14:56:29 -0700
From:      Scott Long <scottl@freebsd.org>
To:        Don Bowman <don@sandvine.com>
Cc:        'Kris Kennaway' <kris@obsecurity.org>
Subject:   Re: LOR on current
Message-ID:  <4062040D.70606@freebsd.org>
In-Reply-To: <FE045D4D9F7AED4CBFF1B3B813C85337045D86F8@mail.sandvine.com>
References:  <FE045D4D9F7AED4CBFF1B3B813C85337045D86F8@mail.sandvine.com>

Don Bowman wrote:
> From: Kris Kennaway [mailto:kris@obsecurity.org]
> 
>>On Wed, Mar 24, 2004 at 03:23:36PM -0500, Don Bowman wrote:
>>
>>
>>>>Right, I think that's not the cause of your lockup :)
>>>
>>>Not being one to believe in coincidences... I'm typing 
>>>on the serial console. The machine halts, i can no longer type.
>>>some seconds pass, out pops that message. This time too it
>>>returned. Most times (when i run two postgresql vacuums
>>>simultaneously, for example), that's the end of it.
>>>
>>>I will continue to investigate.
>>
>>Check for disk problems. I have often experienced hangs or lockups on
>>machines with faulty disks.
> 
> 
> 6-disk raid 5 behind ASR. All disks report optimal, controller
> reports optimal. I know the hangs you mean, from the vm
> swapin etc which holds all the locks. I don't think this
> is they.
> 
> with ahd i would get scsi sense errors in the log for machines
> with problems [CRC errors etc], i don't have a for what asr does 
> in this case.
> 
> ran a 96 hour memory test (memtest86), with ecc checking, there
> were no soft or hard errors. Ran machine to 40 degrees C ambient
> in environmental chamber, it's all good. It's got 3 power supplies,
> all are operational, fed from UPS.
> This is a software problem somewhere I think.
> 
> I'm curious, how many people use ASR with current? It seems
> like it might be somewhat unloved.
> 

It is unloved.  Adaptec provides no official support for it, and I
have many more things that are a higher priority.  I'm not against
working on it, but it's hard to justify it at the moment.  Anyways,
it wouldn't surprise me if the controller or driver was going out to
lunch and stalling the VM, but we probably need to do a lot more
investigation to support that.  I assume that you have both WITNESS and
INVARIANTS turned on?

Scott


