From owner-freebsd-scsi  Mon Mar  6  9:41:12 2000
Delivered-To: freebsd-scsi@freebsd.org
Received: from darius.concentric.net (darius.concentric.net [207.155.198.79])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5241037BE13; Mon,  6 Mar 2000 09:41:05 -0800 (PST)
	(envelope-from aronchick@archegenesis.com)
Received: from mcfeely.concentric.net (mcfeely.concentric.net [207.155.198.83])
	by darius.concentric.net (8.9.1a/(98/12/15 5.12))
	id MAA29057; Mon, 6 Mar 2000 12:41:04 -0500 (EST)
	[1-800-745-2747 The Concentric Network]
Received: from aronchick (w148.z208036085.nyc-ny.dsl.cnc.net [208.36.85.148])
	by mcfeely.concentric.net (8.9.1a)
	id MAA09184; Mon, 6 Mar 2000 12:41:03 -0500 (EST)
Date: Mon, 06 Mar 2000 12:40:56 -0500
From: David Aronchick <aronchick@archegenesis.com>
Reply-To: David Aronchick <aronchick@archegenesis.com>
To: Greg Lehey <grog@lemis.com>
Cc: freebsd-scsi@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG
Subject: Re: Vinum vs Adaptec AIC 7890?
Message-ID: <3158684381.952346456@aronchick>
In-Reply-To: <20000306184553.A332@mojave.worldwide.lemis.com>
X-Mailer: Mulberry/2.0.0b11 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hi again--

I wasn't sure what you meant by ARRE or AWRE in this case either, and I 
couldn't find them in the vinum docs.  I don't think the kernel panicked, or 
if it did, it couldn't write any logs.  My guess is that the controller just 
locked up.  The worst part is that the machine is completely unavailable 
when this happens, so anything you could suggest that would help me fix it 
would be much appreciated.
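
For reference, AWRE and ARRE are not vinum settings.  They are the automatic 
write and read reallocation bits in the drive's SCSI Read-Write Error 
Recovery mode page (page 01h), held in bits 7 and 6 of byte 2.  The sketch 
below only decodes that byte.  It assumes the value has already been read 
from the drive by some other means (for example a mode page dump from 
camcontrol), and the function name is made up for illustration.

```python
# Bits 7 and 6 of byte 2 of the Read-Write Error Recovery mode page (01h).
ERROR_RECOVERY_BITS = {
    "AWRE": 0x80,  # automatic write reallocation enabled
    "ARRE": 0x40,  # automatic read reallocation enabled
}


def decode_error_recovery_byte(value):
    """Return which reallocation bits are set in byte 2 of mode page 01h.

    Hypothetical helper: `value` is the raw byte (0-255), obtained elsewhere.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte (0-255)")
    return {name: bool(value & mask) for name, mask in ERROR_RECOVERY_BITS.items()}


if __name__ == "__main__":
    # 0xC0 is an illustrative value with both bits set, not one read from da1.
    for name, enabled in decode_error_recovery_byte(0xC0).items():
        print(name, "set" if enabled else "clear")
```

With both bits set, the drive is allowed to remap a failing block by itself 
instead of returning the error to the host each time.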

--On Monday, March 06, 2000 18:45 +1100 Greg Lehey <grog@lemis.com> wrote:


> On Sunday,  5 March 2000 at 21:30:55 -0500, David Aronchick wrote:
>> Hi--
>>
>> I've had the following problems...
>> we're currently running with 3x18 GB 10k Seagate drives and an Asus p2b-ds
>> <http://www.asus.com/products/motherboard/pentiumpro/p2b-ds/spec.html>
>> with onboard scsi card.  The drives are divided into 3 partitions /usr
>> /var / and a RAID 5 of 26 GB.
>>
>> I was able to recover by just doing vinum start on the stale drive, but
>> it brought the system down, and I need to make sure this doesn't happen
>> again. Does anyone have any suggestions? Is this a CAM or vinum problem,
>> or should I look at the hardware?  Here's the standard list.
>
> Hmm.  There are a couple of things missing here, like the dump.  The
> log files show that there's something wrong with /dev/da1 (unrecovered
> data error; you should check whether you have ARRE and AWRE set), but
> that shouldn't cause the system to hang up.
>
> I'd strongly doubt that the problem has anything to do with the host
> adapter.  I don't know of anything in Vinum which would cause these
> problems either, so we'd really need to know more details.  Sorry I
> can't give you any more ideas, but there's not much to go on.
>
>> What problems are you having?
>>
>> Using lftp, I was in the midst of FTPing a file of 50 MB or so directly
>> to the RAID volume.  After about 5% was done, the entire box froze.  As
>> it is remote, I don't know if the drives were being accessed, but all
>> open ssh sessions just stopped.  I could ping and nmap the box, but when
>> I tried to initiate an ssh session, it would just open and sit there.
>> I've previously been able to copy a couple of hundred MB back and
>> forth, with seemingly no problems.  That was a few days ago.
>>
>> Which version of FreeBSD are you running?
>>
>> 3.4-STABLE
>>
>> Have you made any changes to the system sources, including Vinum?
>>
>> No, everything is unchanged.
>>
>> Kernel stuff:
>> # vinum list
>> Configuration summary
>>
>> Drives:         3 (4 configured)
>> Volumes:        1 (4 configured)
>> Plexes:         1 (8 configured)
>> Subdisks:       3 (16 configured)
>>
>> D d0                    State: up       Device /dev/da0s2e      Avail: 311/15311 MB (2%)
>> D d1                    State: up       Device /dev/da1s2e      Avail: 311/15311 MB (2%)
>> D d2                    State: up       Device /dev/da2s2e      Avail: 311/15311 MB (2%)
>>
>> V raid5                 State: up       Plexes:       1 Size:         29 GB
>>
>> P raid5.p0           R5 State: degraded Subdisks:     3 Size:         29 GB
>>
>> S raid5.p0.s0           State: up       PO:        0  B Size:         14 GB
>> S raid5.p0.s1           State: stale    PO:      512 kB Size:         14 GB
>> S raid5.p0.s2           State: up       PO:     1024 kB Size:         14 GB
>>
>> # tail -100 /var/log/messages
>>
>> [...]
>> Mar  5 13:25:18 db /kernel: (da1:ahc0:0:1:0): READ(10). CDB: 28 0 0 41 82 4e 0 0 80 0
>> Mar  5 13:25:18 db /kernel: (da1:ahc0:0:1:0): MEDIUM ERROR info:41825b asc:11,0
>> Mar  5 13:25:18 db /kernel: (da1:ahc0:0:1:0): Unrecovered read error field replaceable unit: e4 sks:80,101
>> Mar  5 13:25:18 db /kernel: raid5.p0.s1: fatal read I/O error
>> Mar  5 13:25:18 db /kernel: vinum: raid5.p0.s1 is crashed by force
>> Mar  5 13:25:18 db /kernel: vinum: raid5.p0 is degraded
>> Mar  5 13:25:18 db /kernel: raid5.p0.s1: fatal write I/O error
>> Mar  5 13:25:18 db /kernel: vinum: raid5.p0.s1 is stale by force
>> Mar  5 16:52:08 db /kernel: Copyright (c) 1992-1999 FreeBSD Inc.
>> [ the machine was manually rebooted 3 hours later ]
>
> Greg
> --
> Finger grog@lemis.com for PGP public key
> See complete headers for address and phone numbers
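
The READ(10) line in the log above can be decoded to see which block on da1 
failed.  This is a minimal sketch, assuming the standard 10-byte READ(10) 
layout (opcode 28h, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer 
length in bytes 7-8) and assuming the "info" value in the MEDIUM ERROR line 
is the LBA of the failing block, which is its usual meaning for a medium 
error.  The function name is made up for illustration.

```python
def parse_read10(cdb_text):
    """Parse a READ(10) CDB printed as space-separated hex bytes.

    Returns (starting LBA, transfer length in blocks).
    """
    cdb = [int(token, 16) for token in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")      # bytes 2-5
    length = int.from_bytes(bytes(cdb[7:9]), "big")   # bytes 7-8
    return lba, length


if __name__ == "__main__":
    # CDB and info value copied from the kernel messages quoted above.
    start_lba, blocks = parse_read10("28 0 0 41 82 4e 0 0 80 0")
    failed_lba = 0x41825B
    print(f"read of {blocks} blocks starting at LBA {start_lba:#x}")
    print(f"failing block is {failed_lba - start_lba} blocks into the transfer")
```

Under those assumptions the request was a 128-block read starting at LBA 
0x41824e, and the unrecovered block sits 13 blocks into it, so the error 
points at one specific spot on the disk rather than at the whole transfer.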

