From owner-freebsd-stable  Wed Dec  1 12:51:34 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from mail.rdc2.on.home.com (ha1.rdc2.on.home.com [24.9.0.15])
	by hub.freebsd.org (Postfix) with ESMTP id 462E215220
	for <stable@FreeBSD.ORG>; Wed,  1 Dec 1999 12:51:22 -0800 (PST)
	(envelope-from bullfighter@home.com)
Received: from home.com ([24.64.153.201]) by mail.rdc2.on.home.com
          (InterMail v4.01.01.07 201-229-111-110) with ESMTP
          id <19991201205057.KUEA26733.mail.rdc2.on.home.com@home.com>;
          Wed, 1 Dec 1999 12:50:57 -0800
Message-ID: <384589FD.14CCA8BB@home.com>
Date: Wed, 01 Dec 1999 15:50:05 -0500
From: M a t a d o r <bullfighter@home.com>
X-Mailer: Mozilla 4.7 [en] (X11; I; FreeBSD 3.3-RELEASE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: David Gilbert <dgilbert@velocet.ca>, stable@FreeBSD.ORG
Subject: Re: vinum experiences.
References: <199912011806.LAA43219@panzer.kdm.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > While I'm still chasing the memory corruption bug in vinum, I have a
> > couple of observations.
> >
> > 1. Removing a device (at least, with the ahc controller) locks the bus
> > even though I have a RAID hot-swap ready chassis (that properly
> > isolates the bus between commands).  In my test, I had a completely
> > quiet SCSI bus when I removed one of the drives.  I then wrote to the
> > RAID array.  I got:
> >
> > Nov 30 18:31:51 raid1 /kernel: (da8:ahc1:0:11:0): Invalidating pack
> > Nov 30 18:31:51 raid1 /kernel: raid.p0.s6: fatal read I/O error
> > Nov 30 18:31:51 raid1 /kernel: vinum: raid.p0.s6 is crashed by force
> > Nov 30 18:31:52 raid1 /kernel: vinum: raid.p0 is degraded

> That looks like it may be a vinum issue.  You shouldn't be getting buffers
> done twice, as that error message indicates.  Have you talked to Greg at
> all about this?  If you're chasing down bugs in Vinum, it would make sense
> to contact the author and work with him to either find the problem, or
> trace it to some other part of the system.
> 
> > Nov 30 18:31:52 raid1 /kernel: (da8:ahc1:0:11:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): lost device
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): removing device entry
> >
> > ... I got more than one "Synchronize cache failed" message.  The "lost
> > device" was when I ran "camcontrol rescan 1" ... I did do a "camcontrol
> > reset 1", but it didn't affect things.
> 
> All of that is normal.  The synchronize cache failed since there was no
> device there to talk to.  You probably got more than one of those because
> it was retried.
> 
> > The net result is that SCSI bus 1 was wedged after this.  I would
> > conjecture that removing a device (and running with this device
> > removed is precisely what the chassis was designed to do) should not
> > wedge things.
> 
> How do you know the bus was wedged?  Could you issue SCSI commands with
> camcontrol?  e.g.:
> 
> camcontrol tur da10 -v
> 
> Will issue a test unit ready to da10.  If it responds, the bus isn't
> wedged.
> 
> > In fact, since the camcontrol rescan 1 was successful, I suggest that
> > it was cam, not the ahc driver that was somehow wedged.
> 
> I don't think it's clear at all what wedged.  The fact that you were able
> to rescan the bus indicates that the CAM side of things is probably working
> properly.  One of the things that a rescan does is send a SCSI inquiry
> command to every possible target ID on the bus.  You can't do that if the
> bus is wedged.
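
The check suggested above can be sketched as a small loop: send a test
unit ready to each device you name and report which ones answer.  This is
only a rough sketch.  The function name probe_devices and the CAMCONTROL
override variable are made up for illustration; only the
"camcontrol tur <device> -v" invocation comes from the quoted message.

```shell
#!/bin/sh
# Sketch: issue TEST UNIT READY to each named device via camcontrol.
# CAMCONTROL is a hypothetical override so the command can be swapped out;
# it defaults to the real camcontrol binary.
: "${CAMCONTROL:=camcontrol}"

# probe_devices is a hypothetical helper name, not part of camcontrol.
probe_devices() {
    failed=0
    for dev in "$@"; do
        # Same invocation as in the quoted message: camcontrol tur <dev> -v
        if $CAMCONTROL tur "$dev" -v >/dev/null 2>&1; then
            echo "$dev: responds"
        else
            echo "$dev: no response"
            failed=1
        fi
    done
    # Non-zero if any device failed to answer.
    return $failed
}
```

Calling "probe_devices da10" would then print one line for da10.  If the
devices on the bus still respond, the bus is presumably not wedged.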

Doesn't all of this mean that vinum's support for RAID-5 and hot-swap is
not yet 100%, or even 70%, complete?  I thought vinum didn't support
hot-swap.

I've been following this discussion and staying relatively silent as it
whooshes over my head, so feel free to ignore my comment. :)


Ciao,

Matador
matador@techie.com

