Date: Wed, 01 Dec 1999 15:50:05 -0500
From: M a t a d o r
To: "Kenneth D. Merry"
Cc: David Gilbert, stable@FreeBSD.ORG
Subject: Re: vinum experiences.

> > While I'm still chasing the memory corruption bug in vinum, I have a
> > couple of observations.
> >
> > 1. Removing a device (at least, with the ahc controller) locks the bus
> > even though I have a RAID hot-swap ready chassy (that properly
> > isolates the bus between commands). In my test, I had a completely
> > quiet SCSI bus when I removed one of the drives. I then wrote to the
> > RAID array. I got:
> >
> > Nov 30 18:31:51 raid1 /kernel: (da8:ahc1:0:11:0): Invalidating pack
> > Nov 30 18:31:51 raid1 /kernel: raid.p0.s6: fatal read I/O error
> > Nov 30 18:31:51 raid1 /kernel: vinum: raid.p0.s6 is crashed by force
> > Nov 30 18:31:52 raid1 /kernel: vinum: raid.p0 is degraded
>
> That looks like it may be a vinum issue. You shouldn't be getting buffers
> done twice, as that error message indicates. Have you talked to Greg at
> all about this?
> If you're chasing down bugs in Vinum, it would make sense
> to contact the author and work with him to either find the problem, or
> trace it to some other part of the system.
>
> > Nov 30 18:31:52 raid1 /kernel: (da8:ahc1:0:11:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): lost device
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): removing device entry
> >
> > ... I got more than one of the Synchronize cache failed. the "lost
> > device" was when I "camcontrol rescan 1" ... I did do a "camcontrol
> > reset 1", but it didn't affect things.
>
> All of that is normal. The synchronize cache failed since there was no
> device there to talk to. You probably got more than one of those because
> it was retried.
>
> > The net result is that SCSI bus 1 was wedged after this. I would
> > conjecture that removing a device (and running with this device
> > removed is precisely what the chassy was designed to do) should not
> > wedge things.
>
> How do you know the bus was wedged? Could you issue SCSI commands with
> camcontrol? e.g.:
>
> camcontrol tur da10 -v
>
> Will issue a test unit ready to da10. If it responds, the bus isn't
> wedged.
>
> > In fact, since the camcontrol rescan 1 was successful, I suggest that
> > it was cam, not the ahc driver that was somehow wedged.
>
> I don't think it's clear at all what wedged. The fact that you were able
> to rescan the bus indicates that the CAM side of things is probably working
> properly. One of the things that a rescan does is send a SCSI inquiry
> command to every possible target ID on the bus. You can't do that if the
> bus is wedged.

Doesn't all of this mean that vinum's support for RAID-5 and hot-swap is
not yet 100%, or even 70%, complete? I thought vinum didn't support
hot-swap.

I've been following this discussion, staying relatively silent as it
whooshes over my head, but anyway, feel free to ignore my comment. :)

Ciao,
Matador
matador@techie.com