Date: Wed, 01 Dec 1999 15:50:05 -0500
From: M a t a d o r <bullfighter@home.com>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: David Gilbert <dgilbert@velocet.ca>, stable@FreeBSD.ORG
Subject: Re: vinum experiences.
Message-ID: <384589FD.14CCA8BB@home.com>
References: <199912011806.LAA43219@panzer.kdm.org>

> > While I'm still chasing the memory corruption bug in vinum, I have a
> > couple of observations.
> >
> > 1. Removing a device (at least, with the ahc controller) locks the bus
> > even though I have a RAID hot-swap ready chassis (that properly
> > isolates the bus between commands). In my test, I had a completely
> > quiet SCSI bus when I removed one of the drives. I then wrote to the
> > RAID array. I got:
> >
> > Nov 30 18:31:51 raid1 /kernel: (da8:ahc1:0:11:0): Invalidating pack
> > Nov 30 18:31:51 raid1 /kernel: raid.p0.s6: fatal read I/O error
> > Nov 30 18:31:51 raid1 /kernel: vinum: raid.p0.s6 is crashed by force
> > Nov 30 18:31:52 raid1 /kernel: vinum: raid.p0 is degraded
> That looks like it may be a vinum issue. You shouldn't be getting buffers
> done twice, as that error message indicates. Have you talked to Greg at
> all about this? If you're chasing down bugs in Vinum, it would make sense
> to contact the author and work with him to either find the problem, or
> trace it to some other part of the system.
>
> > Nov 30 18:31:52 raid1 /kernel: (da8:ahc1:0:11:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): lost device
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): removing device entry
> >
> > ... I got more than one of the Synchronize cache failed. The "lost
> > device" was when I "camcontrol rescan 1" ... I did do a "camcontrol
> > reset 1", but it didn't affect things.
>
> All of that is normal. The synchronize cache failed since there was no
> device there to talk to. You probably got more than one of those because
> it was retried.
>
> > The net result is that SCSI bus 1 was wedged after this. I would
> > conjecture that removing a device (and running with this device
> > removed is precisely what the chassis was designed to do) should not
> > wedge things.
>
> How do you know the bus was wedged? Could you issue SCSI commands with
> camcontrol? e.g.:
>
> camcontrol tur da10 -v
>
> Will issue a test unit ready to da10. If it responds, the bus isn't
> wedged.
>
> > In fact, since the camcontrol rescan 1 was successful, I suggest that
> > it was cam, not the ahc driver that was somehow wedged.
>
> I don't think it's clear at all what wedged. The fact that you were able
> to rescan the bus indicates that the CAM side of things is probably working
> properly. One of the things that a rescan does is send a SCSI inquiry
> command to every possible target ID on the bus. You can't do that if the
> bus is wedged.

Doesn't this all mean that vinum is not yet 100%, or even 70%,
supportive of RAID-5 and hot-swap? I thought vinum didn't support
hot-swap.

I've been tuning in to this discussion, staying relatively silent as it
whooshes over my head, but anyway, feel free to ignore my comment. :)

Ciao,
Matador
matador@techie.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message