Date: Thu, 15 Apr 1999 09:00:35 +0200
From: J Wunsch <j@uriah.heep.sax.de>
To: scsi@FreeBSD.ORG
Subject: Re: timed out while idle?
Message-ID: <19990415090035.03868@uriah.heep.sax.de>
In-Reply-To: <199904142037.OAA13777@narnia.plutotech.com>; from Justin T. Gibbs on Wed, Apr 14, 1999 at 02:37:53PM -0600
References: <199904140231.UAA07250@panzer.plutotech.com> <199904142037.OAA13777@narnia.plutotech.com>
As Justin T. Gibbs wrote:

> That's not entirely true. The device will come back if it transitions
> through final close (e.g. you umount -f all filesystems referencing
> it).

Last time i checked, this didn't work. You couldn't umount -f, since umount needed (one way or the other, i didn't investigate) a drive that at least didn't respond with ENXIO all the time, and since the umount never completed, CAM was never able to get the drive back.

> Further, the code that usually causes the disk pack to be
> invalidated is in cam_periph.c:cam_periph_error() where a selection
> timeout causes us to receive an ENXIO error. I believe that
> invalidating the pack is the correct thing to do since we have no
> way of determining if the media or device are the same, but that we
> should be retrying things like selection timeouts in a more sane
> fashion so that invalidations are a rarity.

I think we've been at this discussion before. IMHO, CAM's action in this case is not what people would expect, and it makes CAM (which i believe is excellent by design -- no criticism) rather fragile compared to other operating systems. You can't e.g. swap a SCSI chain terminator while the chain is under heavy load, or it would invalidate all the disks on it. Compare this to e.g. a Solaris machine, where you can do this.

Don't get me wrong, i understand why you implemented it this way (at least i believe i understand, since i guess that's the behaviour you needed for Plutotech), and i agree that this is one possible view of the world. However, i'd like to see it `tunable' in a way where it tries a lot harder to assume the drive is still alive, since from my experience, 99 % of the SCSI problems are not drives gone south, but SCSI busses being temporarily broken, which is fixable. I'd even argue that this is what most people would expect... Any chance to have the behaviour optional?

(The current default behaviour might be quite reasonable for people running large disk farms, where the wiring is usually fine, and it's indeed the disks that wear out.)

> It's not CAM behavior, it's da behavior. It would be a da(4) ioctl.
> If you'd like to add such an ioctl and a utility to toggle it, be
> my guest.

OK, i'll look into it. ;-)

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message