From owner-freebsd-hackers Thu Jul 13 07:08:37 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id HAA08120 for hackers-outgoing; Thu, 13 Jul 1995 07:08:37 -0700
Received: from kitten.mcs.com (Kitten.mcs.com [192.160.127.90]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id HAA08112 for ; Thu, 13 Jul 1995 07:08:32 -0700
Received: from Jupiter.mcs.net (Jupiter.mcs.net [192.160.127.89]) by kitten.mcs.com (8.6.10/8.6.9) with ESMTP id JAA03292; Thu, 13 Jul 1995 09:08:25 -0500
Received: (from karl@localhost) by Jupiter.mcs.net (8.6.11/8.6.9) id JAA01530; Thu, 13 Jul 1995 09:08:23 -0500
From: Karl Denninger
Message-Id: <199507131408.JAA01530@Jupiter.mcs.net>
Subject: Re: SCSI disk wedge
To: gary@palmer.demon.co.uk (Gary Palmer)
Date: Thu, 13 Jul 1995 09:08:23 -0500 (CDT)
Cc: karl@Mcs.Net, tom@misery.sdf.com, rgrimes@gndrsh.aac.dev.com, freebsd-hackers@freebsd.org
In-Reply-To: <613.805636204@palmer.demon.co.uk> from "Gary Palmer" at Jul 13, 95 12:50:04 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 2498
Sender: hackers-owner@freebsd.org
Precedence: bulk

>
> In message <199507130143.UAA00551@Jupiter.mcs.net>, Karl Denninger writes:
> >> It could be that one of the drives has a firmware bug. This is not that
> >> uncommon. It was reported in hackers that some Conner drives have such
> >> problems. I also remember getting bug-fix firmware upgrades for old
> >> Micropolis drives.
>
> >If FreeBSD is going to be a production platform then it is going to have to
> >start behaving like one. This means that pushing things off on drive
> >vendors is not acceptable.
>
> My reading of Tom's statement is a SUGGESTION of a possible reason for
> the failure, not a flat dismissal of the problem as a firmware
> bug. Speaking as a person who HAS a drive with a SEVERE firmware bug
> (which Conner do acknowledge), they do happen, and there is not a lot
> FreeBSD can do to handle these situations!
>
> >I am not at all convinced this is a firmware issue. If it was then the 83
> >days of uptime on identically-configured BSDI machines wouldn't be happening.
>
> You seem to be operating under the false assumption that all OS's do
> their disk i/o in the same way, with the same sized transfers, and the
> same IRQ response times. E.g. the Conner firmware bug doesn't exhibit
> itself under DOS/Windows as they perform smaller transfers. Under
> FreeBSD 2 and later versions of Linux, the drive hangs the SCSI card
> as the firmware can't handle the requested transfer size. FreeBSD is
> not violating the SCSI specs by requesting this transfer size, it's
> just the drive microcode was poorly written/tested and falls over.
>
> Gary
> (Who's just waiting for his machine to fall over again with a hung SCSI
> bus)

I understand this, but the problem has been manifest on two different
configurations with different vendors of hardware, disks, and two different
controllers. Do you really mean to try to tell me that two vendors have
*identical* firmware problems? I rate the probability of that somewhere close
to a comet hitting the earth today, especially when 2.0BSDI (which uses large
transfer sizes and contiguous operations) has no such problem.

--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
Fax: [+1 312 248-9865]       | Email to "info@mcs.net" WWW: http://www.mcs.net
ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!