From owner-freebsd-hackers  Thu Jul 13 05:05:19 1995
Return-Path: hackers-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id FAA04132
          for hackers-outgoing; Thu, 13 Jul 1995 05:05:19 -0700
Received: from palmer.demon.co.uk (palmer.demon.co.uk [158.152.50.150])
          by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id FAA04126
          for <freebsd-hackers@freebsd.org>; Thu, 13 Jul 1995 05:05:09 -0700
Received: from localhost (localhost [127.0.0.1])
	  by palmer.demon.co.uk (8.6.11/8.6.11) with SMTP id MAA00616
	  ; Thu, 13 Jul 1995 12:50:05 +0100
X-Authentication-Warning: palmer.demon.co.uk: Host localhost didn't use HELO protocol
To: Karl Denninger <karl@mcs.net>
cc: Tom Samplonius <tom@misery.sdf.com>, rgrimes@gndrsh.aac.dev.com,
        freebsd-hackers@freebsd.org
Subject: Re: SCSI disk wedge 
In-reply-to: Your message of "Wed, 12 Jul 1995 20:43:04 CDT."
             <199507130143.UAA00551@Jupiter.mcs.net> 
Date: Thu, 13 Jul 1995 12:50:04 +0100
Message-ID: <613.805636204@palmer.demon.co.uk>
From: Gary Palmer <gary@palmer.demon.co.uk>
Sender: hackers-owner@freebsd.org
Precedence: bulk

In message <199507130143.UAA00551@Jupiter.mcs.net>, Karl Denninger writes:
>>   It could be that one of the drives has a firmware bug.  This is not that 
>> uncommon.  It was reported in hackers that some Conner drives have such 
>> problems.  I also remember getting bug-fix firmware upgrades for old 
>> Micropolis drives.

>If FreeBSD is going to be a production platform then it is going to have to
>start behaving like one.  This means that pushing things off on drive
>vendors is not acceptable.

My reading of Tom's statement is a SUGGESTION of a possible reason for
the failure, not a flat dismissal of the problem as a firmware
bug. Speaking as a person who HAS a drive with a SEVERE firmware bug
(which Conner do acknowledge), they do happen, and there is not a lot
FreeBSD can do to handle these situations!

>I am not at all convinced this is a firmware issue.  If it was then the 83
>days of uptime on identically-configured BSDI machines wouldn't be happening.

You seem to be operating under the false assumption that all OSes do
their disk I/O in the same way, with the same sized transfers, and the
same IRQ response times. E.g. the Conner firmware bug doesn't manifest
itself under DOS/Windows as they perform smaller transfers. Under
FreeBSD 2 and later versions of Linux, the drive hangs the SCSI card
as the firmware can't handle the requested transfer size. FreeBSD is
not violating the SCSI specs by requesting this transfer size; it's
just that the drive microcode was poorly written/tested and falls over.
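
To illustrate the point about transfer sizes, here is a toy model in
Python. Everything in it is hypothetical: the 64-block limit, the
BusHung exception, and the names drive_read, split_transfer and
read_with_limit are made up for illustration and do not describe the
actual Conner firmware or any FreeBSD driver code. It only sketches why
the same drive can look fine under an OS that issues small transfers
and wedge under one that issues large ones.

```python
# Toy model only. The limit below is an assumed number, not a measured one.
FIRMWARE_LIMIT_BLOCKS = 64  # hypothetical: largest transfer the firmware survives


class BusHung(Exception):
    """Raised when the simulated drive wedges the SCSI bus."""


def drive_read(start_block, block_count):
    """Simulated drive: wedges on any single transfer above the assumed limit."""
    if block_count > FIRMWARE_LIMIT_BLOCKS:
        raise BusHung("transfer of %d blocks wedged the bus" % block_count)
    # Return the block numbers as stand-ins for the data read.
    return list(range(start_block, start_block + block_count))


def split_transfer(start_block, block_count, max_blocks):
    """Yield (start, count) pieces, each at most max_blocks long."""
    if max_blocks <= 0:
        raise ValueError("max_blocks must be positive")
    offset = 0
    while offset < block_count:
        count = min(max_blocks, block_count - offset)
        yield start_block + offset, count
        offset += count


def read_with_limit(start_block, block_count, max_blocks):
    """Read a large range as a series of smaller transfers."""
    blocks = []
    for piece_start, piece_count in split_transfer(start_block, block_count, max_blocks):
        blocks.extend(drive_read(piece_start, piece_count))
    return blocks
```

In this model a single 200-block request raises BusHung, while the same
range read through read_with_limit with a 16-block cap completes. That
is the DOS/Windows versus FreeBSD difference in miniature: the request
is legal either way, and only the size of each transfer changes.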

Gary
(Who's just waiting for his machine to fall over again with a hung SCSI
 bus)

P.S. Conner are willing to upgrade my drive, but I can't afford to be
     off the air for the two weeks that they say it will take.