Date:      Thu, 13 Jul 1995 10:15:24 -0500 (CDT)
From:      Karl Denninger <karl@Mcs.Net>
To:        tom@misery.sdf.com (Tom Samplonius)
Cc:        karl@Mcs.Net, rgrimes@gndrsh.aac.dev.com, freebsd-hackers@FreeBSD.ORG
Subject:   Re: SCSI disk wedge
Message-ID:  <199507131515.KAA01784@Jupiter.mcs.net>
In-Reply-To: <Pine.BSF.3.91.950712191425.1872B-100000@misery.sdf.com> from "Tom Samplonius" at Jul 12, 95 07:28:52 pm

> 
> 
> On Wed, 12 Jul 1995, Karl Denninger wrote:
> 
> > If FreeBSD is going to be a production platform then it is going to have to
> > start behaving like one.  This means that pushing things off on drive
> > vendors is not acceptable.
> 
>   Ok, I just needed to be convinced :) (I fondly remember the story about 
> someone complaining that FreeBSD wouldn't work on their system but Linux 
> did, so he went and re-installed Linux and it didn't work either!)  
> The 1742 driver has been around for a long time and is very similar to 
> the driver in NetBSD.  However, the 2742/2842/2942 driver is quite 
> recent.  It is _very_ odd that you have problems with both adapters.
> 
>   Chances are slim to none that this can be fixed if someone on the core 
> team can not replicate this problem.  And so far, it appears that no one has.
> 
>   Since a system that locks up once a day can't be that useful, would it 
> be possible for you to remove some of the drives for a couple of days and 
> see if that has any effect?
> 
> Tom
> 

There are THREE machines involved in our testing:

1)	A new system which has a 2742 and four Micropolis 2G disks.  This
	one has tagged queueing enabled right now, and runs for anywhere
	from an hour to four days before locking up.  When it locks, it is
	with a message about timeouts in the SCSI driver.  This machine
	*cannot* be tested with BSDI, as the 2742 is not supported.

2)	A second system, new, with the standard BSDI configuration we run
	here -- 1742A/Seagate Hawk 1G disk, 64MB RAM, ASUS P90 EISA/PCI
	motherboard.  This one freezes within 24 hours with no messages of
	any kind.  It's definitely the SCSI system, however, as the kernel IS
	running (I can ping it, telnet to it -- no login prompt, obviously
	-- hit CPU-only things that already have connected sockets, etc.)

3)	A THIRD system, which USED to run BSDI 1.x and 2.x for more than
	6 months, identical to system #2 above in configuration.  Same
	response as #2 as well.

Note that #2 and #3 only HAVE one disk attached, so removing a "few" of them
won't be very useful.  #1 has four disks on it, but before we added the
other three, it showed the same behavior with only one drive.  The disk
which goes offline first on #1 is random (no pattern detectable).

Note that the hang happens under ALL load conditions.  I have had it happen
when reading news over NFS (which has nearly no local disk activity), when
sitting at the shell prompt, when pounding the hell out of the drives, etc.

It *looks* like something is trashing the adapter's idea of the world and it
is wedging tight in response.  That's a guess, as I don't have a bus probe,
but note that the hangs happen with the SCSI bus activity light *OFF*.  Most
of the Adaptec problems I've seen with termination and the like wedge with
the light *ON*.

I can surmise that the following aren't at issue:

1)	The SCSI bus itself.  System #3 was in production for almost a year 
	with BSDI 2.x and 1.x before it was reloaded, and it has NEVER had 
	disk related problems of any kind.

2)	The disks.  Ditto, as BSDI 2.x does do scatter-gather and clustered
	I/O, which pounds the heck out of the disk subsystem.  No problems.

3)	The adapter.  Again, system #3 was in production for an extended
	period without ANY trouble.

4)	The CPU, RAM, or other adapters in the system.  See above.

Now, we beat the hell out of both our hardware and software here, in ways
which few, if any, other firms and locations do.  For that reason we
frequently find problems in both hardware and software that others miss.

--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
Fax: [+1 312 248-9865]       | Email to "info@mcs.net" WWW: http://www.mcs.net
ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!


