From owner-freebsd-hackers Thu Jul 13 08:35:37 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id IAA13068 for hackers-outgoing; Thu, 13 Jul 1995 08:35:37 -0700
Received: from Root.COM (implode.Root.COM [198.145.90.1]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id IAA13062 for ; Thu, 13 Jul 1995 08:35:36 -0700
Received: from corbin.Root.COM (corbin [198.145.90.18]) by Root.COM (8.6.11/8.6.5) with ESMTP id IAA10619; Thu, 13 Jul 1995 08:35:06 -0700
Received: from localhost (localhost [127.0.0.1]) by corbin.Root.COM (8.6.11/8.6.5) with SMTP id IAA05001; Thu, 13 Jul 1995 08:36:07 -0700
Message-Id: <199507131536.IAA05001@corbin.Root.COM>
To: Karl Denninger
cc: freebsd-hackers@freebsd.org
Subject: Re: SCSI disk wedge
In-reply-to: Your message of "Thu, 13 Jul 95 09:56:52 CDT." <199507131456.JAA01679@Jupiter.mcs.net>
From: David Greenman
Reply-To: davidg@Root.COM
Date: Thu, 13 Jul 1995 08:36:06 -0700
Sender: hackers-owner@freebsd.org
Precedence: bulk

>Since there are no errors presented to us when the 1742 hangs, and the
>2742 starts complaining about timeouts, I don't know where to go next.
>Tagged queueing is not at issue; I have tried with it both enabled and
>disabled. With it *ON* the incidence of the hangs is reduced, but not
>eliminated.
>
>It LOOKS like something has requested an action on the SCSI bus which is
>causing problems (ie: disconnect sequencing, etc) for devices, and/or the
>adapter itself, causing a wedge condition. Why this is not detectable and
>correctable (or at least abortable with a panic) in the driver is unknown
>to me. The kernel IS running -- I can telnet to the machine affected and
>get connected, but any disk I/O attempt goes nowhere.
>
>There is a difference -- the 2742 is MORE stable than the 1742. The
>1742 machines run about 8 hours before dying -- the 2742 with MUCH
>heavier load on it can, in some cases, run for 2-3 days.

I suspect that BSDI runs longer because it detects the wedge and can successfully unjam the bus. FreeBSD is known broken in this regard, and several people are working on fixing it.

-DG