From owner-freebsd-scsi@FreeBSD.ORG  Wed Aug  6 14:59:51 2003
Message-ID: <FE045D4D9F7AED4CBFF1B3B813C853370191908E@mail.sandvine.com>
From: Dave Dolson <ddolson@sandvine.com>
To: "'Justin T. Gibbs'" <gibbs@scsiguy.com>,
	"'freebsd-scsi@freebsd.org'" <freebsd-scsi@freebsd.org>
Date: Wed, 6 Aug 2003 17:59:46 -0400 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Subject: RE: Swapping deadlock due to aic/scsi errors?

 
> > We have a reproducible bug characterized by the system
> > becoming unresponsive (but db may be entered).
> > System is based on FreeBSD 4.7 (i386)
> > Using the aic79xx scsi driver.
> 
> If you are using the stock aic79xx driver found in 4.7, I would
> start by pulling in the latest 4.X aic79xx driver into your system.

Yes, we are using the latest RELENG_4 driver.

> > I would like to add some debugging to detect the lost command 
> > and possibly retry it.  Can someone suggest where the lost
> > command is supposed to be detected, and where the retry is 
> > supposed to occur.
> 
> The "lost command" is supposed to be detected by the timeout
> handler in the ahd driver.  The timeout handler just forces
> a bus reset which should cause the command to be returned to
> the SCSI layer and then retried.  It's not clear to me why
> this might not be happening, but the ahd driver was relatively
> green in 4.7 and you may just be tripping over a known (and
> later corrected) bug manifesting itself in an unusual way.

Are you referring to the timeout handler ahd_timeout()?
Are the commands retried from ahd_reset_channel()?
(It looks more like they're simply aborted.)
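
To make the abort-versus-retry question concrete, here is a toy user-space
model of the flow as I understand it from the description above. It is not
driver code: every name in it (toy_cmd, toy_reset_channel, toy_complete,
the TOY_* codes) is made up for illustration. The assumption is that the
reset path only completes pending commands with a "bus reset" status, and
that the decision to resubmit is made by the layer above when it sees that
status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this toy model; not the real CAM constants. */
enum toy_status { TOY_PENDING, TOY_DONE, TOY_BUS_RESET, TOY_FAILED };

struct toy_cmd {
    int id;
    int retries_left;        /* retries the upper layer is still willing to make */
    enum toy_status status;
    int submit_count;        /* how many times the command was handed to the controller */
};

/*
 * Controller side: a bus reset completes every pending command with a
 * reset status. Nothing is retried here. Returns the number of commands
 * that were completed this way.
 */
static int toy_reset_channel(struct toy_cmd *cmds, size_t n)
{
    int aborted = 0;

    for (size_t i = 0; i < n; i++) {
        if (cmds[i].status == TOY_PENDING) {
            cmds[i].status = TOY_BUS_RESET;
            aborted++;
        }
    }
    return aborted;
}

/*
 * Upper-layer side: the completion handler sees the reset status and
 * resubmits the command if it has retries left. Returns 1 if the command
 * was resubmitted, 0 otherwise.
 */
static int toy_complete(struct toy_cmd *cmd)
{
    assert(cmd != NULL);

    if (cmd->status != TOY_BUS_RESET)
        return 0;                /* nothing to recover */

    if (cmd->retries_left > 0) {
        cmd->retries_left--;
        cmd->status = TOY_PENDING;
        cmd->submit_count++;
        return 1;
    }
    cmd->status = TOY_FAILED;    /* out of retries: error goes to the caller */
    return 0;
}
```

If the real code follows this split, then ahd_reset_channel() looking like
a plain abort would be expected, and the place to look for a missing retry
would be the completion path above the driver.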

Aside: Am I correct in believing that ahd_execute_scb() is called 
for every command to the drive?
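
If that is right, the debugging I have in mind is roughly the bookkeeping
below: record each command when it is submitted, clear the record when it
completes, and periodically count the ones that have been outstanding too
long. This is only a user-space sketch with hypothetical names (dbg_slot,
dbg_cmd_start, dbg_cmd_done, dbg_count_overdue) and a fixed-size table;
where the hooks would actually go in the driver is exactly what I am
asking about.

```c
#include <assert.h>
#include <stddef.h>

#define DBG_MAX_OUTSTANDING 8   /* arbitrary table size for the sketch */

struct dbg_slot {
    int      in_use;
    unsigned tag;               /* identifies the command */
    long     start_ticks;       /* time the command was submitted */
};

static struct dbg_slot dbg_table[DBG_MAX_OUTSTANDING];

/* Record a command at submit time. Returns 0 on success, -1 if the table is full. */
static int dbg_cmd_start(unsigned tag, long now)
{
    for (size_t i = 0; i < DBG_MAX_OUTSTANDING; i++) {
        if (!dbg_table[i].in_use) {
            dbg_table[i].in_use = 1;
            dbg_table[i].tag = tag;
            dbg_table[i].start_ticks = now;
            return 0;
        }
    }
    return -1;
}

/* Clear the record at completion time. Returns 0 if found, -1 if the tag is unknown. */
static int dbg_cmd_done(unsigned tag)
{
    for (size_t i = 0; i < DBG_MAX_OUTSTANDING; i++) {
        if (dbg_table[i].in_use && dbg_table[i].tag == tag) {
            dbg_table[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}

/* Count commands that have been outstanding for more than `limit` ticks. */
static int dbg_count_overdue(long now, long limit)
{
    int overdue = 0;

    assert(limit >= 0);
    for (size_t i = 0; i < DBG_MAX_OUTSTANDING; i++) {
        if (dbg_table[i].in_use && now - dbg_table[i].start_ticks > limit)
            overdue++;
    }
    return overdue;
}
```

A nonzero overdue count that never drops back to zero would point at a
command that was neither completed nor returned by the reset.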

David Dolson (ddolson@sandvine.com, www.sandvine.com)