Date:      Wed, 29 Apr 1998 12:55:58 +0100 (BST)
From:      Simon Park <si@nemesis.demon.co.uk>
To:        linux-scsi@vger.rutgers.edu, torvalds@transmeta.com, aic7xxx@FreeBSD.ORG, dledford@dialnet.net
Subject:   2.1.98 bugs in mid level scsi abort processing + patch
Message-ID:  <Pine.LNX.3.96.980429122343.380A-100000@nemesis.demon.co.uk>

Hi,

I've been having problems with my scsi system locking up since I added a
new disk about a month ago. I expect this is a cabling or termination
problem but what surprised me was that linux(*) never tried to abort the
command or tried a bus reset.

(*) Tried with vanilla 2.1.9x, 2.1.98 + pre-patch-2.1.99-1 or
    2.1.98 + pre-patch-2.1.99-1 + aic7xxx-5.0.14.
    Also tried both SMP and UP builds.

The lockup could easily be triggered by dd'ing from the new disk to
/dev/null while also dd'ing from any other scsi disk or cdrom to
/dev/null. A bog standard scsi torture test. 

After turning on scsi logging and adding a few printks into
scsi_old_times_out I discovered that the following test was always true.

    if (SCpnt->serial_number_at_timeout != SCpnt->serial_number)
    {
        /* don't do abort processing since the serial numbers don't match */
        return;
    }

Basically, the SCpnt->serial_number_at_timeout is always zero. Looking
back at the 2.1 series patches it seems that the code to set it was
removed at 2.1.75.
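
A minimal standalone sketch of why that test always fires, assuming an
in-flight command carries a non-zero serial_number. The struct cmd_model
and both helper functions are hypothetical stand-ins for illustration,
not kernel code:

```c
/* Hypothetical model of the two Scsi_Cmnd fields discussed above. */
struct cmd_model {
    unsigned long serial_number;            /* assumed non-zero while in flight */
    unsigned long serial_number_at_timeout; /* snapshot taken when the timer fires */
};

/* Models the assignment that went missing: snapshot the serial number. */
void record_timeout(struct cmd_model *c)
{
    c->serial_number_at_timeout = c->serial_number;
}

/* Mirrors the test quoted above: returns 1 when abort processing is skipped. */
int abort_is_skipped(const struct cmd_model *c)
{
    return c->serial_number_at_timeout != c->serial_number;
}
```

With serial_number = 42 and serial_number_at_timeout left at 0,
abort_is_skipped() returns 1. Once record_timeout() has run, it returns 0
and abort processing can go ahead.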

After re-enabling the setting of serial_number_at_timeout the aborts and
resets started occurring. Unfortunately they don't unjam the bus so some
more digging into the aic7xxx driver is needed.

Here's the patch to set the serial numbers properly and to make sure that
they are zeroed if the command completes. The patch also includes a 2 line
fix so that tagged command queueing is enabled if you use the
'add-device-single a b c d' functionality. 

Can the io_request_lock experts please check the patch to make sure I
haven't introduced races?

Cheers
Si

-----------------------------cut here------------------------------
--- v2.1.98/linux/drivers/scsi/scsi.c	Tue Apr 28 09:48:33 1998
+++ linux/drivers/scsi/scsi.c	Wed Apr 29 12:00:55 1998
@@ -1456,6 +1456,7 @@
     memcpy ((void *) SCpnt->data_cmnd , (const void *) cmnd, 12);
     SCpnt->reset_chain = NULL;
     SCpnt->serial_number = 0;
+    SCpnt->serial_number_at_timeout = 0;
     SCpnt->bufflen = bufflen;
     SCpnt->buffer = buffer;
     SCpnt->flags = 0;
@@ -1525,6 +1526,10 @@
       return;
     }
 
+  /* Set the serial numbers back to zero */
+  SCpnt->serial_number = 0;
+  SCpnt->serial_number_at_timeout = 0;
+
   SCpnt->state = SCSI_STATE_BHQUEUE;
   SCpnt->owner = SCSI_OWNER_BH_HANDLER;
   SCpnt->bh_next = NULL;
@@ -2297,6 +2302,8 @@
 	    return(-ENOSYS);  /* We do not yet support unplugging */
 
 	scan_scsis (HBA_ptr, 1, channel, id, lun);
+	if (HBA_ptr->select_queue_depths != NULL)
+		(HBA_ptr->select_queue_depths)(HBA_ptr, HBA_ptr->host_queue);
 	return(length);
 
     }
--- v2.1.98/linux/drivers/scsi/scsi_error.c	Tue Apr 28 09:47:36 1998
+++ linux/drivers/scsi/scsi_error.c	Wed Apr 29 12:09:36 1998
@@ -201,6 +201,9 @@
      }
 #endif
 
+    /* Set the serial_number_at_timeout to the current serial_number */
+    SCpnt->serial_number_at_timeout = SCpnt->serial_number;
+
     SCpnt->state = SCSI_STATE_TIMEOUT;
     SCpnt->owner = SCSI_OWNER_ERROR_HANDLER;
     
--- v2.1.98/linux/drivers/scsi/scsi_obsolete.c	Tue Apr 28 09:48:33 1998
+++ linux/drivers/scsi/scsi_obsolete.c	Wed Apr 29 12:08:22 1998
@@ -146,6 +146,10 @@
     unsigned long flags;
 
     spin_lock_irqsave(&io_request_lock, flags);
+
+    /* Set the serial_number_at_timeout to the current serial_number */
+    SCpnt->serial_number_at_timeout = SCpnt->serial_number;
+
     switch (SCpnt->internal_timeout & (IN_ABORT | IN_RESET | IN_RESET2 | IN_RESET3))
     {
     case NORMAL_TIMEOUT:
@@ -321,6 +325,7 @@
     struct Scsi_Host * host = SCpnt->host;
     int result = SCpnt->result;
     SCpnt->serial_number = 0;
+    SCpnt->serial_number_at_timeout = 0;
     oldto = update_timeout(SCpnt, 0);
 
 #ifdef DEBUG_TIMEOUT
-------------------------------cut here----------------------------

