Date: Fri, 20 Jun 1997 17:46:37 +0200 From: Tor Egge <Tor.Egge@idi.ntnu.no> To: Tor.Egge@idi.ntnu.no Cc: freebsd-scsi@FreeBSD.ORG Subject: Re: scsi recovery code causes system freeze Message-ID: <199706201546.RAA11875@pat.idi.ntnu.no> In-Reply-To: Your message of "Mon, 09 Jun 1997 21:16:39 %2B0200" References: <199706091916.VAA16067@pat.idi.ntnu.no>
next in thread | previous in thread | raw e-mail | index | archive | help
[I wrote]
> I have some problems with heavy write activity on a scsi bus causing
> scsi timeouts. Sometimes the machine freezes during the error
> recovery.
Due to several (>5) freezes I ended up writing a workaround.
This workaround is not very well tested (The freezes do not occur
*that* often), but I believe it is mostly correct.
Jun 20 16:43:13 ikke /kernel: ahc1: Issued Channel A Bus Reset. 11 SCBs aborted
Jun 20 16:43:13 ikke /kernel: Clearing bus reset
Jun 20 16:43:13 ikke /kernel: sd7: Will resubmit scsi cmd
Jun 20 16:43:13 ikke /kernel: Clearing 'in-reset' flag
Jun 20 16:43:13 ikke /kernel: sd6: no longer in timeout
Jun 20 16:43:13 ikke /kernel: sd8: no longer in timeout
Jun 20 16:43:13 ikke /kernel: sd7: UNIT ATTENTION asc:29,2
Jun 20 16:43:13 ikke /kernel: , retries:3
Jun 20 16:43:13 ikke /kernel: sd6: UNIT ATTENTION asc:29,2
Jun 20 16:43:13 ikke /kernel: , retries:3
Jun 20 16:43:13 ikke /kernel: sd8: UNIT ATTENTION asc:29,2
Jun 20 16:43:14 ikke /kernel: , retries:3
Jun 20 16:43:14 ikke /kernel: sd9: UNIT ATTENTION asc:29,2
Jun 20 16:43:14 ikke /kernel: , retries:4
Jun 20 16:43:14 ikke /kernel: sd10: UNIT ATTENTION asc:29,2
Jun 20 16:43:14 ikke /kernel: , retries:4
Jun 20 16:43:14 ikke /kernel: sd7: Resubmitting scsi cmd
----------
Index: aic7xxx.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/scsi/aic7xxx.c,v
retrieving revision 1.118
diff -u -r1.118 aic7xxx.c
--- aic7xxx.c 1997/04/26 05:03:18 1.118
+++ aic7xxx.c 1997/06/20 15:12:35
@@ -2355,6 +2355,56 @@
#endif
}
+static void ahc_resubmit __P((void *data));
+static void ahc_resubmit(data)
+ void *data;
+{
+ struct scsi_xfer *xs;
+ struct scsi_link *sc_link;
+ int retval;
+ int s;
+
+ xs = (struct scsi_xfer *) data;
+ sc_link = xs->sc_link;
+
+ sc_print_addr(sc_link);
+ printf("Resubmitting scsi cmd\n");
+ s = splbio();
+ retval = (*(sc_link->adapter->scsi_cmd)) (xs);
+ splx(s);
+
+ switch (retval) {
+ case SUCCESSFULLY_QUEUED:
+ /*
+ * Finally queued properly, or a new resubmit has been scheduled
+ */
+ return;
+ case TRY_AGAIN_LATER:
+ /*
+ * Ran out of SCBs. Schedule a new retry in 1 second.
+ */
+ xs->error = XS_NOERROR;
+ xs->flags &= ~ITSDONE;
+ timeout(ahc_resubmit,(caddr_t) xs,hz);
+ return;
+ case COMPLETE:
+ /*
+ * Ran out of DMA segments. (aic7xxx.c specific)
+ */
+ xs->flags |= ITSDONE;
+ s = splbio();
+ scsi_done(xs);
+ splx(s);
+ return;
+ case HAD_ERROR:
+ default:
+ /*
+ * Should not happen (aic7xxx.c specific)
+ */
+ panic("ahc_resubmit: Unexpected return code (%d)",retval);
+ }
+}
+
/*
* start a scsi operation given the command and
* the data address, target, and lun all of which
@@ -2387,6 +2437,17 @@
&& (ahc->in_reset & CHANNEL_B_RESET) != 0)
|| (!IS_SCSIBUS_B(ahc, xs->sc_link)
&& (ahc->in_reset & CHANNEL_A_RESET) != 0)) {
+ if ((flags & SCSI_NOMASK) == 0) {
+ sc_print_addr(xs->sc_link);
+ printf("Will resubmit scsi cmd\n");
+ timeout(ahc_resubmit,(caddr_t) xs,hz);
+ return SUCCESSFULLY_QUEUED;
+ }
+ /*
+ * This is broken, since it will cause an infinite loop
+ * of retries while timeouts are blocked.
+ */
+ printf("Warning: Freeze imminent\n");
/* Ick, but I don't want it to abort this */
xs->retries++;
xs->error = XS_BUSY;
--------
- Tor Egge
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199706201546.RAA11875>
