Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 5 May 2016 15:32:48 +0000 (UTC)
From:      Alan Somers <asomers@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r299121 - in head/sys/dev: mpr mps
Message-ID:  <201605051532.u45FWmSt067438@repo.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: asomers
Date: Thu May  5 15:32:47 2016
New Revision: 299121
URL: https://svnweb.freebsd.org/changeset/base/299121

Log:
  mpr(4) and mps(4) shouldn't indefinitely retry for "terminated ioc" errors
  
  Submitted by:	ken
  Reviewed by:	slm
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D6210

Modified:
  head/sys/dev/mpr/mpr_sas.c
  head/sys/dev/mps/mps_sas.c

Modified: head/sys/dev/mpr/mpr_sas.c
==============================================================================
--- head/sys/dev/mpr/mpr_sas.c	Thu May  5 15:21:33 2016	(r299120)
+++ head/sys/dev/mpr/mpr_sas.c	Thu May  5 15:32:47 2016	(r299121)
@@ -2465,11 +2465,20 @@ mprsas_scsiio_complete(struct mpr_softc 
 	case MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
 	case MPI2_IOCSTATUS_SCSI_EXT_TERMINATED:
 		/*
-		 * Since these are generally external (i.e. hopefully
-		 * transient transport-related) errors, retry these without
-		 * decrementing the retry count.
+		 * These can sometimes be transient transport-related
+		 * errors, and sometimes persistent drive-related errors.
+		 * We used to retry these without decrementing the retry
+		 * count by returning CAM_REQUEUE_REQ.  Unfortunately, if
+		 * we hit a persistent drive problem that returns one of
+		 * these error codes, we would retry indefinitely.  So,
+		 * return CAM_REQ_CMP_ERROR so that we decrement the retry
+		 * count and avoid infinite retries.  We're taking the
+		 * potential risk of flagging false failures in the event
+		 * of a topology-related error (e.g. a SAS expander problem
+		 * causes a command addressed to a drive to fail), but
+		 * avoiding getting into an infinite retry loop.
 		 */
-		mprsas_set_ccbstatus(ccb, CAM_REQUEUE_REQ);
+		mprsas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR);
 		mprsas_log_command(cm, MPR_INFO,
 		    "terminated ioc %x scsi %x state %x xfer %u\n",
 		    le16toh(rep->IOCStatus), rep->SCSIStatus, rep->SCSIState,

Modified: head/sys/dev/mps/mps_sas.c
==============================================================================
--- head/sys/dev/mps/mps_sas.c	Thu May  5 15:21:33 2016	(r299120)
+++ head/sys/dev/mps/mps_sas.c	Thu May  5 15:32:47 2016	(r299121)
@@ -2408,11 +2408,20 @@ mpssas_scsiio_complete(struct mps_softc 
 	case MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
 	case MPI2_IOCSTATUS_SCSI_EXT_TERMINATED:
 		/*
-		 * Since these are generally external (i.e. hopefully
-		 * transient transport-related) errors, retry these without
-		 * decrementing the retry count.
+		 * These can sometimes be transient transport-related
+		 * errors, and sometimes persistent drive-related errors.
+		 * We used to retry these without decrementing the retry
+		 * count by returning CAM_REQUEUE_REQ.  Unfortunately, if
+		 * we hit a persistent drive problem that returns one of
+		 * these error codes, we would retry indefinitely.  So,
+		 * return CAM_REQ_CMP_ERROR so that we decrement the retry
+		 * count and avoid infinite retries.  We're taking the
+		 * potential risk of flagging false failures in the event
+		 * of a topology-related error (e.g. a SAS expander problem
+		 * causes a command addressed to a drive to fail), but
+		 * avoiding getting into an infinite retry loop.
 		 */
-		mpssas_set_ccbstatus(ccb, CAM_REQUEUE_REQ);
+		mpssas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR);
 		mpssas_log_command(cm, MPS_INFO,
 		    "terminated ioc %x scsi %x state %x xfer %u\n",
 		    le16toh(rep->IOCStatus), rep->SCSIStatus, rep->SCSIState,



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201605051532.u45FWmSt067438>