From owner-svn-src-stable-8@FreeBSD.ORG  Sun Aug  7 17:28:08 2011
Return-Path: <owner-svn-src-stable-8@FreeBSD.ORG>
Delivered-To: svn-src-stable-8@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9C7AC1065670;
	Sun,  7 Aug 2011 17:28:08 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c])
	by mx1.freebsd.org (Postfix) with ESMTP id 827C28FC08;
	Sun,  7 Aug 2011 17:28:08 +0000 (UTC)
Received: from svn.freebsd.org (localhost [127.0.0.1])
	by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id p77HS8LC009647;
	Sun, 7 Aug 2011 17:28:08 GMT (envelope-from mav@svn.freebsd.org)
Received: (from mav@localhost)
	by svn.freebsd.org (8.14.4/8.14.4/Submit) id p77HS8QD009645;
	Sun, 7 Aug 2011 17:28:08 GMT (envelope-from mav@svn.freebsd.org)
Message-Id: <201108071728.p77HS8QD009645@svn.freebsd.org>
From: Alexander Motin <mav@FreeBSD.org>
Date: Sun, 7 Aug 2011 17:28:08 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-stable@freebsd.org, svn-src-stable-8@freebsd.org
X-SVN-Group: stable-8
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: 
Subject: svn commit: r224696 - stable/8/sys/cam
X-BeenThere: svn-src-stable-8@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for only the 8-stable src tree
	<svn-src-stable-8.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-stable-8>, 
	<mailto:svn-src-stable-8-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-stable-8>
List-Post: <mailto:svn-src-stable-8@freebsd.org>
List-Help: <mailto:svn-src-stable-8-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-stable-8>, 
	<mailto:svn-src-stable-8-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Aug 2011 17:28:08 -0000

Author: mav
Date: Sun Aug  7 17:28:08 2011
New Revision: 224696
URL: http://svn.freebsd.org/changeset/base/224696

Log:
  MFC r224496:
  In some cases failed SATA disks may report their presence but not
  respond to any commands. I've found that, because of multiple command
  retries, each of which causes a 30s timeout, a bus reset and another
  retry or requeue for many commands, it may take ages to eventually drop
  the failed device. The odd thing is that those retries continue even
  after XPT has considered the device dead and invalidated it.
  
  This patch makes cam_periph_error() block any command retries after
  the periph has been marked as invalid. With this patch all activity
  completes in 1-2 minutes, just after the several timeouts required to
  consider the device dead. This should make ZFS, gmirror, graid, etc.
  operation more robust.

Modified:
  stable/8/sys/cam/cam_periph.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)

Modified: stable/8/sys/cam/cam_periph.c
==============================================================================
--- stable/8/sys/cam/cam_periph.c	Sun Aug  7 17:19:59 2011	(r224695)
+++ stable/8/sys/cam/cam_periph.c	Sun Aug  7 17:28:08 2011	(r224696)
@@ -1484,7 +1484,8 @@ camperiphscsisenseerror(union ccb *ccb, 
 		 * make sure we actually have retries available.
 		 */
 		if ((err_action & SSQ_DECREMENT_COUNT) != 0) {
-		 	if (ccb->ccb_h.retry_count > 0)
+		 	if (ccb->ccb_h.retry_count > 0 &&
+			    (periph->flags & CAM_PERIPH_INVALID) == 0)
 		 		ccb->ccb_h.retry_count--;
 			else {
 				*action_string = "Retries exhausted";
@@ -1643,6 +1644,7 @@ int
 cam_periph_error(union ccb *ccb, cam_flags camflags,
 		 u_int32_t sense_flags, union ccb *save_ccb)
 {
+	struct cam_periph *periph;
 	const char *action_string;
 	cam_status  status;
 	int	    frozen;
@@ -1650,7 +1652,8 @@ cam_periph_error(union ccb *ccb, cam_fla
 	int         openings;
 	u_int32_t   relsim_flags;
 	u_int32_t   timeout = 0;
-	
+
+	periph = xpt_path_periph(ccb->ccb_h.path);
 	action_string = NULL;
 	status = ccb->ccb_h.status;
 	frozen = (status & CAM_DEV_QFRZN) != 0;
@@ -1712,9 +1715,9 @@ cam_periph_error(union ccb *ccb, cam_fla
 			xpt_print(ccb->ccb_h.path, "Data overrun\n");
 			printed++;
 		}
-		error = EIO;	/* we have to kill the command */
 		/* decrement the number of retries */
-		if (ccb->ccb_h.retry_count > 0) {
+		if (ccb->ccb_h.retry_count > 0 &&
+		    (periph->flags & CAM_PERIPH_INVALID) == 0) {
 			ccb->ccb_h.retry_count--;
 			error = ERESTART;
 		} else {
@@ -1733,7 +1736,8 @@ cam_periph_error(union ccb *ccb, cam_fla
 		struct cam_path *newpath;
 
 		if ((camflags & CAM_RETRY_SELTO) != 0) {
-			if (ccb->ccb_h.retry_count > 0) {
+			if (ccb->ccb_h.retry_count > 0 &&
+			    (periph->flags & CAM_PERIPH_INVALID) == 0) {
 
 				ccb->ccb_h.retry_count--;
 				error = ERESTART;
@@ -1751,10 +1755,11 @@ cam_periph_error(union ccb *ccb, cam_fla
 				timeout = periph_selto_delay;
 				break;
 			}
+			action_string = "Retries exhausted";
 		}
 		error = ENXIO;
 		/* Should we do more if we can't create the path?? */
-		if (xpt_create_path(&newpath, xpt_path_periph(ccb->ccb_h.path),
+		if (xpt_create_path(&newpath, periph,
 				    xpt_path_path_id(ccb->ccb_h.path),
 				    xpt_path_target_id(ccb->ccb_h.path),
 				    CAM_LUN_WILDCARD) != CAM_REQ_CMP) 
@@ -1799,11 +1804,16 @@ cam_periph_error(union ccb *ccb, cam_fla
 		/* FALLTHROUGH */
 	case CAM_REQUEUE_REQ:
 		/* Unconditional requeue */
-		error = ERESTART;
 		if (bootverbose && printed == 0) {
 			xpt_print(ccb->ccb_h.path, "Request requeued\n");
 			printed++;
 		}
+		if ((periph->flags & CAM_PERIPH_INVALID) == 0)
+			error = ERESTART;
+		else {
+			action_string = "Retries exhausted";
+			error = EIO;
+		}
 		break;
 	case CAM_RESRC_UNAVAIL:
 		/* Wait a bit for the resource shortage to abate. */
@@ -1818,7 +1828,8 @@ cam_periph_error(union ccb *ccb, cam_fla
 		/* FALLTHROUGH */
 	default:
 		/* decrement the number of retries */
-		if (ccb->ccb_h.retry_count > 0) {
+		if (ccb->ccb_h.retry_count > 0 &&
+		    (periph->flags & CAM_PERIPH_INVALID) == 0) {
 			ccb->ccb_h.retry_count--;
 			error = ERESTART;
 			if (bootverbose && printed == 0) {