From owner-svn-src-projects@FreeBSD.ORG Wed Jan 5 00:19:41 2011
Return-Path: <owner-svn-src-projects@freebsd.org>
Delivered-To: svn-src-projects@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1BA23106564A;
	Wed, 5 Jan 2011 00:19:41 +0000 (UTC) (envelope-from imp@FreeBSD.org)
Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c])
	by mx1.freebsd.org (Postfix) with ESMTP id 0AA8E8FC14;
	Wed, 5 Jan 2011 00:19:41 +0000 (UTC)
Received: from svn.freebsd.org (localhost [127.0.0.1])
	by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id p050JeTB059536;
	Wed, 5 Jan 2011 00:19:40 GMT (envelope-from imp@svn.freebsd.org)
Received: (from imp@localhost)
	by svn.freebsd.org (8.14.3/8.14.3/Submit) id p050Je5J059533;
	Wed, 5 Jan 2011 00:19:40 GMT (envelope-from imp@svn.freebsd.org)
Message-Id: <201101050019.p050Je5J059533@svn.freebsd.org>
From: Warner Losh
Date: Wed, 5 Jan 2011 00:19:40 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-projects@freebsd.org
X-SVN-Group: projects
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc:
Subject: svn commit: r216984 - projects/graid/head/sys/geom/raid
X-BeenThere: svn-src-projects@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the src projects tree"
X-List-Received-Date: Wed, 05 Jan 2011 00:19:41 -0000

Author: imp
Date: Wed Jan 5 00:19:40 2011
New Revision: 216984
URL: http://svn.freebsd.org/changeset/base/216984

Log:
  First pass at error recovery: if the first disk we read from returns
  an error, retry the read from the second one.  Note information about
  a possible bad-sector remap attempt through a write, and some ideas on
  when to eject the subdisk from the volume.

Modified:
  projects/graid/head/sys/geom/raid/g_raid.h
  projects/graid/head/sys/geom/raid/tr_raid1.c

Modified: projects/graid/head/sys/geom/raid/g_raid.h
==============================================================================
--- projects/graid/head/sys/geom/raid/g_raid.h	Tue Jan 4 23:00:54 2011	(r216983)
+++ projects/graid/head/sys/geom/raid/g_raid.h	Wed Jan 5 00:19:40 2011	(r216984)
@@ -137,6 +137,7 @@ struct g_raid_subdisk {
 	off_t		 sd_size;	/* Size on the disk. */
 	u_int		 sd_pos;	/* Position in volume. */
 	u_int		 sd_state;	/* Subdisk state. */
+	int		 sd_read_errs;	/* Count of read errors. */
 	LIST_ENTRY(g_raid_subdisk)	sd_next; /* Next subdisk on disk. */
 };
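
The sd_read_errs counter added above is only incremented in this pass; the
XXX note in tr_raid1.c below asks whether crossing some threshold should
declare the subdisk bad.  A minimal sketch of what that check could look
like, assuming a hypothetical G_RAID_READ_ERR_THRESH tunable and a
G_RAID_SUBDISK_S_FAILED state alongside G_RAID_SUBDISK_S_ACTIVE (both names
are illustrative, not interfaces this commit provides):

/*
 * Sketch only: a possible eject policy for the new sd_read_errs
 * counter.  The threshold value and the failed state are assumptions,
 * not part of this commit.
 */
#define	G_RAID_READ_ERR_THRESH	10	/* hypothetical tunable */

static int
g_raid_tr_raid1_note_read_error(struct g_raid_subdisk *sd)
{

	sd->sd_read_errs++;
	if (sd->sd_read_errs < G_RAID_READ_ERR_THRESH)
		return (0);
	/*
	 * Too many read errors: take the subdisk out of service so the
	 * read-disk selector stops choosing it.  A real implementation
	 * would likely route this through the volume's state machine
	 * rather than poking sd_state directly.
	 */
	sd->sd_state = G_RAID_SUBDISK_S_FAILED;
	return (1);
}

The iodone path below could then eject the disk as soon as this helper
returns non-zero, instead of only counting errors as it does now.
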
Modified: projects/graid/head/sys/geom/raid/tr_raid1.c
==============================================================================
--- projects/graid/head/sys/geom/raid/tr_raid1.c	Tue Jan 4 23:00:54 2011	(r216983)
+++ projects/graid/head/sys/geom/raid/tr_raid1.c	Wed Jan 5 00:19:40 2011	(r216984)
@@ -164,24 +164,34 @@ g_raid_tr_stop_raid1(struct g_raid_tr_ob
 	return (0);
 }
 
+/*
+ * Select the disk to read from.  For now, we always pick the first
+ * active one in the list.  This ensures we favor one disk on boot,
+ * and gives more deterministic recovery from the weird edge cases of
+ * power failure.  In the future, we can imagine policies that pick
+ * the least-loaded disk to improve performance, or that limit reads
+ * to a disk while it is undergoing some kind of error recovery.
+ */
+static struct g_raid_subdisk *
+g_raid_tr_raid1_select_read_disk(struct g_raid_volume *vol)
+{
+	int i;
+
+	for (i = 0; i < vol->v_disks_count; i++)
+		if (vol->v_subdisks[i].sd_state == G_RAID_SUBDISK_S_ACTIVE)
+			return (&vol->v_subdisks[i]);
+	return (NULL);
+}
+
 static void
 g_raid_tr_iostart_raid1_read(struct g_raid_tr_object *tr, struct bio *bp)
 {
-	struct g_raid_softc *sc;
-	struct g_raid_volume *vol;
 	struct g_raid_subdisk *sd;
 	struct bio *cbp;
-	int i;
 
-	vol = tr->tro_volume;
-	sc = vol->v_softc;
-	sd = NULL;
-	for (i = 0; i < vol->v_disks_count; i++) {
-		if (vol->v_subdisks[i].sd_state != G_RAID_SUBDISK_S_ACTIVE)
-			continue;
-		sd = &vol->v_subdisks[i];
-	}
-	KASSERT(sd != NULL, ("No active disks in volume %s.", vol->v_name));
+	sd = g_raid_tr_raid1_select_read_disk(tr->tro_volume);
+	KASSERT(sd != NULL,
+	    ("No active disks in volume %s.", tr->tro_volume->v_name));
 
 	cbp = g_clone_bio(bp);
 	if (cbp == NULL) {
@@ -283,6 +293,53 @@ g_raid_tr_iodone_raid1(struct g_raid_tr_
 	pbp = bp->bio_parent;
 	pbp->bio_inbed++;
+	if ((pbp->bio_flags & BIO_ERROR) && pbp->bio_cmd == BIO_READ &&
+	    pbp->bio_children == 1) {
+		struct bio *cbp;
+		struct g_raid_subdisk *nsd;
+		struct g_raid_volume *vol;
+		int i;
+
+		/*
+		 * Retry the read on the other disk drive, if available,
+		 * before erroring out the read.  Do we need to mark the
+		 * 'sd' disk as degraded somehow?
+		 */
+		vol = tr->tro_volume;
+		sd->sd_read_errs++;
+		/*
+		 * XXX Check threshold of sd_read_errs here to declare
+		 * this subdisk bad?
+		 */
+		for (i = 0; i < vol->v_disks_count; i++) {
+			nsd = &vol->v_subdisks[i];
+			if (sd == nsd)
+				continue;
+			if (nsd->sd_state != G_RAID_SUBDISK_S_ACTIVE)
+				continue;
+			cbp = g_clone_bio(pbp);
+			if (cbp == NULL)
+				break;
+			g_raid_subdisk_iostart(nsd, cbp);
+			return;
+		}
+		/*
+		 * Something happened that keeps us from retrying, so
+		 * return the original error by falling through.
+		 */
+	}
+	/*
+	 * If it was a read, and bio_children is 2, then we just
+	 * recovered the data from the second drive.  We should try to
+	 * write that data to the first drive if sector remapping is
+	 * enabled.  A write should put the data in a new place on the
+	 * disk, remapping the bad sector.  Do we need to do that by
+	 * queueing a request to the main worker thread?  It doesn't
+	 * affect the return code of this current read, and can be
+	 * done at our leisure.
+	 *
+	 * XXX TODO
+	 */
 	if (pbp->bio_children == pbp->bio_inbed) {
 		pbp->bio_completed = pbp->bio_length;
 		g_raid_iodone(pbp, bp->bio_error);
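
The XXX TODO block above describes writing the data recovered from the
second drive back to the drive that failed the read, so the drive's
firmware can remap the bad sector.  A minimal sketch of that idea,
assuming the write may be issued directly from this path at all (the
comment itself leans toward queueing it to the main worker thread);
g_raid_tr_raid1_remap_write() is an invented name:

/*
 * Sketch only: push the recovered data back at the failing disk as a
 * write, on the theory that the drive will remap the bad sector.
 * Fire and forget: the outcome does not affect the return code of
 * the read that was already recovered from the other disk.
 */
static void
g_raid_tr_raid1_remap_write(struct g_raid_subdisk *bad_sd, struct bio *pbp)
{
	struct bio *wbp;

	wbp = g_clone_bio(pbp);
	if (wbp == NULL)
		return;			/* Best effort only. */
	wbp->bio_cmd = BIO_WRITE;	/* Same offset and data, as a write. */
	g_raid_subdisk_iostart(bad_sd, wbp);
}

One catch this sketch glosses over: g_clone_bio() increments
pbp->bio_children, which would disturb the bio_children == bio_inbed
completion test in g_raid_tr_iodone_raid1().  That accounting problem is
a concrete reason to queue the remap write to the worker thread instead,
as the comment above suggests.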