From: Warner Losh
Date: Tue, 11 Jan 2011 21:43:35 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-projects@freebsd.org
Subject: svn commit: r217287 - projects/graid/head/sys/geom/raid
Message-Id: <201101112143.p0BLhZEY036737@svn.freebsd.org>

Author: imp
Date: Tue Jan 11 21:43:35 2011
New Revision: 217287
URL: http://svn.freebsd.org/changeset/base/217287

Log:
  Fix a few problems with read error recovery:
  o We need to check the bp we're given in *done() not pbp since that's
    where the error is.
  o Just check bio_error and forget the BIO_ERROR flag.
  o bump the inbed count a little later in the processing.
  o Start to do write-remapping, but only detect when we need to, rather
    than actually doing anything (yet).
  o minor style cleanup
  o improve mirror breaking/degrading notes and add one.

  With these changes I can survive at a 10% error rate both raw
  operations, as well as file system operations...

Modified:
  projects/graid/head/sys/geom/raid/tr_raid1.c

Modified: projects/graid/head/sys/geom/raid/tr_raid1.c
==============================================================================
--- projects/graid/head/sys/geom/raid/tr_raid1.c	Tue Jan 11 21:18:29 2011	(r217286)
+++ projects/graid/head/sys/geom/raid/tr_raid1.c	Tue Jan 11 21:43:35 2011	(r217287)
@@ -291,21 +291,18 @@ static void
 g_raid_tr_iodone_raid1(struct g_raid_tr_object *tr,
     struct g_raid_subdisk *sd, struct bio *bp)
 {
+	struct bio *cbp;
+	struct g_raid_subdisk *nsd;
+	struct g_raid_volume *vol;
 	struct bio *pbp;
+	int i;
 
 	pbp = bp->bio_parent;
-	pbp->bio_inbed++;
-	if ((pbp->bio_flags & BIO_ERROR) && pbp->bio_cmd == BIO_READ &&
+	if (bp->bio_error != 0 && bp->bio_cmd == BIO_READ &&
 	    pbp->bio_children == 1) {
-		struct bio *cbp;
-		struct g_raid_subdisk *nsd;
-		struct g_raid_volume *vol;
-		int i;
-
 		/*
-		 * Retry the error on the other disk drive, if available,
-		 * before erroring out the read. Do we need to mark the
-		 * 'sd' disk as degraded somehow?
+		 * Retry the read error on the other disk drive, if
+		 * available, before erroring out the read.
 		 */
 		vol = tr->tro_volume;
 		sd->sd_read_errs++;
@@ -323,25 +320,31 @@ g_raid_tr_iodone_raid1(struct g_raid_tr_
 			if (cbp == NULL)
 				break;
 			g_raid_subdisk_iostart(nsd, cbp);
+			pbp->bio_inbed++;
 			return;
 		}
 		/*
 		 * something happened, so we can't retry. Return the
 		 * original error by falling through.
+		 *
+		 * XXX degrade/break the mirror?
+		 */
+	}
+	pbp->bio_inbed++;
+	if (pbp->bio_cmd == BIO_READ && pbp->bio_children == 2) {
+		/*
+		 * If it was a read, and bio_children is 2, then we just
+		 * recovered the data from the second drive. We should try to
+		 * write that data to the first drive if sector remapping is
+		 * enabled. A write should put the data in a new place on the
+		 * disk, remapping the bad sector. Do we need to do that by
+		 * queueing a request to the main worker thread? It doesn't
+		 * affect the return code of this current read, and can be
+		 * done at our liesure.
+		 *
+		 * XXX TODO
 		 */
 	}
-	/*
-	 * If it was a read, and bio_children is 2, then we just
-	 * recovered the data from the second drive. We should try to
-	 * write that data to the first drive if sector remapping is
-	 * enabled. A write should put the data in a new place on the
-	 * disk, remapping the bad sector. Do we need to do that by
-	 * queueing a request to the main worker thread? It doesn't
-	 * affect the return code of this current read, and can be
-	 * done at our liesure.
-	 *
-	 * XXX TODO
-	 */
 	if (pbp->bio_children == pbp->bio_inbed) {
 		pbp->bio_completed = pbp->bio_length;
 		g_raid_iodone(pbp, bp->bio_error);
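
The accounting change in this commit can be easier to follow outside the kernel. What follows is only a rough user-space sketch of the completion path as the diff describes it, not the actual GEOM code. The names sim_bio, sim_iodone_raid1 and the needs_remap flag are hypothetical stand-ins invented for illustration. The sketch assumes that issuing the retry clone is what raises the parent's child count to 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct bio. */
enum sim_cmd { SIM_READ = 1, SIM_WRITE = 2 };

struct sim_bio {
	struct sim_bio *parent;	/* request this child was cloned from */
	enum sim_cmd cmd;
	int error;		/* 0 on success, errno-style value otherwise */
	int children;		/* child requests issued against this parent */
	int inbed;		/* child requests accounted for so far */
	int needs_remap;	/* hypothetical flag: a write-back is wanted */
	int done;		/* parent has been delivered upward */
	int final_error;	/* error delivered with the parent */
};

/*
 * Model of the completion path after r217287.  'retry' stands in for a
 * clone aimed at the other mirror half; NULL means no retry is possible.
 * Returns 1 if a retry was issued, 0 otherwise.
 */
static int
sim_iodone_raid1(struct sim_bio *bp, struct sim_bio *retry)
{
	struct sim_bio *pbp;

	assert(bp != NULL && bp->parent != NULL);
	pbp = bp->parent;

	/* Test the child that just completed, since it carries the error. */
	if (bp->error != 0 && bp->cmd == SIM_READ && pbp->children == 1) {
		if (retry != NULL) {
			retry->parent = pbp;
			retry->cmd = SIM_READ;
			pbp->children++;	/* assumed: cloning adds a child */
			pbp->inbed++;		/* count the failed child, then wait */
			return (1);
		}
		/* No retry possible: fall through with the original error. */
	}
	pbp->inbed++;
	if (pbp->cmd == SIM_READ && pbp->children == 2) {
		/* Only detect that a remapping write is wanted; do nothing yet. */
		pbp->needs_remap = 1;
	}
	if (pbp->children == pbp->inbed) {
		pbp->done = 1;
		pbp->final_error = bp->error;
	}
	return (0);
}
```

In this model a failed first read with a retry available leaves the parent pending with children == 2 and inbed == 1. When the retry completes cleanly, the parent is delivered with error 0 and needs_remap set. With no retry available, the parent is delivered at once with the original error.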