From: Alexander Motin
Date: Wed, 9 Feb 2011 12:48:13 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-projects@freebsd.org
Subject: svn commit: r218481 - projects/graid/head/sys/geom/raid
Message-Id: <201102091248.p19CmDtE084639@svn.freebsd.org>

Author: mav
Date: Wed Feb 9 12:48:12 2011
New Revision: 218481
URL: http://svn.freebsd.org/changeset/base/218481

Log:
  Do not abort rebuild on read errors, just log it and continue.
  For 2-disk array we have no more redundancy to recover any way.
  And if this rebuild really implements resync, then skipping damaged
  block is actually a right behavior, as second copy is most likely
  valid and can be used for reading. Aborting rebuild same time will
  make that copy inaccessible.

  Another reason to do it is that present code tries to rebuild/resync
  everything that possible. Aborted rebuild will be restarted and likely
  end with the same result, causing infinite loop.

Modified:
  projects/graid/head/sys/geom/raid/tr_raid1.c

Modified: projects/graid/head/sys/geom/raid/tr_raid1.c
==============================================================================
--- projects/graid/head/sys/geom/raid/tr_raid1.c	Wed Feb  9 12:03:22 2011	(r218480)
+++ projects/graid/head/sys/geom/raid/tr_raid1.c	Wed Feb  9 12:48:12 2011	(r218481)
@@ -671,18 +671,29 @@ g_raid_tr_iodone_raid1(struct g_raid_tr_
 		 */
 		if (trs->trso_type == TR_RAID1_REBUILD) {
 			if (bp->bio_cmd == BIO_READ) {
+
+				/* Immediately abort rebuild, if requested. */
+				if (trs->trso_flags & TR_RAID1_F_ABORT) {
+					trs->trso_flags &= ~TR_RAID1_F_DOING_SOME;
+					g_raid_tr_raid1_rebuild_abort(tr);
+					return;
+				}
+
+				/* On read error, skip and cross fingers. */
+				if (bp->bio_error != 0) {
+					G_RAID_LOGREQ(0, bp,
+					    "Read error during rebuild (%d), "
+					    "possible data loss!",
+					    bp->bio_error);
+					goto rebuild_round_done;
+				}
+
 				/*
 				 * The read operation finished, queue the
 				 * write and get out.
 				 */
 				G_RAID_LOGREQ(4, bp, "rebuild read done. %d",
 				    bp->bio_error);
-				if (bp->bio_error != 0 ||
-				    trs->trso_flags & TR_RAID1_F_ABORT) {
-					trs->trso_flags &= ~TR_RAID1_F_DOING_SOME;
-					g_raid_tr_raid1_rebuild_abort(tr);
-					return;
-				}
 				bp->bio_cmd = BIO_WRITE;
 				bp->bio_cflags = G_RAID_BIO_FLAG_SYNC;
 				bp->bio_offset = bp->bio_offset;
@@ -712,6 +723,8 @@ g_raid_tr_iodone_raid1(struct g_raid_tr_
 					return;
 				}
 				/* XXX A lot of the following is needed when we kick of the work -- refactor */
+rebuild_round_done:
+
 				nsd = trs->trso_failed_sd;
 				trs->trso_flags &= ~TR_RAID1_F_LOCKED;
 				g_raid_unlock_range(sd->sd_volume, bp->bio_offset, bp->bio_length);
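
The reasoning in the log (an aborted rebuild is restarted and hits the same bad block again, while a skipping rebuild finishes in one pass) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct rebuild_result`, `rebuild_pass()` and `passes_until_done()` are hypothetical names invented for this illustration, the disk is modeled as an array of per-block read error codes, and none of it is graid code or the kernel bio API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one simulated rebuild pass; not a graid type. */
struct rebuild_result {
	size_t	copied;		/* blocks read successfully and written */
	size_t	skipped;	/* blocks skipped because the read failed */
	bool	aborted;	/* true if the pass stopped before the end */
};

/*
 * Simulate one rebuild pass over nblocks blocks.  read_error[i] is
 * nonzero when reading block i from the source disk fails.  With
 * abort_on_error set, the pass stops at the first failed read (the
 * behavior before this change, as described in the log).  Otherwise
 * the block is counted as skipped and the pass moves on.
 */
struct rebuild_result
rebuild_pass(const int *read_error, size_t nblocks, bool abort_on_error)
{
	struct rebuild_result res = { 0, 0, false };
	size_t i;

	assert(read_error != NULL || nblocks == 0);
	for (i = 0; i < nblocks; i++) {
		if (read_error[i] != 0) {
			if (abort_on_error) {
				res.aborted = true;
				return (res);
			}
			res.skipped++;
			continue;
		}
		res.copied++;
	}
	return (res);
}

/*
 * Restart aborted passes, as the log says the present code does, and
 * return how many passes ran.  max_passes is an assumption of this
 * model that keeps the demonstration finite; a persistent bad block
 * makes every aborting pass fail at the same place.
 */
size_t
passes_until_done(const int *read_error, size_t nblocks, bool abort_on_error,
    size_t max_passes)
{
	struct rebuild_result res;
	size_t passes = 0;

	do {
		res = rebuild_pass(read_error, nblocks, abort_on_error);
		passes++;
	} while (res.aborted && passes < max_passes);
	return (passes);
}
```

In this model, a six-block disk with one unreadable block exhausts every allowed pass when aborting on error, and completes in a single pass with one block skipped when the error is only logged.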