From: Ulf Lilleengen
To: Eric Anderson
Date: Fri, 20 Jul 2007 14:35:24 +0200
Message-ID: <20070720123524.GA71360@twoflower.idi.ntnu.no>
In-Reply-To: <46A03390.3030602@freebsd.org>
References: <200707172109.l6HL9PMJ078780@repoman.freebsd.org> <46A03390.3030602@freebsd.org>
Cc: perforce@freebsd.org
Subject: Re: PERFORCE change 123662 for review

On Thu, Jul 19, 2007 at 11:01:20pm -0500, Eric Anderson wrote:
> On 07/17/07 16:09, Ulf Lilleengen wrote:
> > http://perforce.freebsd.org/chv.cgi?CH=123662
> >
> > Change 123662 by lulf@lulf_carrot on 2007/07/17 21:08:43
> >
> > - Initial implementation of growing RAID-5 arrays. This is done by
> >   splitting the offset calculation into one for read and one for write
> >   operations. We make a distinction of subdisks that were added after
> >   the plex is not newborn any longer and subdisks that were added at
> >   creation/tasting time. When a BIO write comes, the write will go to
> >   the whole plex, but read operations will only be done on subdisks that
> >   do not have the GV_SD_GROW flag set. The bad thing with this is that
> >   we must ensure that new subdisks are added to a later plexoffset
> >   (which we should force, to make it easier for us, since there is not a
> >   good reason why the user should be able to set the plexoffset in this
> >   operation). The implementation will probably change a bit.
> > - Add another state called RESIZING, and a flag called GV_PLEX_GROWING
> >   to indicate that a plex is in growing operation.
> > - Make sure obvious parts of the code respects this flag. Will need to
> >   look over this more though.
> >
>
> Hi -
>
> So far, I'm very excited about your gvinum work - great work so far!
>
> I'm curious how you are growing a RAID5. Can you describe this method a bit
> more? Where did you see how to do this?
>

Hi,

Well, what I do is attach/create the new subdisk as usual, but since it's
a RAID-5 array that I know is operational, I give the subdisk a flag and
set the plex to a resize state.
Then, in the RAID-5 code, I modify gv_raid5_offset (which basically
computes offsets within a subdisk based on the number of subdisks and the
stripesize). Instead of taking all subdisks into the calculation, I only
take those that do not have the GROW flag when reading, and I take all
subdisks into the calculation when writing.

This means that if I create a gv_grow_plex function that reads
(stripesize x sdcount) bytes (from the subdisks that do not have the GROW
flag) and writes that data back to the plex (including all subdisks), I
sort of overwrite the old data, but the data ends up spread out over the
new subdisks as well.

I'm sorry if this seems a bit complex; just ask more questions if anything
is unclear.

Actually, I didn't read this anywhere. I sort of thought this out myself :P

-- 
Ulf Lilleengen
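
As a rough illustration of the read/write split described above, here is a
minimal sketch in C. It is not the gvinum code: the struct layouts, the
SD_GROW constant, the helper usable_sdcount, and the rotating-parity layout
are all assumptions made for the example. It also assumes that grown
subdisks sit at the end of the plex, as the commit message requires.

```c
#include <assert.h>
#include <stdint.h>

#define SD_GROW 0x1            /* hypothetical stand-in for GV_SD_GROW */

struct subdisk {               /* hypothetical, far smaller than gv_sd */
    int flags;
};

struct location {
    int     sd;                /* index of the subdisk holding the data */
    int     parity_sd;         /* index of the subdisk holding parity */
    int64_t sd_offset;         /* byte offset within that subdisk */
};

/*
 * Number of subdisks that take part in the offset calculation:
 * every subdisk for a write, only those without SD_GROW for a read.
 */
int
usable_sdcount(const struct subdisk *sds, int n, int is_write)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (is_write || !(sds[i].flags & SD_GROW))
            count++;
    return count;
}

/*
 * Map a plex byte offset to a subdisk and an offset inside it, for a
 * RAID-5 layout in which parity rotates by one subdisk per stripe row.
 */
struct location
raid5_offset(int64_t boff, int64_t stripesize, int sdcount)
{
    struct location loc;
    int64_t row_size, row, row_off;

    assert(boff >= 0 && stripesize > 0 && sdcount >= 3);

    row_size = stripesize * (sdcount - 1);   /* data bytes per row */
    row = boff / row_size;                   /* which stripe row */
    row_off = boff % row_size;               /* offset inside the row */

    loc.parity_sd = (int)(sdcount - 1 - row % sdcount);
    loc.sd = (int)(row_off / stripesize);
    if (loc.sd >= loc.parity_sd)             /* skip the parity subdisk */
        loc.sd++;
    loc.sd_offset = row * stripesize + row_off % stripesize;
    return loc;
}
```

A grow pass along the lines of gv_grow_plex would call raid5_offset with
usable_sdcount(sds, n, 0) to find where a chunk currently lives, and with
usable_sdcount(sds, n, 1) to find where it should land. With three original
subdisks plus one grown subdisk and a 64 KiB stripesize, plex offset 300000
maps to subdisk 1 at offset 168928 in the read layout and to subdisk 1 at
offset 103392 in the write layout.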