From: David Gilbert
Date: Wed, 5 Sep 2001 11:12:17 -0400
To: Doug Hardie
Cc: Lawrence Farr, Greg Lehey, David Gilbert, Chris BeHanna, FreeBSD-Stable
Subject: RE: [stable] Re: RAID5

>>>>> "Doug" == Doug Hardie writes:

Doug> Writes to the fast-write disk are fast. Distributing the
Doug> information the RAID 5 way to multiple disks is slower and hence
Doug> the RAID system bottlenecked and quit accepting data from the
Doug> host. It had no place to put it until the fast-write disk had
Doug> been processed. While the information above is not complete
Doug> enough to say for sure, it looks like that's what occurred in
Doug> that test.
Doug> -- Doug

Well... FreeBSD doesn't use a 'fast write' disk (although this is an
interesting idea), but writing a single block of RAID-5 data requires
a read of the previous data, a read of the parity block, then a write
of the data and a write of the parity block: 4 I/O operations.
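
To make the four operations concrete, here is a minimal sketch of a single-block RAID-5 update. It assumes plain XOR parity and a toy in-memory stripe; the `Stripe` class, `small_write`, and the I/O counter are hypothetical illustration names, not Vinum code.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """Toy in-memory RAID-5 stripe (hypothetical model, not Vinum code)."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        # Parity is the XOR of all data blocks in the stripe.
        self.parity = reduce(xor_blocks, self.data)
        self.io_count = 0  # counts simulated disk operations

    def read_data(self, i):
        self.io_count += 1
        return self.data[i]

    def read_parity(self):
        self.io_count += 1
        return self.parity

    def write_data(self, i, block):
        self.io_count += 1
        self.data[i] = block

    def write_parity(self, block):
        self.io_count += 1
        self.parity = block


def small_write(stripe: Stripe, i: int, new_data: bytes) -> None:
    """Update one data block using read-modify-write of the parity."""
    old_data = stripe.read_data(i)       # I/O 1: read previous data
    old_parity = stripe.read_parity()    # I/O 2: read parity block
    # Remove the old data's contribution, then add the new data's.
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    stripe.write_data(i, new_data)       # I/O 3: write data
    stripe.write_parity(new_parity)      # I/O 4: write parity
```

The two writes are separate operations in this sketch, so a crash between them would leave the parity stale. That is the consistency problem with buffering one write but not the other.
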
Obviously some buffering could help here, but there's a real problem
with consistency on the disk if the parity write is not synchronous
with the block write.

Now, the natural question is: where would technology like a 'fast
write' disk fit within the system we've built ourselves? Vinum could
(for instance) manage a log of updated blocks on a fast write disk, to
be committed as it was able. It all seems suboptimal, though.

Here's a left-field question/answer: We've already seen that soft
updates is "way cool" (tm) technology for meta-data updates. Imagine
extending soft updates to work on a RAID-5 filesystem. Now... I know
this is going up-and-down the little layering that we've done for
ourselves in one tool, but here's the argument:

In an 8 drive RAID-5 system, for each parity block, you have 7 data
blocks. This means (in the current case) if you update all 7 data
blocks you will have 28 potential I/O operations (7 x 4) ... 14 of
which (the 7 data writes and 7 parity writes) are unavoidable as the
writes are forced to be sequential (the reads could possibly be
buffered).

Now imagine the softupdates case: If the RAID-5 information was
managed by softupdates (or a similar system), we could identify
multiple updates to blocks in the same RAID group at the filesystem
level and avoid as many as 6 of the 14 writes (one parity write
instead of seven). Moreover, this might actually be managed at the
buffer-cache level instead of the current system where the blocks are
forced out to disk synchronously.

Dave.

-- 
============================================================================
|David Gilbert, Velocet Communications.       | Two things can only be     |
|Mail:       dgilbert@velocet.net             |  equal if and only if they |
|http://www.velocet.net/~dgilbert             |   are precisely opposite.  |
=========================================================GLO================
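
As a rough check on the 8-drive arithmetic in the message, the sketch below counts I/O operations under two assumptions: an uncoalesced update costs 2 reads and 2 writes per data block, and a coalesced update of blocks in the same RAID group needs only one parity write. The function names are hypothetical and only illustrate the counting.

```python
def uncoalesced_ios(updated_blocks: int) -> dict:
    """Each block update is its own read-modify-write: 2 reads + 2 writes."""
    reads = 2 * updated_blocks
    writes = 2 * updated_blocks
    return {"reads": reads, "writes": writes, "total": reads + writes}


def coalesced_writes(updated_blocks: int) -> int:
    """Assumes all updated blocks share one parity block: N data writes + 1."""
    return updated_blocks + 1


DATA_BLOCKS = 7  # 8 drives: 7 data blocks per parity block

current = uncoalesced_ios(DATA_BLOCKS)
writes_saved = current["writes"] - coalesced_writes(DATA_BLOCKS)
```

With all 7 data blocks updated, `current` gives 28 operations, 14 of them writes, and `writes_saved` is 6, which matches the figures in the message.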