From owner-freebsd-stable@FreeBSD.ORG Wed Apr 2 10:57:26 2003
From: Pete Carah
Message-Id: <200304021857.h32IvO1d034648@ns.altadena.net>
To: stable@freebsd.org
Date: Wed, 2 Apr 2003 10:57:24 -0800 (PST)
Subject: Re: vinum performance
List-Id: Production branch of FreeBSD source code

This whole thing takes me back to my old SGI days: we had an array on one machine that was meant to stream uncompressed HDTV data (this runs about 1 Gbit/s in plain RGB; the SGI video adapters wanted padding to 32 bits/pixel, so it works out to around 1.2-1.4 Gbit/s). RAID 5 was not a consideration; with the controllers in question it was faster to just telecine the film again than to do a parity recovery (film is a *wonderful* storage medium!!)
(Plus the write-speed demands are pretty strict too, even though the telecine fed a single HIPPI channel and so ran a bit slower than the playout speed. At least it was a step (drum) telecine, so it didn't care about missing the frame rate.)

The array was 40 drives on 4 fibre-channel controllers. The stripe parameters were chosen to match the size of a video frame (about 150-160 MB for color) to the size of one stripe across the whole array; a little padding was needed to make this come out even, with stripe units being multiples of 512 bytes... (And, to pick up some of Greg's other hints: you get some seek independence and a lot of other overhead help (OS DMA setup) by making the cross-controller index vary fastest and the in-controller index slowest.)

This stripe scheme is *very* particular to one kind of performance optimization (BIG, specific-I/O-size streaming); it would be terrible for Usenet, for example. You could take it as one extreme, with transaction-database storage probably the other (where reliability is often judged more important than raw speed, and transactions generally fit in one I/O request. Also, the read part of the transaction can be cached easily, so the write only involves steps 3 and 4 of the RAID-5 steps mentioned before). Remember the three-way tradeoff mentioned earlier in this thread... And at least as of two years ago, none of the major RAID cabinet vendors made (stock) arrays that optimized this kind of streaming performance; they all aimed at database customers.

This was on a CrayLink Challenge machine with a 2-3 Gbit/s backplane and memory, by the way; the drive array was set up as JBOD with XFS software RAID. Lucky I didn't have to pay for it :-) (And you had to turn off XFS journaling and other such things that could get you without your quite knowing why... Fortunately the SGI graphics folks furnished scripts that normally got this right.) We often needed to restripe the array for each transfer, and always newfs to get the sequential-write properties right.

-- Pete
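The frame-to-stripe fit described above can be sketched as simple arithmetic: pick the smallest per-drive stripe unit that is a multiple of 512 bytes and lets one full stripe across all 40 drives hold a whole frame. The ~155 MB frame size below is an assumed point inside the 150-160 MB range quoted in the post; the drive count and sector size are from the post.

```python
# Sketch of the stripe sizing described in the post: fit one video
# frame across one full stripe of the 40-drive array, padding each
# per-drive stripe unit up to a multiple of 512 bytes.
# The 155 MB frame size is an ASSUMED value in the quoted range.

SECTOR = 512
DRIVES = 40  # 40 drives on 4 fibre-channel controllers (per post)

def stripe_unit(frame_bytes: int, drives: int = DRIVES) -> int:
    """Smallest per-drive stripe unit (bytes) that is a multiple of
    512 and large enough for one full stripe to hold a whole frame."""
    per_drive = -(-frame_bytes // drives)        # ceiling division
    return -(-per_drive // SECTOR) * SECTOR      # round up to sector

frame = 155 * 1024 * 1024                        # assumed ~155 MB frame
unit = stripe_unit(frame)
full_stripe = unit * DRIVES
padding = full_stripe - frame
print(f"stripe unit: {unit} B, full stripe: {full_stripe} B, "
      f"padding: {padding} B")
```

The same rounding-up is what makes the "come out even" padding in the post necessary whenever the frame size is not already an exact multiple of 512 x 40 bytes.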