From: "Clayton Milos" <clay@milos.co.za>
To: "Vivek Khera"
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>
Date: Fri, 17 Aug 2007 23:58:21 +0100
Subject: Re: large RAID volume partition strategy

----- Original Message -----
From: "Claus Guttesen"
To: "Vivek Khera"
Cc: "FreeBSD Stable"
Sent: Friday, August 17, 2007 11:10 PM
Subject: Re: large RAID volume partition strategy

>> I have a shiny new big RAID array. 16x500GB SATA 300+NCQ drives
>> connected to the host via 4Gb fibre channel. This gives me 6.5TB of
>> raw disk.
>>
>> I've come up with three possibilities for organizing this disk. My
>> needs are really for a single 1TB file system on which I will run
>> postgres. However, in the future I'm not sure what I'll really need.
>> I don't plan to ever connect any other servers to this RAID unit.
>>
>> The three choices I've come up with so far are:
>>
>> 1) Make one RAID volume of 6.5TB (in a RAID6 + hot spare
>> configuration), and make one FreeBSD file system on the whole
>> partition.
>>
>> 2) Make one RAID volume of 6.5TB (in a RAID6 + hot spare
>> configuration), and make 6 FreeBSD partitions with one file system
>> each.
>>
>> 3) Make 6 RAID volumes and expose them to FreeBSD as multiple
>> drives, then make one partition + file system on each "disk". Each
>> RAID volume would span across all 16 drives, and I could make the
>> volumes of differing RAID levels, if needed, but I'd probably stick
>> with RAID6 + spare.
>>
>> I'm not keen on option 1 because of the potentially long fsck times
>> after a crash.
>
> If you want to avoid the long fsck times, your remaining options are
> a journaling filesystem or ZFS; either requires an upgrade from
> FreeBSD 6.2. I have used ZFS and had a server stop due to a power
> outage in our area. Our ZFS Samba server came up fine with no data
> corruption. So I would suggest FreeBSD 7.0 with ZFS.
>
> Short fsck times and UFS2 don't go well together. I know there is
> background fsck, but for me that is not an option.
>
> --
> regards
> Claus
>
> When lenity and cruelty play for a kingdom,
> the gentlest gamester is the soonest winner.
>
> Shakespeare

If your goal is speed and as little chance of failure as possible
(RAID6 + spare), then RAID6 is the wrong way to go. RAID6's read
speeds are great, but its write speeds are not. If you want both
performance and reliability, the real way to go is RAID10 (more
correctly, RAID 1+0). You will of course lose a lot more space than
you would with RAID6, but the write speeds will be dramatically
higher.
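[Editor's note: the space tradeoff between the two layouts is easy to
sanity-check with a little arithmetic. A minimal Python sketch; the
drive counts and sizes come from the thread, the function names are
mine:]

```python
# Usable capacity of the two layouts discussed in the thread,
# with 16 x 500GB drives.

DRIVE_GB = 500

def raid6_usable_gb(total_drives: int, hot_spares: int = 1) -> int:
    """RAID6 dedicates two drives' worth of space to parity;
    hot spares hold no data until a rebuild."""
    return (total_drives - hot_spares - 2) * DRIVE_GB

def raid10_usable_gb(total_drives: int, hot_spares: int = 2) -> int:
    """RAID10 mirrors every data drive, halving the usable space."""
    return (total_drives - hot_spares) // 2 * DRIVE_GB

print(raid6_usable_gb(16))   # 6500 GB -- the 6.5TB figure in the thread
print(raid10_usable_gb(16))  # 3500 GB -- the 3.5TB RAID10 suggestion
```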
How would you feel about RAID10 with 2 hot spares? That uses 14 of the
16 drives for the mirrored stripe set, giving you 3.5TB, and if you're
using a good RAID controller you should be getting write speeds of
around 400MB/s to the array. I've got an Areca 1120 RAID controller
with 4 320GB drives in a stripe set, and I'm writing at 280MB/s to it.
With 7 mirrored pairs of 500GB drives you should be getting around
400MB/s, because RAID10 doesn't have to calculate parity data. The
theoretical maximum you're ever going to get from the array is 500MB/s
anyway, with a 4Gb fibre channel controller.

What it really boils down to is how much space you are willing to
sacrifice for performance.

Another thing you really have to do is make sure you have a good
backup system. I've seen more than one customer in tears because their
RAID system with hot spares went on the blink and they lost their
data.

-Clay
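[Editor's note: the throughput figures above can be sanity-checked the
same way. A minimal Python sketch; the ~60MB/s per-drive sequential
write rate is my assumption for 2007-era SATA drives, not a number
from the thread:]

```python
# Sanity-check the link ceiling and the RAID10 write estimate above.

def link_ceiling_mb_s(gbit_per_s: float) -> float:
    """Raw line rate in Gbit/s converted to MB/s (8 bits per byte),
    ignoring fibre channel framing/encoding overhead."""
    return gbit_per_s * 1000 / 8

def raid10_write_mb_s(mirror_pairs: int, per_drive_mb_s: float) -> float:
    """RAID10 stripes writes across mirror pairs; each pair sustains
    roughly one drive's sequential write rate (no parity to compute)."""
    return mirror_pairs * per_drive_mb_s

fc_cap = link_ceiling_mb_s(4)      # 4Gb fibre channel -> 500.0 MB/s
array = raid10_write_mb_s(7, 60)   # 7 pairs x ~60 MB/s (assumed) -> 420 MB/s
print(min(array, fc_cap))          # the array, not the link, limits writes here
```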