Date:      Thu, 9 Jul 2015 10:32:45 -0400
From:      Paul Kraus <paul@kraus-haus.org>
To:        FreeBSD - <freebsd-questions@freebsd.org>
Subject:   Re: Gmirror/graid or hardware raid?
Message-ID:  <7F08761C-556E-4147-95DB-E84B4E5179A5@kraus-haus.org>
In-Reply-To: <917A821C-02F8-4F96-88DA-071E3431C335@mac.com>
References:  <CA+yoEx-T5V3Rchxugke3+oUno6SwXHW1+x466kWtb8VNYb+Bbg@mail.gmail.com> <917A821C-02F8-4F96-88DA-071E3431C335@mac.com>

On Jul 8, 2015, at 17:21, Charles Swiger <cswiger@mac.com> wrote:

> On Jul 8, 2015, at 12:49 PM, Mario Lobo <lobo@bsd.com.br> wrote:

<snip>

> Most of the PROD databases I know of working from local storage have heaps of
> RAID-1 mirrors, and sometimes larger volumes created as RAID-10 or RAID-50.
> Higher volume shops use dedicated SAN filers via redundant Fibre Channel mesh
> or similar for their database storage needs.

Many years ago I had a client buy a couple racks FULL of trays of 36 GB
SCSI drives (yes, it was that long ago) and partition them so that they
only used the first 1 GB of each. This was purely for performance. They
were running a relatively large Oracle database and lots of OLTP
transactions.

>
>> I thought about zfs but I won't have lots of RAM available
>
> ZFS wants to be run against bare metal.  I've never seen anyone set up ZFS within
> a VM; it consumes far too much memory and it really wants to talk directly to the
> hardware for accurate error detection.

ZFS runs fine in a VM, and the notion that it _needs_ lots of RAM is
mostly false. I have run a FreeBSD Guest with ZFS in only 1 GB of RAM.
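
If RAM in the Guest really is tight, the ARC can also be capped. A
minimal sketch for a small FreeBSD Guest (the 256M figure is only an
illustration, not a recommendation), set in /boot/loader.conf:

    # /boot/loader.conf -- cap the ARC so the rest of the Guest keeps its RAM
    vfs.zfs.arc_max="256M"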

But… ZFS is designed first and foremost for data reliability, not for
performance. It gets its performance from striping across many vdevs
(the ZFS term for the top-level devices you assemble zpools out of),
from the ARC (Adaptive Replacement Cache), and from log devices.
Striping requires many drives. The ARC uses any available RAM as a very
aggressive FS cache. Sync writes are improved by committing them to a
dedicated log device (usually a mirror of fast SSDs).
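
The log device piece, for example, is just one more vdev added to the
pool (the pool and device names here are made up for the example):

    # add a mirrored log device (a pair of fast SSDs) to an existing pool
    zpool add tank log mirror ada4p1 ada5p1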

I generally use ZFS for the Host (and because of my familiarity with
ZFS, I tend to use ZFS for all of the Host filesystems). Then I use UFS
for the Guests _unless_ I might need to migrate data in or out of a VM
or I need flexibility in partitioning (once you build a zpool, all zfs
datasets in it can grab as much or as little space as they need). I can
use zfs send / recv (even incrementally) to move data around quickly
and easily. I generally turn on compression for VM datasets (I set up
one zfs dataset per VM) as the CPU cost is noise and it saves a bunch
of space (and reduces physical disk I/O, which also improves
performance). I do NOT turn on compression in any ZFS inside a Guest as
I am already compressing at the Host layer.
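
In practice that works out to a handful of commands per VM (the dataset
and host names here are examples, not my real ones):

    # one zfs dataset per Guest, compressed at the Host layer
    zfs create -o compression=lz4 vm-001/new-guest-01
    # later, push a copy of that Guest's data to another box
    zfs snapshot vm-001/new-guest-01@migrate
    zfs send vm-001/new-guest-01@migrate | \
        ssh otherhost zfs recv -F vm-001/new-guest-01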

I also have a script that grabs a snapshot of every ZFS dataset every
hour and replicates them over to my backup server. Since ZFS snapshots
have no performance penalty, the only cost of keeping them around is
the space they use. This has proven to be a lifesaver when a Guest gets
corrupted: I can easily and quickly roll it back to the most recent
clean version.
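
The core of that script is nothing fancier than something like this (a
simplified sketch, not the actual script; the backup host and pool
names are invented):

    #!/bin/sh
    # hourly: snapshot every dataset and ship the delta to the backup server
    NOW=$(date +%Y%m%d%H)
    PREV=$(date -v-1H +%Y%m%d%H)
    zfs snapshot -r vm-001@${NOW}
    zfs send -R -i vm-001@${PREV} vm-001@${NOW} | \
        ssh backuphost zfs recv -d -F backup/vm-001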

>
>> Should I use the controller raid? Gmirror/Graid? What raid level?
>
> Level is easy: a 4-disk machine is suited for either a pair of RAID-1s, a 4-disk RAID-10 volume,
> or a 4-disk RAID-5 volume.

For ZFS, the number of vdevs and their type determine performance. For
a vdev of a given type you can expect roughly the following, expressed
relative to a single disk:

N-way mirror: write 1x, read Nx
RaidZ: write 1x, read 1x minimum but variable

Note that the performance of a RaidZ vdev does NOT scale with the
number of drives in the RAID set, nor does it change with the RaidZ
level (Z1, Z2, Z3).

So, for example, a zpool consisting of 4 vdevs, each a 2-way mirror,
will have 4x the write performance of a single drive and 8x the read
performance. A zpool consisting of 2 vdevs, each a RaidZ2 of 4 drives,
will have 2x the write performance of a single drive, and its read
performance will be a minimum of 2x that of a single drive. The
variable read performance of RaidZ comes from the fact that RaidZ does
not always write full stripes across all the drives in the vdev; in
other words, RaidZ is a variable-stripe-width Raid system. This has
advantages and disadvantages :-) Here is a good blog post that
describes RaidZ stripe width:
http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/
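
In zpool terms those two example layouts would be built roughly like
this (pool and device names invented for the example):

    # 4 vdevs, each a 2-way mirror (8 drives): ~4x write, ~8x read
    zpool create fastpool mirror da0 da1 mirror da2 da3 \
        mirror da4 da5 mirror da6 da7
    # 2 vdevs, each a 4-drive RaidZ2 (8 drives): ~2x write, >=2x read
    zpool create bulkpool raidz2 da8 da9 da10 da11 \
        raidz2 da12 da13 da14 da15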


I do NOT use RaidZ for anything except bulk backup data, where capacity
is all that matters and performance is limited by lots of other
factors.

I also create a "do-not-remove" dataset in every zpool with 1 GB
reserved and a 1 GB quota. ZFS behaves very, very badly when FULL. This
gives me a cushion when things go badly so that I can delete whatever
used up all the space … yes, ZFS cannot delete files if the FS is
completely FULL. I leave the "do-not-remove" dataset unmounted so that
it cannot be used.
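
Setting that up is one line per pool, something like (matching the 1 GB
figure above):

    # 1 GB cushion, never mounted, so a FULL pool can still be rescued
    zfs create -o reservation=1G -o quota=1G -o mountpoint=none \
        rootpool/do-not-remove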

Here is the config of my latest server (names changed to protect the
guilty):

root@host1:~ # zpool status
  pool: rootpool
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	rootpool    ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada2p3  ONLINE       0     0     0
	    ada3p3  ONLINE       0     0     0

errors: No known data errors

  pool: vm-001
 state: ONLINE
  scan: none requested
config:

	NAME                             STATE     READ WRITE CKSUM
	vm-001                           ONLINE       0     0     0
	  mirror-0                       ONLINE       0     0     0
	    diskid/DISK-WD-WMAYP2681136  ONLINE       0     0     0
	    diskid/DISK-WD-WMAYP3653359  ONLINE       0     0     0

errors: No known data errors
root@host1:~ # zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rootpool                  35.0G   383G    19K  none
rootpool/ROOT             3.79G   383G    19K  none
rootpool/ROOT/2015-06-10     1K   383G  3.01G  /
rootpool/ROOT/default     3.79G   383G  3.08G  /
rootpool/do-not-remove      19K  1024M    19K  none
rootpool/software         18.6G   383G  18.6G  /software
rootpool/tmp              4.29G   383G  4.29G  /tmp
rootpool/usr              3.98G   383G    19K  /usr
rootpool/usr/home           19K   383G    19K  /usr/home
rootpool/usr/ports        3.63G   383G  3.63G  /usr/ports
rootpool/usr/src           361M   383G   359M  /usr/src
rootpool/var              3.20G   383G    19K  /var
rootpool/var/crash          19K   383G    19K  /var/crash
rootpool/var/log          38.5M   383G  1.19M  /var/log
rootpool/var/mail         42.5K   383G  30.5K  /var/mail
rootpool/var/tmp            19K   383G    19K  /var/tmp
rootpool/var/vbox         3.17G   383G  2.44G  /var/vbox
vm-001                     166G   283G    21K  /vm/local
vm-001/aaa-01             61.1G   283G  17.0G  /vm/local/aaa-01
vm-001/bbb-dev-01         20.8G   283G  13.1G  /vm/local/bbb-dev-01
vm-001/ccc-01             21.5K   283G  20.5K  /vm/local/ccc-01
vm-001/dev-01             4.10G   283G  3.19G  /vm/local/dev-01
vm-001/do-not-remove        19K  1024M    19K  none
vm-001/ddd-01             4.62G   283G  2.26G  /vm/local/ddd-01
vm-001/eee-dev-01         16.6G   283G  15.7G  /vm/local/eee-dev-01
vm-001/fff-01             7.44G   283G  3.79G  /vm/local/fff-01
vm-001/ggg-02             2.33G   283G  1.77G  /vm/local/ggg-02
vm-001/hhh-02             8.99G   283G  6.80G  /vm/local/hhh-02
vm-001/iii-repos          36.2G   283G  36.2G  /vm/local/iii-repos
vm-001/test-01            2.63G   283G  2.63G  /vm/local/test-01
vm-001/jjj-dev-01           19K   283G    19K  /vm/local/jjj-dev-01
root@host1:~ #


--
Paul Kraus
paul@kraus-haus.org



