Date: Thu, 9 Jul 2015 10:32:45 -0400
From: Paul Kraus <paul@kraus-haus.org>
To: FreeBSD - <freebsd-questions@freebsd.org>
Subject: Re: Gmirror/graid or hardware raid?
Message-ID: <7F08761C-556E-4147-95DB-E84B4E5179A5@kraus-haus.org>
In-Reply-To: <917A821C-02F8-4F96-88DA-071E3431C335@mac.com>
References: <CA+yoEx-T5V3Rchxugke3+oUno6SwXHW1+x466kWtb8VNYb+Bbg@mail.gmail.com> <917A821C-02F8-4F96-88DA-071E3431C335@mac.com>
On Jul 8, 2015, at 17:21, Charles Swiger <cswiger@mac.com> wrote:

> On Jul 8, 2015, at 12:49 PM, Mario Lobo <lobo@bsd.com.br> wrote:

<snip>

> Most of the PROD databases I know of working from local storage have heaps of
> RAID-1 mirrors, and sometimes larger volumes created as RAID-10 or RAID-50.
> Higher volume shops use dedicated SAN filers via redundant Fibre Channel mesh
> or similar for their database storage needs.

Many years ago I had a client buy a couple of racks FULL of trays of 36 GB SCSI drives (yes, it was that long ago) and partition them so that they only used the first 1 GB of each. This was purely for performance: they were running a relatively large Oracle database with lots of OLTP transactions.

>> I thought about zfs but I won't have lots of RAM available
>
> ZFS wants to be run against bare metal. I've never seen anyone setup ZFS within
> a VM; it consumes far too much memory and it really wants to talk directly to the
> hardware for accurate error detection.

ZFS runs fine in a VM, and the notion that it _needs_ lots of RAM is mostly false. I have run a FreeBSD Guest with ZFS and only 1 GB of RAM. But... ZFS is designed first and foremost for data reliability, not performance. It gets its performance from striping across many vdevs (the ZFS term for the top-level devices you assemble zpools out of), from the ARC (Adaptive Replacement Cache), and from log devices. Striping requires many drives. The ARC uses any available RAM as a very aggressive FS cache. A dedicated log device (usually a mirror of fast SSDs) improves sync writes by committing them to that device first.

I generally use ZFS for the Host (and, because of my familiarity with ZFS, I tend to use ZFS for all of the Host filesystems). Then I use UFS for the Guests _unless_ I might need to migrate data in or out of a VM, or I need flexibility in partitioning (once you build a zpool, all zfs datasets in it can grab as much or as little space as they need). I can use zfs send / recv (even incrementally) to move data around quickly and easily. I generally turn on compression for VM datasets (I set up one zfs dataset per VM) as the CPU cost is noise and it saves a bunch of space (and reduces physical disk I/O, which also improves performance). I do NOT turn on compression in any ZFS inside a Guest, as I am already compressing at the Host layer.

I also have a script that grabs a snapshot of every ZFS dataset every hour and replicates them over to my backup server. Since ZFS snapshots have no performance penalty, the only cost to keep them around is the space used. This has proven to be a lifesaver: when a Guest is corrupted, I can easily and quickly roll it back to the most recent clean version.

>> Should I use the controller raid? Gmirror/Graid? What raid level?
>
> Level is easy: a 4-disk machine is suited for either a pair of RAID-1s, a 4-disk RAID-10 volume,
> or a 4-disk RAID-5 volume.

For ZFS, the number of vdevs and their type determine performance. For a vdev of each of the following types you can expect the listed performance, relative to a single disk:

N-way mirror: write 1x, read Nx
RaidZ (Z1/Z2/Z3): write 1x, read 1x minimum, but variable

Note that the performance of a RaidZ vdev does NOT scale with the number of drives in the RAID set, nor does it change with the RaidZ level (Z1, Z2, Z3).
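For reference, the two layouts I compare below would be created along these lines (the pool name "tank" and the da0 through da7 disk names are just placeholders, not my actual config):

  # 4 vdevs, each a 2-way mirror (8 drives total)
  zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7

  # 2 vdevs, each a 4-drive RaidZ2 (8 drives total)
  zpool create tank raidz2 da0 da1 da2 da3 raidz2 da4 da5 da6 da7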
So for example, a zpool consisting of 4 vdevs, each a 2-way mirror, will have 4x the write performance of a single drive and 8x the read performance. A zpool consisting of 2 vdevs, each a RaidZ2 of 4 drives, will have 2x the write performance of a single drive, and its read performance will be a minimum of 2x that of a single drive. The variable read performance of RaidZ is because RaidZ does not always write full stripes across all the drives in the vdev. In other words, RaidZ is a variable-width RAID system. This has advantages and disadvantages :-) Here is a good blog post that describes RaidZ stripe width: http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

I do NOT use RaidZ for anything except bulk backup data, where capacity is all that matters and performance is limited by lots of other factors.

I also create a "do-not-remove" dataset in every zpool with a 1 GB reservation and quota. ZFS behaves very, very badly when FULL. This gives me a cushion when things go badly, so I can delete whatever used up all the space ... Yes, ZFS cannot delete files if the FS is completely FULL. I leave the "do-not-remove" dataset unmounted so that it cannot be used.

Here is the config of my latest server (names changed to protect the guilty):

root@host1:~ # zpool status
  pool: rootpool
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	rootpool    ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada2p3  ONLINE       0     0     0
	    ada3p3  ONLINE       0     0     0

errors: No known data errors

  pool: vm-001
 state: ONLINE
  scan: none requested
config:

	NAME                             STATE     READ WRITE CKSUM
	vm-001                           ONLINE       0     0     0
	  mirror-0                       ONLINE       0     0     0
	    diskid/DISK-WD-WMAYP2681136  ONLINE       0     0     0
	    diskid/DISK-WD-WMAYP3653359  ONLINE       0     0     0

errors: No known data errors

root@host1:~ # zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
rootpool                   35.0G   383G    19K  none
rootpool/ROOT              3.79G   383G    19K  none
rootpool/ROOT/2015-06-10     1K    383G  3.01G  /
rootpool/ROOT/default      3.79G   383G  3.08G  /
rootpool/do-not-remove       19K  1024M    19K  none
rootpool/software          18.6G   383G  18.6G  /software
rootpool/tmp               4.29G   383G  4.29G  /tmp
rootpool/usr               3.98G   383G    19K  /usr
rootpool/usr/home            19K   383G    19K  /usr/home
rootpool/usr/ports         3.63G   383G  3.63G  /usr/ports
rootpool/usr/src            361M   383G   359M  /usr/src
rootpool/var               3.20G   383G    19K  /var
rootpool/var/crash           19K   383G    19K  /var/crash
rootpool/var/log           38.5M   383G  1.19M  /var/log
rootpool/var/mail          42.5K   383G  30.5K  /var/mail
rootpool/var/tmp             19K   383G    19K  /var/tmp
rootpool/var/vbox          3.17G   383G  2.44G  /var/vbox
vm-001                      166G   283G    21K  /vm/local
vm-001/aaa-01              61.1G   283G  17.0G  /vm/local/aaa-01
vm-001/bbb-dev-01          20.8G   283G  13.1G  /vm/local/bbb-dev-01
vm-001/ccc-01              21.5K   283G  20.5K  /vm/local/ccc-01
vm-001/dev-01              4.10G   283G  3.19G  /vm/local/dev-01
vm-001/do-not-remove         19K  1024M    19K  none
vm-001/ddd-01              4.62G   283G  2.26G  /vm/local/ddd-01
vm-001/eee-dev-01          16.6G   283G  15.7G  /vm/local/eee-dev-01
vm-001/fff-01              7.44G   283G  3.79G  /vm/local/fff-01
vm-001/ggg-02              2.33G   283G  1.77G  /vm/local/ggg-02
vm-001/hhh-02              8.99G   283G  6.80G  /vm/local/hhh-02
vm-001/iii-repos           36.2G   283G  36.2G  /vm/local/iii-repos
vm-001/test-01             2.63G   283G  2.63G  /vm/local/test-01
vm-001/jjj-dev-01            19K   283G    19K  /vm/local/jjj-dev-01
root@host1:~ #

--
Paul Kraus
paul@kraus-haus.org
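P.S. In case it helps anyone copying the setup: the "do-not-remove" cushion is just a dataset with a reservation and quota that never gets mounted, something along these lines (dataset name as in the listing above, exact options from memory):

  zfs create -o reservation=1G -o quota=1G -o mountpoint=none rootpool/do-not-remove

And the hourly replication boils down to a recursive snapshot plus an incremental send to the backup box. The real script adds error handling and snapshot rotation, but the core is roughly this (the snapshot names, the backuphost name, and the backup/host1 destination are made up here):

  zfs snapshot -r rootpool@2015-07-09-10
  zfs send -R -i rootpool@2015-07-09-09 rootpool@2015-07-09-10 | \
      ssh backuphost zfs recv -d -F backup/host1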