Date: Thu, 9 Jul 2015 10:32:45 -0400
From: Paul Kraus <paul@kraus-haus.org>
To: FreeBSD - <freebsd-questions@freebsd.org>
Subject: Re: Gmirror/graid or hardware raid?
Message-ID: <7F08761C-556E-4147-95DB-E84B4E5179A5@kraus-haus.org>
In-Reply-To: <917A821C-02F8-4F96-88DA-071E3431C335@mac.com>
References: <CA+yoEx-T5V3Rchxugke3+oUno6SwXHW1+x466kWtb8VNYb+Bbg@mail.gmail.com> <917A821C-02F8-4F96-88DA-071E3431C335@mac.com>
On Jul 8, 2015, at 17:21, Charles Swiger <cswiger@mac.com> wrote:

> On Jul 8, 2015, at 12:49 PM, Mario Lobo <lobo@bsd.com.br> wrote:

<snip>

> Most of the PROD databases I know of working from local storage have heaps of
> RAID-1 mirrors, and sometimes larger volumes created as RAID-10 or RAID-50.
> Higher volume shops use dedicated SAN filers via redundant Fibre Channel mesh
> or similar for their database storage needs.

Many years ago I had a client buy a couple of racks FULL of trays of 36 GB
SCSI drives (yes, it was that long ago) and partition them so that they only
used the first 1 GB of each. This was purely for performance. They were
running a relatively large Oracle database with lots of OLTP transactions.

>
>> I thought about zfs but I won't have lots of RAM available
>
> ZFS wants to be run against bare metal. I've never seen anyone set up ZFS within
> a VM; it consumes far too much memory and it really wants to talk directly to the
> hardware for accurate error detection.

ZFS runs fine in a VM, and the notion that it _needs_ lots of RAM is mostly
false; I have run a FreeBSD Guest with ZFS and only 1 GB of RAM. But ZFS is
designed first and foremost for data reliability, not performance. It gets its
performance from striping across many vdevs (the ZFS term for the top-level
devices you assemble zpools out of), from the ARC (adaptive replacement
cache), and from log devices. Striping requires many drives. The ARC uses any
available RAM as a very aggressive FS cache. A log device improves sync writes
by committing them to a dedicated device (usually a mirror of fast SSDs).

I generally use ZFS for the Host (and, because of my familiarity with ZFS, I
tend to use ZFS for all of the Host filesystems). Then I use UFS for the
Guests _unless_ I might need to migrate data in or out of a VM, or I need
flexibility in partitioning (once you build a zpool, all zfs datasets in it
can grab as much or as little space as they need). I can use zfs send / recv
(even incrementally) to move data around quickly and easily.

I generally turn on compression for VM datasets (I set up one zfs dataset per
VM) as the CPU cost is noise and it saves a bunch of space (and reduces
physical disk I/O, which also improves performance). I do NOT turn on
compression in any ZFS inside a Guest, as I am already compressing at the
Host layer.

I also have a script that grabs a snapshot of every ZFS dataset every hour and
replicates them over to my backup server. Since ZFS snapshots have no
performance penalty, the only cost of keeping them around is the space they
use. This has proven to be a lifesaver when a Guest is corrupted; I can easily
and quickly roll it back to the most recent clean version.

>
>> Should I use the controller raid? Gmirror/Graid? What raid level?
>
> Level is easy: a 4-disk machine is suited for either a pair of RAID-1s, a 4-disk
> RAID-10 volume, or a 4-disk RAID-5 volume.

For ZFS, the number of vdevs and their type determine performance. For a vdev
of a given type you can expect the following, relative to a single disk:

N-way mirror:      write 1x, read 1x * N
RaidZ (any level): write 1x, read 1x minimum, but variable

Note that the performance of a RaidZ vdev does NOT scale with the number of
drives in the RAID set, nor does it change with the RaidZ level (Z1, Z2, Z3).

So, for example, a zpool consisting of 4 vdevs, each a 2-way mirror, will have
4x the write performance of a single drive and 8x the read performance.
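(Purely as a sketch, with a made-up pool name and disk names, that
4 x 2-way-mirror layout is built by naming each mirror pair as its own vdev on
the zpool command line:

    zpool create tank mirror da0 da1 mirror da2 da3 \
                      mirror da4 da5 mirror da6 da7

Each "mirror" keyword starts a new vdev, and the pool stripes across all of
them, which is where the 4x write / 8x read numbers come from.)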
A zpool consisting of 2 vdevs, each a RaidZ2 of 4 drives, will have 2x the
write performance of a single drive, and the read performance will be a
minimum of 2x that of a single drive. The variable read performance of RaidZ
is because RaidZ does not always write full stripes across all the drives in
the vdev; in other words, RaidZ is a variable-width RAID system. This has
advantages and disadvantages :-) Here is a good blog post that describes the
RaidZ stripe width:

http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

I do NOT use RaidZ for anything except bulk backup data, where capacity is all
that matters and performance is limited by lots of other factors.

I also create a "do-not-remove" dataset in every zpool with a 1 GB reservation
and quota. ZFS behaves very, very badly when FULL. This gives me a cushion
when things go badly, so I can delete whatever used up all the space ... yes,
ZFS cannot delete files if the FS is completely FULL. I leave the
"do-not-remove" dataset unmounted so that it cannot be used.

Here is the config of my latest server (names changed to protect the guilty):

root@host1:~ # zpool status
  pool: rootpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rootpool    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0

errors: No known data errors

  pool: vm-001
 state: ONLINE
  scan: none requested
config:

        NAME                             STATE     READ WRITE CKSUM
        vm-001                           ONLINE       0     0     0
          mirror-0                       ONLINE       0     0     0
            diskid/DISK-WD-WMAYP2681136  ONLINE       0     0     0
            diskid/DISK-WD-WMAYP3653359  ONLINE       0     0     0

errors: No known data errors

root@host1:~ # zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
rootpool                   35.0G   383G    19K  none
rootpool/ROOT              3.79G   383G    19K  none
rootpool/ROOT/2015-06-10      1K   383G  3.01G  /
rootpool/ROOT/default      3.79G   383G  3.08G  /
rootpool/do-not-remove       19K  1024M    19K  none
rootpool/software          18.6G   383G  18.6G  /software
rootpool/tmp               4.29G   383G  4.29G  /tmp
rootpool/usr               3.98G   383G    19K  /usr
rootpool/usr/home            19K   383G    19K  /usr/home
rootpool/usr/ports         3.63G   383G  3.63G  /usr/ports
rootpool/usr/src            361M   383G   359M  /usr/src
rootpool/var               3.20G   383G    19K  /var
rootpool/var/crash           19K   383G    19K  /var/crash
rootpool/var/log           38.5M   383G  1.19M  /var/log
rootpool/var/mail          42.5K   383G  30.5K  /var/mail
rootpool/var/tmp             19K   383G    19K  /var/tmp
rootpool/var/vbox          3.17G   383G  2.44G  /var/vbox
vm-001                      166G   283G    21K  /vm/local
vm-001/aaa-01              61.1G   283G  17.0G  /vm/local/aaa-01
vm-001/bbb-dev-01          20.8G   283G  13.1G  /vm/local/bbb-dev-01
vm-001/ccc-01              21.5K   283G  20.5K  /vm/local/ccc-01
vm-001/dev-01              4.10G   283G  3.19G  /vm/local/dev-01
vm-001/do-not-remove         19K  1024M    19K  none
vm-001/ddd-01              4.62G   283G  2.26G  /vm/local/ddd-01
vm-001/eee-dev-01          16.6G   283G  15.7G  /vm/local/eee-dev-01
vm-001/fff-01              7.44G   283G  3.79G  /vm/local/fff-01
vm-001/ggg-02              2.33G   283G  1.77G  /vm/local/ggg-02
vm-001/hhh-02              8.99G   283G  6.80G  /vm/local/hhh-02
vm-001/iii-repos           36.2G   283G  36.2G  /vm/local/iii-repos
vm-001/test-01             2.63G   283G  2.63G  /vm/local/test-01
vm-001/jjj-dev-01            19K   283G    19K  /vm/local/jjj-dev-01
root@host1:~ #
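In case it is useful, here is roughly what the pieces described above look
like as commands. This is only a sketch: the compression value, snapshot
names, backup pool name, and "backuphost" are invented for illustration, not
copied from my systems.

    # the reserved-space cushion ("do-not-remove" above): a 1 GB
    # reservation and quota, never mounted so nothing can use it
    zfs create -o reservation=1G -o quota=1G -o mountpoint=none \
        rootpool/do-not-remove

    # one dataset per VM, compressed at the Host layer only
    # (lz4 is just an example value)
    zfs create -o compression=lz4 vm-001/aaa-01

    # hourly snapshot, then an incremental send/recv to the backup server
    # (roughly what my script loops over for every dataset); assumes the
    # previous hour's snapshot already exists on both sides
    zfs snapshot vm-001/aaa-01@2015-07-09-10h
    zfs send -i @2015-07-09-09h vm-001/aaa-01@2015-07-09-10h | \
        ssh backuphost zfs recv backup/vm-001/aaa-01

Rolling a corrupted Guest back is then just a "zfs rollback" to the most
recent clean snapshot on the Host.

--
Paul Kraus
paul@kraus-haus.org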
