Subject: Re: Gmirror/graid or hardware raid?
From: Paul Kraus <paul@kraus-haus.org>
Date: Thu, 9 Jul 2015 10:32:45 -0400
To: freebsd-questions@freebsd.org
In-Reply-To: <917A821C-02F8-4F96-88DA-071E3431C335@mac.com>

On Jul 8, 2015, at 17:21, Charles Swiger wrote:

> On Jul 8, 2015, at 12:49 PM, Mario Lobo wrote:
>
> Most of the PROD databases I know of working from local storage have heaps
> of RAID-1 mirrors, and sometimes larger volumes created as RAID-10 or
> RAID-50. Higher volume shops use dedicated SAN filers via redundant Fibre
> Channel mesh or similar for their database storage needs.

Many years ago I had a client buy a couple of racks FULL of trays of 36 GB
SCSI drives (yes, it was that long ago) and partition them so that they only
used the first 1 GB of each. This was purely for performance. They were
running a relatively large Oracle database with lots of OLTP transactions.

>
>> I thought about zfs but I won't have lots of RAM available
>
> ZFS wants to be run against bare metal. I've never seen anyone set up ZFS
> within a VM; it consumes far too much memory and it really wants to talk
> directly to the hardware for accurate error detection.

ZFS runs fine in a VM, and the notion that it _needs_ lots of RAM is mostly
false. I have run a FreeBSD Guest with ZFS and only 1 GB of RAM. But…

ZFS is designed first and foremost for data reliability, not performance. It
gets its performance from striping across many vdevs (the ZFS term for the
top level devices you assemble zpools out of), from the ARC (adaptive
replacement cache), and from log devices. Striping requires many drives. The
ARC uses any available RAM as a very aggressive FS cache. A separate log
device improves sync writes by committing them to dedicated storage (usually
a mirror of fast SSDs).

I generally use ZFS for the Host (and because of my familiarity with ZFS, I
tend to use ZFS for all of the Host filesystems). Then I use UFS for the
Guests _unless_ I might need to migrate data in or out of a VM or I need
flexibility in partitioning (once you build a zpool, all zfs datasets in it
can grab as much or as little space as they need). I can use zfs send / recv
(even incrementally) to move data around quickly and easily. I generally turn
on compression for VM datasets (I set up one zfs dataset per VM) as the CPU
cost is noise and it saves a bunch of space (and reduces physical disk I/O,
which also improves performance). I do NOT turn on compression in any ZFS
inside a Guest as I am already compressing at the Host layer.

I also have a script that grabs a snapshot of every ZFS dataset every hour
and replicates them over to my backup server. Since ZFS snapshots have no
performance penalty, the only cost of keeping them around is the space they
use. This has proven to be a lifesaver when a Guest is corrupted; I can
easily and quickly roll it back to the most recent clean version.
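A minimal sketch of the idea (not the actual script; vm-001 is the pool shown
below, while new-vm-01, backuphost and backuppool are placeholder names, and
the incremental send assumes the previous hour's snapshot already exists on
both ends):

    # one compressed dataset per VM on the Host
    # (lz4 assumed here; I have not said which algorithm above)
    zfs create -o compression=lz4 vm-001/new-vm-01

    # hourly: snapshot everything in the pool, then replicate incrementally
    NOW=$(date +%Y%m%d%H)           # e.g. 2015070910
    PREV=$(date -v-1H +%Y%m%d%H)    # FreeBSD date(1) adjustment syntax
    zfs snapshot -r vm-001@${NOW}
    zfs send -R -i vm-001@${PREV} vm-001@${NOW} | \
        ssh backuphost zfs recv -du backuppool/vm-001

Rolling a corrupted Guest back is then a zfs rollback of that VM's dataset to
the most recent clean snapshot.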
>
>> Should I use the controller raid? Gmirror/Graid? What raid level?
>
> Level is easy: a 4-disk machine is suited for either a pair of RAID-1s, a
> 4-disk RAID-10 volume, or a 4-disk RAID-5 volume.

For ZFS, the number of vdevs and their type will determine performance. For a
vdev of the following type you can expect the listed performance, relative to
a single disk:

N-way mirror: write 1x, read Nx
RaidZ: write 1x, read 1x minimum but variable

Note that the performance of a RaidZ vdev does NOT scale with the number of
drives in the RAID set, nor does it change with the RaidZ level (Z1, Z2, Z3).

So, for example, a zpool consisting of 4 vdevs, each a 2-way mirror, will
have 4x the write performance of a single drive and 8x the read performance.
A zpool consisting of 2 vdevs, each a RaidZ2 of 4 drives, will have 2x the
write performance of a single drive, and its read performance will be a
minimum of 2x that of a single drive. The read performance of RaidZ is
variable because RaidZ does not always write full stripes across all the
drives in the vdev; in other words, RaidZ is a variable stripe width RAID
system. This has advantages and disadvantages :-) Here is a good blog post
that describes RaidZ stripe width:
http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

I do NOT use RaidZ for anything except bulk backup data, where capacity is
all that matters and performance is limited by lots of other factors.

I also create a "do-not-remove" dataset in every zpool with 1 GB reserved and
a 1 GB quota. ZFS behaves very, very badly when FULL. This gives me a cushion
when things go badly, so I can delete whatever used up all the space … yes,
ZFS cannot delete files if the FS is completely FULL. I leave the
"do-not-remove" dataset unmounted so that it cannot be used.
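Setting that up is just a reservation plus a quota on an unmounted dataset,
roughly like this (a sketch; the 1 GB sizing and the "none" mountpoint match
the zfs list output below):

    # the cushion: 1 GB held in reserve, capped at 1 GB, never mounted
    zfs create -o reservation=1G -o quota=1G -o mountpoint=none \
        rootpool/do-not-remove

    # The reservation keeps 1 GB of the pool unavailable to everything else,
    # so even when the other datasets report "full" ZFS still has working
    # space to process deletes. If more breathing room is needed, release
    # the cushion, clean up, then restore it:
    zfs set reservation=none rootpool/do-not-remove
    # ... delete whatever filled the pool ...
    zfs set reservation=1G rootpool/do-not-remove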
Here is the config of my latest server (names changed to protect the guilty):

root@host1:~ # zpool status
  pool: rootpool
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	rootpool    ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada2p3  ONLINE       0     0     0
	    ada3p3  ONLINE       0     0     0

errors: No known data errors

  pool: vm-001
 state: ONLINE
  scan: none requested
config:

	NAME                             STATE     READ WRITE CKSUM
	vm-001                           ONLINE       0     0     0
	  mirror-0                       ONLINE       0     0     0
	    diskid/DISK-WD-WMAYP2681136  ONLINE       0     0     0
	    diskid/DISK-WD-WMAYP3653359  ONLINE       0     0     0

errors: No known data errors

root@host1:~ # zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
rootpool                   35.0G   383G    19K  none
rootpool/ROOT              3.79G   383G    19K  none
rootpool/ROOT/2015-06-10      1K   383G  3.01G  /
rootpool/ROOT/default      3.79G   383G  3.08G  /
rootpool/do-not-remove       19K  1024M    19K  none
rootpool/software          18.6G   383G  18.6G  /software
rootpool/tmp               4.29G   383G  4.29G  /tmp
rootpool/usr               3.98G   383G    19K  /usr
rootpool/usr/home            19K   383G    19K  /usr/home
rootpool/usr/ports         3.63G   383G  3.63G  /usr/ports
rootpool/usr/src            361M   383G   359M  /usr/src
rootpool/var               3.20G   383G    19K  /var
rootpool/var/crash           19K   383G    19K  /var/crash
rootpool/var/log           38.5M   383G  1.19M  /var/log
rootpool/var/mail          42.5K   383G  30.5K  /var/mail
rootpool/var/tmp             19K   383G    19K  /var/tmp
rootpool/var/vbox          3.17G   383G  2.44G  /var/vbox
vm-001                      166G   283G    21K  /vm/local
vm-001/aaa-01              61.1G   283G  17.0G  /vm/local/aaa-01
vm-001/bbb-dev-01          20.8G   283G  13.1G  /vm/local/bbb-dev-01
vm-001/ccc-01              21.5K   283G  20.5K  /vm/local/ccc-01
vm-001/dev-01              4.10G   283G  3.19G  /vm/local/dev-01
vm-001/do-not-remove         19K  1024M    19K  none
vm-001/ddd-01              4.62G   283G  2.26G  /vm/local/ddd-01
vm-001/eee-dev-01          16.6G   283G  15.7G  /vm/local/eee-dev-01
vm-001/fff-01              7.44G   283G  3.79G  /vm/local/fff-01
vm-001/ggg-02              2.33G   283G  1.77G  /vm/local/ggg-02
vm-001/hhh-02              8.99G   283G  6.80G  /vm/local/hhh-02
vm-001/iii-repos           36.2G   283G  36.2G  /vm/local/iii-repos
vm-001/test-01             2.63G   283G  2.63G  /vm/local/test-01
vm-001/jjj-dev-01            19K   283G    19K  /vm/local/jjj-dev-01
root@host1:~ #

--
Paul Kraus
paul@kraus-haus.org