From: Tom Evans
To: Matthew Seaman
Cc: freebsd-performance@freebsd.org
Date: Fri, 18 Nov 2011 12:19:07 +0000
Subject: Re: ZFS Few Questions

On Thu, Nov 17, 2011 at 9:36 PM, Matthew Seaman wrote:
> On 17/11/2011 19:04, Mark Felder wrote:
>>> Question 3:
>>> Anyone Recommend for MySQL server?
>>> (Performance)
>>
>> No idea; I haven't run any SQL servers on ZFS
>
> The sort of randomly located small IOs that RDBMSes do is the hardest
> sort of IO pattern for ZFS (or any filesystem for that matter) to
> manage.  ZFS has a particular problem in that its default storage unit
> is a 128kB block -- and the copy-on-write semantics mean that the
> filesystem layer can in principle end up doing a 128kB read, altering a
> few bytes, then doing a 128kB write to get that data back on disk.
>
> You can get pretty reasonable DB performance on ZFS, but it takes quite
> a bit of tuning.
>
>   * ZFS needs plenty of RAM.  The DB needs plenty of RAM.  Exactly
>     what the balance should be is hard to predict -- dependent on
>     specific workloads -- so expect to spend some time benchmarking
>     and experimenting with different settings.
>
>   * Putting the ARC (Adaptive Replacement Cache) on a separate, fast
>     device will make a big difference to performance.  SSD cards are
>     popular for this purpose.  (Be aware though that SSDs have a
>     limited lifetime, and tend to fail suddenly and completely when
>     they do wear out.  You will need multiple layers of resilience and
>     very good backups...)  While SSD cards are intrinsically faster
>     than individual rotating magnetic media, they are no match for a
>     large disk array that can spread the IO over lots of spindles.
>     But that costs a very great deal of money...
>
>   * Reducing the ZFS block size (the recordsize property when creating
>     a zfs) to match the IO size of your DB system can help a lot.  Do
>     this before creating the database.
>
>   * Separating the DB's data and transaction logging onto separate ZFS
>     pools helps.
>
> See http://www.solarisinternals.com/wiki/index.php/ZFS_for_Databases for
> more details.  Just about everything on that page applies equally to
> FreeBSD as it does to Solaris.
>
>        Cheers,
>
>        Matthew
>

If you are running a write-heavy database, in addition to what Matthew
has said, you will definitely want the ZIL on a device separate from
your pool.

To speed up reads, you will want to allocate as much RAM to the ARC as
you can spare from your applications. L1 ARC is RAM; set
vfs.zfs.arc_max in loader.conf to control the maximum amount of RAM you
want it to use. L2 ARC is optional; to get it, you add cache devices to
your pool. You can lose the L2 ARC from the pool without losing data,
so just add some SSDs like so:

    zpool add tank cache ada0 ada2

To speed up synchronous writes, you need to add a dedicated ZFS Intent
Log (ZIL). If you don't specify a separate ZIL, then part of the pool
itself is used as the ZIL. Some versions of ZFS would complain loudly
(panic) if the ZIL disappeared. I think 9.0 does not, but you should
use a mirror anyway:

    zpool add tank log mirror ada1 ada3

Rather than adding extra drives, you can use PCIe SSD plug-in cards,
which are super fast. The ones we use present two drives per card
rather than one. We put two cards in each machine and, on each card,
use one drive for L2ARC and the other for the ZIL. It's only in testing
so far - we're waiting for 8.3 to be released - but it works nicely.

Cheers

Tom
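
For reference, the steps discussed in this thread can be collected into
a small dry-run sketch. It only prints the commands; nothing is run
against a pool. The pool name "tank" and the device names come from the
examples above. The dataset name tank/mysql and the 16k recordsize
(InnoDB's default page size) are assumptions for illustration, not
something stated in the thread, so check them against your own setup
before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch: prints the tuning commands instead of executing them.
# Assumed for illustration (not from the thread): the dataset name
# tank/mysql and a 16k recordsize to match InnoDB's default page size.
POOL=tank
DATASET="$POOL/mysql"
RECORDSIZE=16k

plan() {
    # Create the database dataset with a small recordsize before any
    # data is loaded; existing files keep the block size they had.
    echo "zfs create -o recordsize=$RECORDSIZE $DATASET"
    # L2ARC: cache devices can be lost without losing pool data.
    echo "zpool add $POOL cache ada0 ada2"
    # Separate ZIL, mirrored in case one log device fails.
    echo "zpool add $POOL log mirror ada1 ada3"
}

plan
```

Review the printed commands, then run them by hand as root once the
names match your hardware.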