From: Alexander Leidinger <alexander@leidinger.net>
To: Willem Jan Withagen
Cc: fs@freebsd.org
Date: Sat, 21 Jan 2012 16:29:06 +0100
Subject: Re: Question about ZFS with log and cache on SSD with GPT

On Fri, 20 Jan 2012 11:10:24 +0100
Willem Jan Withagen wrote:

> Now my question is more about the SSD configuration.
> (BTW adding 1 SSD got the insert rate up from 100/sec to 1000/sec,
> once the cache was loaded.)
>
> The database is on a mirror of two 1 TB disks:
> ada0: ATA-8 SATA 3.x device
> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
>
> and there are 2 SSDs:
> ada2: ATA-8 SATA 2.x device
> ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada2: Command Queueing enabled
>
> What I've currently done is partition all disks (also the SSDs) with
> GPT like below:
>
> batman# zpool iostat -v
>                     capacity     operations    bandwidth
> pool             alloc   free   read  write   read  write
> --------------   -----  -----  -----  -----  -----  -----
> zfsboot          50.0G  49.5G      1     13  46.0K   164K
>   mirror         50.0G  49.5G      1     13  46.0K   164K
>     gpt/boot4        -      -      0      5  23.0K   164K
>     gpt/boot6        -      -      0      5  22.9K   164K
> --------------   -----  -----  -----  -----  -----  -----
> zfsdata          59.4G   765G     12     62   250K  1.30M
>   mirror         59.4G   765G     12     62   250K  1.30M
>     gpt/data4        -      -      5     15   127K  1.30M
>     gpt/data6        -      -      5     15   127K  1.30M
>   gpt/log2         11M  1005M      0     22     12   653K
>   gpt/log3       11.1M  1005M      0     22     12   652K

Do you have two log devices in non-mirrored mode? If yes, it would be
better to have the ZIL mirrored on a pair (a possible command sequence
is sketched below the quoted output).

> cache                -      -      -      -      -      -
>   gpt/cache2     9.99G  26.3G     27     53  1.20M  5.30M
>   gpt/cache3     9.85G  26.4G     28     54  1.24M  5.23M
> --------------   -----  -----  -----  -----  -----  -----
>
> Disks 4 and 6 are naming remains of pre-AHCI times and are ada0 and
> ada1.
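Something like the following should do it (an untested sketch; it
assumes a v28 pool, where standalone log vdevs can be removed again,
and reuses the device names from your iostat output):

  zpool remove zfsdata gpt/log2 gpt/log3
  zpool add zfsdata log mirror gpt/log2 gpt/log3

Afterwards "zpool status zfsdata" should show the two partitions as one
mirrored log vdev instead of two independent ones.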
> So the hard disks have the "std" ZFS setup: a boot-pool and a
> data-pool.
>
> The SSDs are partitioned and assigned to zfsdata with:
>
> gpart create -s GPT ada2
> gpart create -s GPT ada3
> gpart add -t freebsd-zfs -l log2 -s 1G ada2
> gpart add -t freebsd-zfs -l log3 -s 1G ada3
> gpart add -t freebsd-zfs -l cache2 ada2
> gpart add -t freebsd-zfs -l cache3 ada3
> zpool add zfsdata log /dev/gpt/log*
> zpool add zfsdata cache /dev/gpt/cache*
>
> Now the question would be: are the GPT partitions correctly aligned
> to give optimal performance?

I would assume that the native block size of the flash is more like
4 KB than 512 bytes, so just creating the GPT partitions as above will
not be the best setup. See
http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
for a description of how to align to 4k sectors. I do not know whether
the main devices of the pool need to be set up with an emulated 4k
sector size (the gnop part in my description), but I would assume all
disks in the pool need the temporary gnop setup. A rough sketch of an
aligned layout is below my signature.

> The hard disks are still standard 512-byte sectors, so that would be
> alright? About the SSDs I have my doubts...

You could assume that the majority of cases are 4k or bigger writes
(tune your MySQL this way, and do not forget to change the recordsize
of the ZFS dataset which contains the DB files to match what the DB
writes) and just align the partitions of the SSDs for 4k (without the
gnop part in my description). I would assume that this already gives
good performance in most cases.

> Good thing is that v28 allows you to toy with log and cache without
> losing data. So I could redo the creation of cache and log
> relatively easily.

You can still lose data when a log SSD dies (if the log devices are
not mirrored).

> I'd rather not redo the DB build since that takes a few days. :(
> But before loading the DB, I did use some of the tuning suggestions,
> like using different recordsizes for the DB logs and the InnoDB
> files.

Bye,
Alexander.

-- 
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
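P.S.: To make the gpart part concrete, a 4k-aligned layout for one of
the SSDs could look roughly like this (an untested sketch; it reuses
your labels and sizes, and assumes the log and cache vdevs have been
removed from the pool first so the partition table can be recreated):

  gpart destroy -F ada2
  gpart create -s GPT ada2
  gpart add -t freebsd-zfs -l log2 -a 4k -s 1G ada2
  gpart add -t freebsd-zfs -l cache2 -a 4k ada2

The same for ada3 with the log3/cache3 labels, then re-add the cache
devices and the (mirrored) log as sketched above.

For the recordsize of the dataset with the InnoDB files, assuming the
default InnoDB page size of 16k and a dataset name like zfsdata/db
(adjust to whatever your datasets are really called):

  zfs set recordsize=16k zfsdata/db

Note that recordsize only affects blocks written after the change, so
it has to be set before the data is loaded (which you already did,
according to your last paragraph).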