From owner-freebsd-fs@FreeBSD.ORG Sat Jan 21 22:06:35 2012
Date: Sat, 21 Jan 2012 23:06:16 +0100
From: Alexander Leidinger <alexander@leidinger.net>
To: Willem Jan Withagen
Cc: fs@freebsd.org
Subject: Re: Question about ZFS with log and cache on SSD with GPT
Message-ID: <20120121230616.00006267@unknown>
In-Reply-To: <4F1B0177.8080909@digiware.nl>
References: <4F193D90.9020703@digiware.nl> <20120121162906.0000518c@unknown> <4F1B0177.8080909@digiware.nl>
List-Id: Filesystems <freebsd-fs@freebsd.org>
On Sat, 21 Jan 2012 19:18:31 +0100
Willem Jan Withagen wrote:

> On 21-1-2012 16:29, Alexander Leidinger wrote:
> >> What I've currently done is partition all disks (also the SSDs)
> >> with GPT like below:
> >> batman# zpool iostat -v
> >>                   capacity     operations    bandwidth
> >> pool            alloc   free   read  write   read  write
> >> -------------   -----  -----  -----  -----  -----  -----
> >> zfsboot         50.0G  49.5G      1     13  46.0K   164K
> >>   mirror        50.0G  49.5G      1     13  46.0K   164K
> >>     gpt/boot4       -      -      0      5  23.0K   164K
> >>     gpt/boot6       -      -      0      5  22.9K   164K
> >> -------------   -----  -----  -----  -----  -----  -----
> >> zfsdata         59.4G   765G     12     62   250K  1.30M
> >>   mirror        59.4G   765G     12     62   250K  1.30M
> >>     gpt/data4       -      -      5     15   127K  1.30M
> >>     gpt/data6       -      -      5     15   127K  1.30M
> >>   gpt/log2        11M  1005M      0     22     12   653K
> >>   gpt/log3      11.1M  1005M      0     22     12   652K
> >
> > Do you have two log devices in non-mirrored mode? If yes, it would
> > be better to have the ZIL mirrored on a pair.
>
> So what you are saying is that logging is faster in mirrored mode?

No.

> Or are you more concerned about losing the LOG and thus possibly
> losing data?

Yes. If one piece of the involved hardware dies, you lose data.

> >> cache               -      -      -      -      -      -
> >>   gpt/cache2    9.99G  26.3G     27     53  1.20M  5.30M
> >>   gpt/cache3    9.85G  26.4G     28     54  1.24M  5.23M
> >> -------------   -----  -----  -----  -----  -----  -----
> ....
>
> >> Now the question would be: are the GPT partitions correctly aligned
> >> to give optimal performance?
> >
> > I would assume that the native block size of the flash is more like
> > 4k than 512b. As such, just creating the GPT partitions will not be
> > the best setup.
>
> Corsair reports:
>   Max Random 4k Write (using IOMeter 08): 50k IOPS (4k aligned)
> So I guess that suggests 4k aligned is required.

Sounds like it is.

> > See
> > http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
> > for a description of how to align to 4k sectors.
> > I do not know if the main devices of the pool need to be set up
> > with an emulated 4k size (the gnop part in my description) or not,
> > but I would assume all disks in the pool need the temporary gnop
> > setup.
>
> Well, one way of re-setting up the harddisks would be to remove them
> from the mirror each in turn, repartition, and then rebuild the
> mirror, hoping that that would work, since I need some extra space
> to move the partitions up. :(

Already answered by someone else, but I want to point out again that
if your critical writes are 4k-aligned and mostly 4k or bigger in
size, you could be lucky. You can compare the zpool iostat output
with the gstat output of the disks. If they more or less match, you
are lucky; if the gstat output is bigger, you are in the unlucky case.

> >> The harddisks are still std 512-byte sectors, so that would be
> >> alright? The SSDs I have my doubts about...
> >
> > You could assume that the majority of cases are 4k or bigger writes
> > (tune your MySQL this way, and do not forget to change the
> > recordsize of the ZFS dataset which contains the DB files to match
> > what the DB writes) and just align the partitions of the SSDs for
> > 4k (do not use the gnop part in my description). I would assume
> > that this already gives good performance in most cases.
>
> I'll redo the SSDs with the suggestions from your page.
>
> >> Good thing is that v28 allows you to toy with log and cache
> >> without losing data. So I could redo the recreation of cache and
> >> log relatively easily.
> >
> > You can still lose data when a log SSD dies (if they are not
> > mirrored).
>
> I was more referring to the fact that under v28 one is able to
> remove log and cache through zpool commands without losing data.
> Just pulling the disks is of course going to corrupt data.

If you can recreate the data and don't care about data loss, and if
you verified that two independent ZIL devices give more performance
than a mirrored pair, why not.
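[Editorial note: not part of the original thread.] The rework discussed
above — removing the v28 log devices, repartitioning the SSDs
4k-aligned, and re-adding the ZIL as a mirror — could be sketched as
below. The device names (ada2/ada3), partition indices, size, and
labels are assumptions for illustration, not taken from the thread;
check them against your own layout before running anything like this.

```shell
# Sketch only, for a zpool v28+ pool named zfsdata. Device names,
# partition indices, and sizes are assumed examples.

# 1. Under v28, dedicated log devices can be removed without data loss.
zpool remove zfsdata gpt/log2 gpt/log3

# 2. Repartition each SSD; "gpart add -a 4k" aligns the start of the
#    new partition to a 4 kB boundary.
gpart delete -i 1 ada2
gpart add -t freebsd-zfs -l log2 -a 4k -s 1G ada2
gpart delete -i 1 ada3
gpart add -t freebsd-zfs -l log3 -a 4k -s 1G ada3

# 3. Optional gnop step from the referenced howto: present the
#    partitions with an emulated 4k sector size so ZFS picks a 4k
#    ashift for the new vdev (the .nop devices are temporary).
gnop create -S 4096 /dev/gpt/log2
gnop create -S 4096 /dev/gpt/log3

# 4. Re-add the ZIL as a mirrored pair, so a single dying SSD no
#    longer costs in-flight synchronous writes.
zpool add zfsdata log mirror gpt/log2.nop gpt/log3.nop
```

After a reboot (or destroying the gnop providers while the pool is
exported), the pool attaches to the plain gpt/log2 and gpt/log3
partitions; the ashift chosen at vdev creation time sticks.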
Bye,
Alexander.

-- 
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137