From: Willem Jan Withagen <wjw@digiware.nl>
Date: Sat, 21 Jan 2012 19:18:31 +0100
To: Alexander Leidinger
Cc: fs@freebsd.org
Subject: Re: Question about ZFS with log and cache on SSD with GPT

On 21-1-2012 16:29, Alexander Leidinger wrote:
>> What I've currently done is partition all disks (also the SSDs) with
>> GPT like below:
>> batman# zpool iostat -v
>>                  capacity     operations    bandwidth
>> pool          alloc   free   read  write   read  write
>> ------------- -----  -----  -----  -----  -----  -----
>> zfsboot       50.0G  49.5G      1     13  46.0K   164K
>>   mirror      50.0G  49.5G      1     13  46.0K   164K
>>     gpt/boot4     -      -      0      5  23.0K   164K
>>     gpt/boot6     -      -      0      5  22.9K   164K
>> ------------- -----  -----  -----  -----  -----  -----
>> zfsdata       59.4G   765G     12     62   250K  1.30M
>>   mirror      59.4G   765G     12     62   250K  1.30M
>>     gpt/data4     -      -      5     15   127K  1.30M
>>     gpt/data6     -      -      5     15   127K  1.30M
>>   gpt/log2      11M  1005M      0     22     12   653K
>>   gpt/log3    11.1M  1005M      0     22     12   652K
>
> Do you have two log devices in non-mirrored mode? If yes, it would be
> better to have the ZIL mirrored on a pair.

So what you are saying is that logging is faster in mirrored mode? Or
are you more concerned about losing the log and thus possibly losing
data?

>> cache             -      -      -      -      -      -
>>   gpt/cache2  9.99G  26.3G     27     53  1.20M  5.30M
>>   gpt/cache3  9.85G  26.4G     28     54  1.24M  5.23M
>> ------------- -----  -----  -----  -----  -----  -----
....
>> Now the question would be: are the GPT partitions correctly aligned
>> to give optimal performance?
>
> I would assume that the native block size of the flash is more like
> 4kb than 512b. As such, just creating the GPT partitions will not be
> the best setup.

Corsair reports:
    Max Random 4k Write (using IOMeter 08): 50k IOPS (4k aligned)
So I guess that suggests 4k alignment is required.
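If that is the case, redoing the SSD partitioning with gpart would, I
guess, look roughly like the sketch below. The device name ada2 is only
a placeholder, the partition sizes are read off the iostat output
above, and the -a flag assumes a gpart recent enough to know about
alignment:

    gpart destroy -F ada2
    gpart create -s gpt ada2
    # -a 4k makes gpart align the start offset and size of each
    # partition to 4k boundaries
    gpart add -a 4k -s 1G  -t freebsd-zfs -l log2   ada2
    gpart add -a 4k -s 36G -t freebsd-zfs -l cache2 ada2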
> See
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
> for a description of how to align to 4k sectors. I do not know if the
> main devices of the pool need to be set up with an emulated 4k size
> (the gnop part in my description) or not, but I would assume all disks
> in the pool need to be set up with the temporary gnop setup.

Well, one way of re-setting up the hard disks would be to remove them
from the mirror each in turn, repartition, and then rebuild the mirror,
hoping that that would work, since I need some extra space to move the
partitions up. :(

>> The hard disks are still standard 512-byte sectors, so that would be
>> alright? The SSDs I have my doubts about.....
>
> You could assume that the majority of cases are 4k or bigger writes
> (tune your MySQL this way, and do not forget to change the recordsize
> of the zfs dataset which contains the db files to match what the DB
> writes) and just align the partitions of the SSDs for 4k (do not use
> the gnop part in my description). I would assume that this already
> gives good performance in most cases.

I'll redo the SSDs with the suggestions from your page.

>> Good thing is that v28 allows you to toy with log and cache without
>> losing data. So I could redo the recreation of cache and log
>> relatively easily.
>
> You can still lose data when a log SSD dies (if they are not mirrored).

I was more referring to the fact that under v28 one is able to remove
log and cache through zpool commands without losing data. Just pulling
the disks is of course going to corrupt data.

--WjW
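P.S. For the disk-by-disk rebuild, I would expect the zpool side to
look roughly like the sketch below. The labels are the ones from the
iostat output above, the gnop step is the one from your howto, and
whether a temporary gnop on a device attached to an already existing
mirror changes anything is exactly the part I am unsure about:

    zpool detach zfsdata gpt/data4        # drop one half of the mirror
    # repartition the disk 4k-aligned with gpart, then fake a 4k sector
    # size with a temporary gnop device:
    gnop create -S 4096 /dev/gpt/data4
    zpool attach zfsdata gpt/data6 gpt/data4.nop
    # after the resilver finishes, get rid of the .nop device:
    zpool export zfsdata
    gnop destroy /dev/gpt/data4.nop
    zpool import zfsdata                  # re-imports via the plain gpt label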
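As for the recordsize hint: assuming InnoDB with its default 16k pages,
and a made-up dataset name for the database files, that would be
something like:

    zfs set recordsize=16k zfsdata/mysql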
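And the v28 removal I mean is just zpool commands, e.g. dropping the
separate log and cache devices and re-adding the log as a mirror, as
you suggest (again with the labels from above):

    zpool remove zfsdata gpt/cache2 gpt/cache3      # cache devices
    zpool remove zfsdata gpt/log2 gpt/log3          # log removal works since pool v19
    zpool add zfsdata log mirror gpt/log2 gpt/log3  # re-add the ZIL mirrored
    zpool add zfsdata cache gpt/cache2 gpt/cache3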