From: Willem Jan Withagen <wjw@digiware.nl>
Date: Sat, 21 Jan 2012 19:18:31 +0100
To: Alexander Leidinger
Cc: fs@freebsd.org
Subject: Re: Question about ZFS with log and cache on SSD with GPT

On 21-1-2012 16:29, Alexander Leidinger wrote:
>> What I've currently done is partition all disks (also the SSDs) with
>> GPT like below:
>> batman# zpool iostat -v
>>                  capacity     operations    bandwidth
>> pool          alloc   free   read  write   read  write
>> ------------- -----  -----  -----  -----  -----  -----
>> zfsboot       50.0G  49.5G      1     13  46.0K   164K
>>   mirror      50.0G  49.5G      1     13  46.0K   164K
>>     gpt/boot4     -      -      0      5  23.0K   164K
>>     gpt/boot6     -      -      0      5  22.9K   164K
>> ------------- -----  -----  -----  -----  -----  -----
>> zfsdata       59.4G   765G     12     62   250K  1.30M
>>   mirror      59.4G   765G     12     62   250K  1.30M
>>     gpt/data4     -      -      5     15   127K  1.30M
>>     gpt/data6     -      -      5     15   127K  1.30M
>>   gpt/log2      11M  1005M      0     22     12   653K
>>   gpt/log3    11.1M  1005M      0     22     12   652K
>
> Do you have two log devices in non-mirrored mode? If yes, it would be
> better to have the ZIL mirrored on a pair.

So what you are saying is that logging is faster in mirrored mode? Or
are you more concerned about losing the log and thus possibly losing
data?

>> cache             -      -      -      -      -      -
>>   gpt/cache2  9.99G  26.3G     27     53  1.20M  5.30M
>>   gpt/cache3  9.85G  26.4G     28     54  1.24M  5.23M
>> ------------- -----  -----  -----  -----  -----  -----
....
>> Now the question would be: are the GPT partitions correctly aligned
>> to give optimal performance?
>
> I would assume that the native block size of the flash is more like
> 4kb than 512b. As such, just creating the GPT partitions will not be
> the best setup.

Corsair reports:
    Max Random 4k Write (using IOMeter 08): 50k IOPS (4k aligned)
So I guess that suggests 4k alignment is required.
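If that is the case, redoing the SSD partitioning with gpart would, I
guess, look roughly like the sketch below. The device name ada2 is only
a placeholder, the partition sizes are read off the iostat output
above, and the -a flag assumes a gpart recent enough to know about
alignment:

    gpart destroy -F ada2
    gpart create -s gpt ada2
    # -a 4k makes gpart align the start offset and size of each
    # partition to 4k boundaries
    gpart add -a 4k -s 1G  -t freebsd-zfs -l log2   ada2
    gpart add -a 4k -s 36G -t freebsd-zfs -l cache2 ada2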
> See
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
> for a description of how to align to 4k sectors. I do not know if the
> main devices of the pool need to be set up with an emulated 4k size
> (the gnop part in my description) or not, but I would assume all disks
> in the pool need to be set up with the temporary gnop setup.

Well, one way of re-setting up the hard disks would be to remove them
from the mirror each in turn, repartition, and then rebuild the mirror,
hoping that that would work, since I need some extra space to move the
partitions up. :(

>> The hard disks are still standard 512-byte sectors, so that would be
>> alright? The SSDs I have my doubts about.....
>
> You could assume that the majority of cases are 4k or bigger writes
> (tune your MySQL this way, and do not forget to change the recordsize
> of the zfs dataset which contains the db files to match what the DB
> writes) and just align the partitions of the SSDs for 4k (do not use
> the gnop part in my description). I would assume that this already
> gives good performance in most cases.

I'll redo the SSDs with the suggestions from your page.

>> Good thing is that v28 allows you to toy with log and cache without
>> losing data. So I could redo the recreation of cache and log
>> relatively easily.
>
> You can still lose data when a log SSD dies (if they are not mirrored).

I was more referring to the fact that under v28 one is able to remove
log and cache through zpool commands without losing data. Just pulling
the disks is of course going to corrupt data.

--WjW
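P.S. For the disk-by-disk rebuild, I would expect the zpool side to
look roughly like the sketch below. The labels are the ones from the
iostat output above, the gnop step is the one from your howto, and
whether a temporary gnop on a device attached to an already existing
mirror changes anything is exactly the part I am unsure about:

    zpool detach zfsdata gpt/data4        # drop one half of the mirror
    # repartition the disk 4k-aligned with gpart, then fake a 4k sector
    # size with a temporary gnop device:
    gnop create -S 4096 /dev/gpt/data4
    zpool attach zfsdata gpt/data6 gpt/data4.nop
    # after the resilver finishes, get rid of the .nop device:
    zpool export zfsdata
    gnop destroy /dev/gpt/data4.nop
    zpool import zfsdata                  # re-imports via the plain gpt label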
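As for the recordsize hint: assuming InnoDB with its default 16k pages,
and a made-up dataset name for the database files, that would be
something like:

    zfs set recordsize=16k zfsdata/mysql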
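And the v28 removal I mean is just zpool commands, e.g. dropping the
separate log and cache devices and re-adding the log as a mirror, as
you suggest (again with the labels from above):

    zpool remove zfsdata gpt/cache2 gpt/cache3      # cache devices
    zpool remove zfsdata gpt/log2 gpt/log3          # log removal works since pool v19
    zpool add zfsdata log mirror gpt/log2 gpt/log3  # re-add the ZIL mirrored
    zpool add zfsdata cache gpt/cache2 gpt/cache3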