Date:      Sun, 1 Jun 2014 22:32:44 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        mahrens@delphix.com
Cc:        freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org, nwhitehorn@FreeBSD.org
Subject:   Re: fdisk(8) vs gpart(8), and gnop
Message-ID:  <201406020532.s525Wiqn020165@gw.catspoiler.org>
In-Reply-To: <CAJjvXiFAX7N-30g0OZ6idqLnyJww5dsyhGfLj6nYwKs9Xp--1g@mail.gmail.com>

On  1 Jun, Matthew Ahrens wrote:
> On Sun, Jun 1, 2014 at 9:07 AM, Nathan Whitehorn <nwhitehorn@freebsd.org>
> wrote:
> 
>> On 06/01/14 09:00, Steven Hartland wrote:
>>
>>>
>>> ----- Original Message ----- From: "Nathan Whitehorn" <
>>> nwhitehorn@freebsd.org>
>>> To: <freebsd-hackers@freebsd.org>; <freebsd-fs@freebsd.org>
>>> Sent: Sunday, June 01, 2014 4:55 PM
>>> Subject: Re: fdisk(8) vs gpart(8), and gnop
>>>
>>>
>>>  On 06/01/14 08:52, Steven Hartland wrote:
>>>>
>>>>> ----- Original Message ----- From: "Mark Felder" <feld@freebsd.org>
>>>>>
>>>>>  On May 31, 2014, at 20:57, Freddie Cash <fjwcash@gmail.com> wrote:
>>>>>>
>>>>>>> There's a sysctl where you can set the minimum ashift for zfs. Then
>>>>>>> you never need to use gnop.
>>>>>>>
>>>>>>> I believe it's part of 10.0?
>>>>>>>
>>>>>>
>>>>>> I've not seen this yet. What we need is to port the ability to set
>>>>>> ashift at pool creation time:
>>>>>>
>>>>>> $ zpool create -o ashift=12 tank mirror disk1 disk2 mirror disk3 disk4
>>>>>>
>>>>>> I believe the Linux zfs port has this functionality now, but we still
>>>>>> do not.
>>>>>>
>>>>>
>>>>> We don't have that direct option yet but you can achieve the
>>>>> same thing by setting: vfs.zfs.min_auto_ashift=12
>>>>>
>>>>>  Does anyone have any objections to me changing this default, right
>>>> now, today?
>>>> -Nathan
>>>>
>>>
>>> I think you will get some objections to that, as it can have quite an
>>> impact on the performance for disks which are 512, due to the increased
>>> overhead of transferring 4k when only 512 is really required. This has a
>>> more dramatic impact on RAIDZx too.
>>>
>>> Personally we run a custom kernel on our machines which has just this
>>> change in it to ensure compatibility with future disks, so I can confirm
>>> it does indeed have the desired effect :)
>>>
>>
>> So the discussion here is related to what to do about the installer. The
>> current ZFS component unconditionally creates gnops all over the place to
>> set ashift to 4k. That's across the board worse: it has exactly the
>> performance impact of changing the default of this sysctl (whatever that
>> is), it can't easily be overridden (which the sysctl can), and it's a
>> horrible hack to boot. There are a few options:
>>
>> 1. Change the default of vfs.zfs.min_auto_ashift
>>
> 
> This is probably a bad idea -- as others have mentioned, it can drastically
> impact space usage and performance on 512B disks, especially when using
> small ZFS blocks (e.g. for databases or VDI) and/or RAID-Z.  That said, it
> could be a reasonable default for specialized distros that are not used for
> these workloads (maybe FreeNAS or PCBSD?).
> 
>> 2. Have the same effect but in a vastly worse way by adjusting the
>> installer to create gnops
>> 3. Have ZFS choose by itself and decide to do that permanently.
>>
> 
> If the device reports a 512B sector size, it would be great for ZFS to
> assume the device could be lying, and automatically determine the minimum
> ashift which gives good performance.  I think this could be done reasonably
> well for the common case by doing the following when each 512B-sector
> device is added:
> 
> 1. do random 4KB writes to the disk to determine wIOPS@4K
> 2. do random 3.5KB writes to the disk to determine wIOPS@3.5K
> 
> If wIOPS@4K > wIOPS@3.5K, assume 4KB sectors, otherwise assume 512B
> sectors.  (Note: I haven't tried this in practice; we will need to test it
> out and perhaps make some tweaks.)

Or maybe
1. do random 4KB writes that are 4KB aligned
2. do random 4KB writes that are not 4KB aligned

That would eliminate any differences due to the I/O size.
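
A rough sketch of that comparison, in Python rather than in-kernel code, just
to make the idea concrete. Everything named here (random_offsets,
measure_write_iops, guess_sector_size, the margin parameter and its default)
is hypothetical and not taken from ZFS. It assumes a device that reports
512B sectors, and it assumes that syncing after each write is enough to make
the timing meaningful. A real test on a raw disk would probably also need to
bypass caching, and it overwrites data, so it only makes sense on a scratch
device.

```python
import os
import random
import time

SECTOR = 512      # logical sector size the device reports (assumed)
IO_SIZE = 4096    # every test write has the same size


def random_offsets(device_size, count, aligned, rng):
    """Return `count` byte offsets for IO_SIZE writes inside the device."""
    blocks = device_size // IO_SIZE - 1   # leave room for a shifted write
    offsets = []
    for _ in range(count):
        base = rng.randrange(blocks) * IO_SIZE
        if not aligned:
            # Shift by 1..7 sectors so the write straddles two 4KB blocks.
            base += rng.randrange(1, IO_SIZE // SECTOR) * SECTOR
        offsets.append(base)
    return offsets


def measure_write_iops(fd, offsets):
    """Write IO_SIZE bytes at each offset, syncing each time; return IOPS."""
    buf = os.urandom(IO_SIZE)
    start = time.perf_counter()
    for off in offsets:
        os.pwrite(fd, buf, off)
        os.fsync(fd)
    elapsed = time.perf_counter() - start
    return len(offsets) / max(elapsed, 1e-9)


def guess_sector_size(aligned_iops, unaligned_iops, margin=1.5):
    """Assume 4KB sectors only if aligned writes are clearly faster.

    The margin is an arbitrary guard against measurement noise (an
    assumption of this sketch); it would need tuning on real hardware.
    """
    return 4096 if aligned_iops > margin * unaligned_iops else 512
```

Usage would be along the lines of opening the scratch device with os.open(),
calling measure_write_iops() once with aligned offsets and once with
unaligned ones, and feeding both results to guess_sector_size(). Since both
runs issue 4KB writes, any difference should come from alignment alone.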