Date: Sun, 1 Jun 2014 14:27:42 -0700
From: Matthew Ahrens <mahrens@delphix.com>
To: Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc: freebsd-fs <freebsd-fs@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, Steven Hartland <killing@multiplay.co.uk>
Subject: Re: fdisk(8) vs gpart(8), and gnop
Message-ID: <CAJjvXiFAX7N-30g0OZ6idqLnyJww5dsyhGfLj6nYwKs9Xp--1g@mail.gmail.com>
In-Reply-To: <538B4FD7.4090000@freebsd.org>
References: <20140601004242.GA97224@bewilderbeast.blackhelicopters.org> <CAOjFWZ5N9FGwgSz0_YFNQjavzdJDitRn52VKn4ipW1ddj6-weQ@mail.gmail.com> <BCA9F5D6-3925-4E7E-9082-128652508305@FreeBSD.org> <3D6974D83AE9495E890D9F3CA654FA94@multiplay.co.uk> <538B4CEF.2030801@freebsd.org> <1DB2D63312CE439A96B23EAADFA9436E@multiplay.co.uk> <538B4FD7.4090000@freebsd.org>
On Sun, Jun 1, 2014 at 9:07 AM, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:

> On 06/01/14 09:00, Steven Hartland wrote:
>
>> ----- Original Message ----- From: "Nathan Whitehorn" <nwhitehorn@freebsd.org>
>> To: <freebsd-hackers@freebsd.org>; <freebsd-fs@freebsd.org>
>> Sent: Sunday, June 01, 2014 4:55 PM
>> Subject: Re: fdisk(8) vs gpart(8), and gnop
>>
>>> On 06/01/14 08:52, Steven Hartland wrote:
>>>
>>>> ----- Original Message ----- From: "Mark Felder" <feld@freebsd.org>
>>>>
>>>>> On May 31, 2014, at 20:57, Freddie Cash <fjwcash@gmail.com> wrote:
>>>>>
>>>>>> There's a sysctl where you can set the minimum ashift for zfs. Then
>>>>>> you never need to use gnop.
>>>>>>
>>>>>> I believe it's part of 10.0?
>>>>>
>>>>> I've not seen this yet. What we need is to port the ability to set
>>>>> ashift at pool creation time:
>>>>>
>>>>> $ zpool create -o ashift=12 tank mirror disk1 disk2 mirror disk3 disk4
>>>>>
>>>>> I believe the Linux zfs port has this functionality now, but we still
>>>>> do not.
>>>>
>>>> We don't have that direct option yet but you can achieve the
>>>> same thing by setting: vfs.zfs.min_auto_ashift=12
>>>
>>> Does anyone have any objections to me changing this default, right
>>> now, today?
>>> -Nathan
>>
>> I think you will get some objections to that, as it can have quite an
>> impact on the performance for disks which are 512, due to the increased
>> overhead of transferring 4k when only 512 is really required. This has a
>> more dramatic impact on RAIDZx too.
>>
>> Personally we run a custom kernel on our machines which has just this
>> change in it to ensure compatibility with future disks, so I can confirm
>> it does indeed have the desired effect :)
>
> So the discussion here is related to what to do about the installer. The
> current ZFS component unconditionally creates gnops all over the place to
> set ashift to 4k.
> That's across the board worse: it has exactly the
> performance impact of changing the default of this sysctl (whatever that
> is), it can't easily be overridden (which the sysctl can), and it's a
> horrible hack to boot. There are a few options:
>
> 1. Change the default of vfs.zfs.min_auto_ashift

This is probably a bad idea -- as others have mentioned, it can
drastically impact space usage and performance on 512B disks, especially
when using small ZFS blocks (e.g. for databases or VDI) and/or RAID-Z.
That said, it could be a reasonable default for specialized distros that
are not used for these workloads (maybe FreeNAS or PCBSD?).

> 2. Have the same effect but in a vastly worse way by adjusting the
> installer to create gnops
> 3. Have ZFS choose by itself and decide to do that permanently.

If the device reports a 512B sector size, it would be great for ZFS to
assume the device could be lying, and automatically determine the minimum
ashift which gives good performance. I think this could be done
reasonably well for the common case by doing the following when each
512B-sector device is added:

1. do random 4KB writes to the disk to determine wIOPS@4K
2. do random 3.5KB writes to the disk to determine wIOPS@3.5K

If wIOPS@4K > wIOPS@3.5K, assume 4KB sectors, otherwise assume 512B
sectors. (Note: I haven't tried this in practice; we will need to test it
out and perhaps make some tweaks.)

I don't have the time or hardware to implement and test this, but I'd be
happy to mentor or code review.

--matt

> Our ATA code is good about reporting block sizes now, so (3) isn't a big
> issue except for the mixed-pool case, which is a huge PITA.
>
> We need to choose one of these. I favor (1).
> -Nathan
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
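
For illustration only, the two-step probe described in the reply (random 4KB writes versus random 3.5KB writes, then a comparison) could be sketched roughly as below. This is an untested userland approximation in Python, not ZFS code: the names choose_ashift and measure_write_iops, the count and seed parameters, and the use of O_SYNC are all assumptions made for the sketch. Write offsets are multiples of the I/O size, so the 3.5KB writes stay 512B-aligned but mostly straddle 4KB boundaries.

```python
import os
import random
import time


def choose_ashift(wiops_4k, wiops_3_5k):
    """Rule from the message: if 4KB writes are faster than 3.5KB writes,
    assume 4KB sectors (ashift=12); otherwise assume 512B (ashift=9)."""
    return 12 if wiops_4k > wiops_3_5k else 9


def measure_write_iops(path, io_size, span_bytes, count=200, seed=0):
    """Time `count` synchronous random writes of `io_size` bytes.

    WARNING: this overwrites data in the first `span_bytes` of `path`,
    so it is only suitable for a scratch file or an empty device.
    Hypothetical helper; the real probe would live inside ZFS.
    """
    rng = random.Random(seed)
    buf = os.urandom(io_size)
    # Number of io_size-sized positions that fit inside the span.
    slots = span_bytes // io_size
    # O_SYNC (where available) keeps the page cache from hiding latency.
    fd = os.open(path, os.O_WRONLY | getattr(os, "O_SYNC", 0))
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.pwrite(fd, buf, rng.randrange(slots) * io_size)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


def probe_ashift(path, span_bytes):
    """Run both measurements and apply the comparison."""
    wiops_4k = measure_write_iops(path, 4096, span_bytes)
    wiops_3_5k = measure_write_iops(path, 3584, span_bytes)  # 3.5KB = 7 * 512B
    return choose_ashift(wiops_4k, wiops_3_5k)
```

On a real disk the two measurements will be noisy, so a practical version would probably need repeated runs or a margin before trusting the comparison, in line with the note above that the idea has not been tried in practice.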