From: Nathan Whitehorn
Date: Mon, 02 Jun 2014 08:02:31 -0700
To: Matthew Ahrens
Cc: freebsd-fs, FreeBSD Hackers
Subject: Re: fdisk(8) vs gpart(8), and gnop
Message-ID: <538C9207.9040806@freebsd.org>

On 06/01/14 14:27,
Matthew Ahrens wrote:
>
>>> I think you will get some objections to that, as it can have quite an
>>> impact on the performance for disks which are 512, due to the increased
>>> overhead of transferring 4k when only 512 is really required. This has
>>> a more dramatic impact on RAIDZx too.
>>>
>>> Personally we run a custom kernel on our machines which has just this
>>> change in it to ensure compatibility with future disks, so I can
>>> confirm it does indeed have the desired effect :)
>>>
>> So the discussion here is related to what to do about the installer. The
>> current ZFS component unconditionally creates gnops all over the place to
>> set ashift to 4k. That's across the board worse: it has exactly the
>> performance impact of changing the default of this sysctl (whatever that
>> is), it can't easily be overridden (which the sysctl can), and it's a
>> horrible hack to boot. There are a few options:
>>
>> 1. Change the default of vfs.zfs.min_auto_ashift
>>
> This is probably a bad idea -- as others have mentioned, it can drastically
> impact space usage and performance on 512B disks, especially when using
> small ZFS blocks (e.g. for databases or VDI) and/or RAID-Z. That said, it
> could be a reasonable default for specialized distros that are not used for
> these workloads (maybe FreeNAS or PCBSD?).
>
>> 2. Have the same effect but in a vastly worse way by adjusting the
>> installer to create gnops
>> 3. Have ZFS choose by itself and decide to do that permanently.
>>
> If the device reports a 512B sector size, it would be great for ZFS to
> assume the device could be lying, and automatically determine the minimum
> ashift which gives good performance. I think this could be done reasonably
> well for the common case by doing the following when each 512B-sector
> device is added:
>
> 1. do random 4KB writes to the disk to determine wIOPS@4K
> 2. do random 3.5KB writes to the disk to determine wIOPS@3.5K
>
> If wIOPS@4K > wIOPS@3.5K, assume 4KB sectors, otherwise assume 512B
> sectors. (Note: I haven't tried this in practice; we will need to test it
> out and perhaps make some tweaks.)
>
> I don't have the time or hardware to implement and test this, but I'd be
> happy to mentor or code review.
>
> --matt

I think we basically don't have any lying disks anymore. The ATA code does
a very good job of handling this -- most disks tell the truth, but in an odd
way that gets reported up the stack. ada(4) has a quirks table for the ones
that do not. If this is the only concern, then we should just stop telling
people to worry about this. My bigger concern is the pool upgrade one --
what if someone puts in a 4K disk in the future?
-Nathan
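
P.S. For anyone who wants to try the probe Matt describes, here is a rough
userland sketch of it in Python. It is only an illustration of the comparison
(wIOPS@4K vs. wIOPS@3.5K), not how ZFS does or would do it. The function
names, the 4 KB offset alignment, the sample duration, and the use of O_SYNC
are all my assumptions; none of them come from the thread. Note that the
measurement overwrites data, so only point it at a scratch device or file.

```python
import os
import random
import time

SECTOR_512 = 512
SECTOR_4K = 4096


def measure_write_iops(path, io_size, span_bytes, duration_s=0.5, seed=0):
    """Return random-write operations per second for writes of io_size bytes.

    Offsets are 4 KB aligned (an assumption; the thread does not say), so a
    3.5 KB write always ends partway through a 4 KB physical sector.
    WARNING: this overwrites data within the first span_bytes of path.
    duration_s must be greater than zero.
    """
    rng = random.Random(seed)
    buf = b"\0" * io_size
    slots = span_bytes // SECTOR_4K
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        count = 0
        start = time.monotonic()
        while time.monotonic() - start < duration_s:
            os.pwrite(fd, buf, rng.randrange(slots) * SECTOR_4K)
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return count / elapsed


def guess_sector_size(wiops_4k, wiops_3_5k):
    """Apply the proposed rule: faster 4 KB writes suggest 4 KB sectors."""
    return SECTOR_4K if wiops_4k > wiops_3_5k else SECTOR_512


def ashift_for(sector_size):
    """ashift is log2 of the sector size (512 gives 9, 4096 gives 12)."""
    return sector_size.bit_length() - 1
```

Usage would be something like calling measure_write_iops() twice on the same
scratch target, once with io_size=4096 and once with io_size=3584, and then
feeding both numbers to guess_sector_size(). As Matt says, the rule itself is
untested and would likely need tweaking (noise margins, repeated runs).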