Date: Sun, 1 Jun 2014 14:27:42 -0700
From: Matthew Ahrens
To: Nathan Whitehorn
Cc: freebsd-fs, FreeBSD Hackers, Steven Hartland
Subject: Re: fdisk(8) vs gpart(8), and gnop

On Sun, Jun 1, 2014 at 9:07 AM, Nathan Whitehorn wrote:

> On 06/01/14 09:00, Steven Hartland wrote:
>
>> ----- Original Message -----
>> From: "Nathan Whitehorn" <nwhitehorn@freebsd.org>
>> Sent: Sunday, June 01, 2014 4:55 PM
>> Subject: Re: fdisk(8) vs gpart(8), and gnop
>>
>>> On 06/01/14 08:52, Steven Hartland wrote:
>>>
>>>> ----- Original Message -----
>>>> From: "Mark Felder"
>>>>
>>>>> On May 31, 2014, at 20:57, Freddie Cash wrote:
>>>>>
>>>>>> There's a sysctl where you can set the minimum ashift for zfs. Then
>>>>>> you never need to use gnop.
>>>>>>
>>>>>> I believe it's part of 10.0?
>>>>>
>>>>> I've not seen this yet. What we need is to port the ability to set
>>>>> ashift at pool creation time:
>>>>>
>>>>> $ zpool create -o ashift=12 tank mirror disk1 disk2 mirror disk3 disk4
>>>>>
>>>>> I believe the Linux zfs port has this functionality now, but we still
>>>>> do not.
>>>>
>>>> We don't have that direct option yet, but you can achieve the
>>>> same thing by setting vfs.zfs.min_auto_ashift=12.
>>>
>>> Does anyone have any objections to me changing this default, right
>>> now, today?
>>> -Nathan
>>
>> I think you will get some objections to that, as it can have quite an
>> impact on performance for disks with 512-byte sectors, due to the
>> increased overhead of transferring 4k when only 512 is really required.
>> The impact is even more dramatic on RAIDZx.
>>
>> Personally, we run a custom kernel on our machines with just this
>> change in it to ensure compatibility with future disks, so I can confirm
>> it does indeed have the desired effect :)
>
> So the discussion here is related to what to do about the installer. The
> current ZFS component unconditionally creates gnops all over the place to
> set ashift to 4k. That's worse across the board: it has exactly the
> performance impact of changing the default of this sysctl (whatever that
> is), it can't easily be overridden (which the sysctl can), and it's a
> horrible hack to boot. There are a few options:
>
> 1. Change the default of vfs.zfs.min_auto_ashift

This is probably a bad idea. As others have mentioned, it can drastically
impact space usage and performance on 512B disks, especially when using
small ZFS blocks (e.g. for databases or VDI) and/or RAID-Z. That said, it
could be a reasonable default for specialized distros that are not used for
these workloads (maybe FreeNAS or PCBSD?).

> 2. Have the same effect but in a vastly worse way by adjusting the
> installer to create gnops
> 3. Have ZFS choose by itself and decide to do that permanently.

If the device reports a 512B sector size, it would be great for ZFS to
assume the device could be lying, and automatically determine the minimum
ashift which gives good performance.
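To put rough numbers on the space-usage point made in the reply to option 1, the following Python sketch estimates how many bytes a single block occupies at ashift=9 versus ashift=12. It assumes the allocation follows the RAID-Z sizing rule in the ZFS sources (`vdev_raidz_asize()`); `allocated_bytes` is a hypothetical helper, and it ignores compression, gang blocks, and metadata.

```python
def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def allocated_bytes(psize: int, ashift: int, ndisks: int = 1, nparity: int = 0) -> int:
    """Approximate bytes allocated on disk for one block of psize bytes.

    Hypothetical sketch modeled on the RAID-Z sizing rule; with nparity=0 it
    reduces to rounding psize up to whole sectors, as on a single disk or one
    side of a mirror. Ignores compression, gang blocks and metadata.
    """
    sector = 1 << ashift                     # ashift=9 -> 512 B, ashift=12 -> 4096 B
    sectors = _ceil_div(psize, sector)       # data sectors needed for the block
    if nparity:
        # nparity parity sectors for each row of (ndisks - nparity) data sectors
        sectors += nparity * _ceil_div(sectors, ndisks - nparity)
        # pad the allocation to a multiple of (nparity + 1) sectors
        sectors = _ceil_div(sectors, nparity + 1) * (nparity + 1)
    return sectors * sector


if __name__ == "__main__":
    # Small blocks are the ones databases or VDI volumes tend to use
    for label, ndisks, nparity in [("single disk", 1, 0), ("5-disk raidz1", 5, 1)]:
        for psize in (512, 4096, 8192):
            a9 = allocated_bytes(psize, 9, ndisks, nparity)
            a12 = allocated_bytes(psize, 12, ndisks, nparity)
            print(f"{label:14} {psize:5} B block: ashift=9 -> {a9:5} B, ashift=12 -> {a12:5} B")
```

Under these assumptions, a 4 KiB block on a five-disk RAID-Z1 occupies 5120 bytes at ashift=9 but 8192 bytes at ashift=12, and a 512-byte block occupies 1024 versus 8192 bytes, which is why small-block workloads are hit hardest.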
I think this automatic detection could be done reasonably well for the
common case by doing the following when each 512B-sector device is added:

1. Do random 4KB writes to the disk to determine wIOPS@4K.
2. Do random 3.5KB writes to the disk to determine wIOPS@3.5K.

If wIOPS@4K > wIOPS@3.5K, assume 4KB sectors; otherwise, assume 512B
sectors. (Note: I haven't tried this in practice; we will need to test it
out and perhaps make some tweaks.)

I don't have the time or hardware to implement and test this, but I'd be
happy to mentor or do code review.

--matt

>
> Our ATA code is good about reporting block sizes now, so (3) isn't a big
> issue except for the mixed-pool case, which is a huge PITA.
>
> We need to choose one of these. I favor (1).
> -Nathan
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
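The two-step detection heuristic Matt proposes can be sketched as follows. This is a minimal Python sketch, not ZFS code: `choose_ashift`, `measure_wiops`, and `toy_disk` are hypothetical names, and the toy cost model exists only to exercise the decision rule. The reasoning behind the rule is that a drive with hidden 4 KB physical sectors must read, patch, and rewrite a whole physical sector to service a 3.5 KB write, so it should sustain fewer of those than aligned 4 KB writes, while a genuine 512-byte-sector drive does slightly less work for the smaller write.

```python
from typing import Callable

ASHIFT_512 = 9   # 2**9  = 512-byte sectors
ASHIFT_4K = 12   # 2**12 = 4096-byte sectors


def choose_ashift(measure_wiops: Callable[[int], float]) -> int:
    """Guess the real sector size of a device that reports 512-byte sectors.

    measure_wiops(nbytes) is a hypothetical hook: it should issue random,
    4 KiB-aligned writes of nbytes each (bypassing caches) and return the
    measured write IOPS.
    """
    wiops_4k = measure_wiops(4096)     # step 1: random 4KB writes
    wiops_3_5k = measure_wiops(3584)   # step 2: random 3.5KB writes
    # Hidden 4K sectors force a read-modify-write on every 3.5KB write,
    # so 4KB writes come out ahead; ties fall back to 512B sectors.
    return ASHIFT_4K if wiops_4k > wiops_3_5k else ASHIFT_512


def toy_disk(physical_sector: int, base_iops: float = 100.0) -> Callable[[int], float]:
    """Hypothetical stand-in for a real benchmark, for illustration only."""
    def measure(nbytes: int) -> float:
        transfer_cost = 0.05 * nbytes / 4096                 # larger writes cost slightly more
        rmw_cost = 1.0 if nbytes % physical_sector else 0.0  # partial sector => extra read
        return base_iops / (1.0 + transfer_cost + rmw_cost)
    return measure


if __name__ == "__main__":
    print(choose_ashift(toy_disk(512)))    # 9
    print(choose_ashift(toy_disk(4096)))   # 12
```

A real implementation would need repeated trials and some noise margin before trusting a single comparison, in line with the email's note that the idea is untested and may need tweaks.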