From owner-svn-src-head@FreeBSD.ORG Wed Sep 17 13:25:00 2014 Return-Path: Delivered-To: svn-src-head@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 032E1462; Wed, 17 Sep 2014 13:25:00 +0000 (UTC) Received: from aslan.scsiguy.com (aslan.scsiguy.com [70.89.174.89]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CEBEED56; Wed, 17 Sep 2014 13:24:59 +0000 (UTC) Received: from [192.168.0.61] (jt-mbp.home.scsiguy.org [192.168.0.61]) (authenticated bits=0) by aslan.scsiguy.com (8.14.9/8.14.9) with ESMTP id s8HDOpN9038799 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Wed, 17 Sep 2014 07:24:52 -0600 (MDT) (envelope-from gibbs@FreeBSD.org) Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: svn commit: r269093 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs From: "Justin T. Gibbs" In-Reply-To: <5419648D.2030305@FreeBSD.org> Date: Wed, 17 Sep 2014 07:24:51 -0600 Message-Id: <75BE5420-586F-4BDA-9D5C-F46A114E84EE@FreeBSD.org> References: <201407251841.s6PIfvUm002202@svn.freebsd.org> <5419648D.2030305@FreeBSD.org> To: Andriy Gapon X-Mailer: Apple Mail (2.1878.6) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Xin LI X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Sep 2014 13:25:00 -0000 On Sep 17, 2014, at 4:38 AM, Andriy Gapon wrote: > On 25/07/2014 21:41, Xin LI wrote: >> Author: delphij >> Date: Fri Jul 25 18:41:56 2014 >> New Revision: 269093 >> URL: http://svnweb.freebsd.org/changeset/base/269093 >>=20 >> Log: >> Transform the I/O when vdev_physical_ashift is greater than >> SPA_MINBLOCKSHIFT. >=20 > This commit seems illogical to me. > I do not see why a fact that the _physical_ sector size is greater = than 512B > (e.g. 4KB) should force a data buffer of a "physical" zio to be = rounded up using > ashift. > To give an example. ZFS knows that a vdev has physical sector size of = 4KB, but > a logical sector size of 512B. ZFS uses ashift of 12 for the vdev. > Nevertheless in some situations ZFS wants to make a physical / special = write of > 512B (taking advantage of RMW). Why should we force that write to = become 4KB > with data beyond the first 512B being all zeros? >=20 > It would be logical to check for a logical sector size and to enforce = it, but > that is not what this commit does. >=20 > This commit is against the spirit of commit r268855. It was = specifically to > allow ZFS physical writes to have smaller alignments than otherwise = would be > dictated by ashift. I too don=92t see the motivation behind this commit unless it was an = attempt to get 4K native devices to work (i.e. devices that fail, by = design, accesses smaller than 4K). If this is the case, this is not the = correct fix. Since the transform doesn=92t do RMW, it seems like this = can also cause data loss. =20 >=20 > I think that this commit should be reverted. If ZFS wants to make a = physical > I/O operation with a particular alignment then it must be allowed to. = If that > operation is invalid for a certain vdev, then it is higher level code = that must > be modified to account for the logical sector size. The physical zio = must be > _created_ correctly, it should not be silently transformed. vdev_logical_ashift must be consulted when performing ZIO_FLAG_PHYSICAL = I/O instead of assuming vdev_logical_ashift is 9. (It is unfortunate = that the zio flag is named this way since the I/O is *not* constrained = by the physical characteristics of the media, but rather by the logical = restrictions of the software running on it.) r268855 doesn=92t do this = because illumos doesn=92t (yet?) have this information in their vdev = structure. However, we=92ve had it since r254591 so the MFV in r268855 = wasn=92t complete. Catching the misalignment in zio.c is more = user/developer friendly than having the CAM layer return EIO and = convince ZFS that this is a media failure. But, as Andriy points out, = the upper level code that might issue these I/Os must be changed to = understand =93logical sector size=94. =97 Justin