Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 17 Sep 2014 07:24:51 -0600
From:      "Justin T. Gibbs" <gibbs@FreeBSD.org>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Xin LI <delphij@FreeBSD.org>
Subject:   Re: svn commit: r269093 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
Message-ID:  <75BE5420-586F-4BDA-9D5C-F46A114E84EE@FreeBSD.org>
In-Reply-To: <5419648D.2030305@FreeBSD.org>
References:  <201407251841.s6PIfvUm002202@svn.freebsd.org> <5419648D.2030305@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sep 17, 2014, at 4:38 AM, Andriy Gapon <avg@FreeBSD.org> wrote:

> On 25/07/2014 21:41, Xin LI wrote:
>> Author: delphij
>> Date: Fri Jul 25 18:41:56 2014
>> New Revision: 269093
>> URL: http://svnweb.freebsd.org/changeset/base/269093
>>=20
>> Log:
>>  Transform the I/O when vdev_physical_ashift is greater than
>>  SPA_MINBLOCKSHIFT.
>=20
> This commit seems illogical to me.
> I do not see why a fact that the _physical_ sector size is greater =
than 512B
> (e.g. 4KB) should force a data buffer of a "physical" zio to be =
rounded up using
> ashift.
> To give an example.  ZFS knows that a vdev has physical sector size of =
4KB, but
> a logical sector size of 512B.  ZFS uses ashift of 12 for the vdev.
> Nevertheless in some situations ZFS wants to make a physical / special =
write of
> 512B (taking advantage of RMW).  Why should we force that write to =
become 4KB
> with data beyond the first 512B being all zeros?
>=20
> It would be logical to check for a logical sector size and to enforce =
it, but
> that is not what this commit does.
>=20
> This commit is against the spirit of commit r268855.  It was =
specifically to
> allow ZFS physical writes to have smaller alignments than otherwise =
would be
> dictated by ashift.

I too don=92t see the motivation behind this commit unless it was an =
attempt to get 4K native devices to work (i.e. devices that fail, by =
design, accesses smaller than 4K).  If this is the case, this is not the =
correct fix.  Since the transform doesn=92t do RMW, it seems like this =
can also cause data loss.
=20
>=20
> I think that this commit should be reverted.  If ZFS wants to make a =
physical
> I/O operation with a particular alignment then it must be allowed to.  =
If that
> operation is invalid for a certain vdev, then it is higher level code =
that must
> be modified to account for the logical sector size.  The physical zio =
must be
> _created_ correctly, it should not be silently transformed.

vdev_logical_ashift must be consulted when performing ZIO_FLAG_PHYSICAL =
I/O instead of assuming vdev_logical_ashift is 9.  (It is unfortunate =
that the zio flag is named this way since the I/O is *not* constrained =
by the physical characteristics of the media, but rather by the logical =
restrictions of the software running on it.)  r268855 doesn=92t do this =
because illumos doesn=92t (yet?) have this information in their vdev =
structure.  However, we=92ve had it since r254591 so the MFV in r268855 =
wasn=92t complete.  Catching the misalignment in zio.c is more =
user/developer friendly than having the CAM layer return EIO and =
convince ZFS that this is a media failure.  But, as Andriy points out, =
the upper level code that might issue these I/Os must be changed to =
understand =93logical sector size=94.

=97
Justin




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?75BE5420-586F-4BDA-9D5C-F46A114E84EE>