Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 23 Sep 2012 02:04:39 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Andriy Gapon" <avg@FreeBSD.org>, <freebsd-fs@FreeBSD.org>
Cc:        freebsd-geom@FreeBSD.org
Subject:   Re: zfs zvol: set geom mediasize right at creation time
Message-ID:  <80F518854AE34A759D9441AE1A60D2DC@multiplay.co.uk>
References:  <505DF1A3.1020809@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.

------=_NextPart_000_02EE_01CD992F.CA115AA0
Content-Type: text/plain; format=flowed; charset="Windows-1252";
	reply-type=original
Content-Transfer-Encoding: 7bit

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

> Please review the following patch.
> 
> In addition to what the description says I almost by accident sneaked another
> change into the patch.  It's setting of stripesize to volblocksize.  I think
> that the change should make sense, but it is really a different change.
> 
> 
> A side note: setting sectorsize to volblocksize seemed like an overkill and it
> would certainly mess the existing zvols in use.  Maybe there should be another
> property like reportedblocksize or something.
> 
> commit 1585e6cfb602c2a2647b9f802445bb174bc430a4
> Author: Andriy Gapon <avg@icyb.net.ua>
> Date:   Wed Sep 19 20:49:28 2012 +0300
> 
>    zvol: set mediasize in geom provider right upon its creation
> 
>    ... instead of deferring the action until first open.
>    Unlike upstream this has no benefit on FreeBSD.
>    We know that as soon as the provider is created it is going to be tasted
>    and thus opened.  Initial mediasize of zero causes tasting failure
>    and subsequent retasting because of the size change.
> 
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
> index d47d270..6e9e7a3 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
> @@ -475,6 +475,7 @@ zvol_create_minor(const char *name)
>  zvol_state_t *zv;
>  objset_t *os;
>  dmu_object_info_t doi;
> + uint64_t volblocksize, volsize;
>  int error;
> 
>  ZFS_LOG(1, "Creating ZVOL %s...", name);
> @@ -535,9 +536,20 @@ zvol_create_minor(const char *name)
>  zv = zs->zss_data = kmem_zalloc(sizeof (zvol_state_t), KM_SLEEP);
> #else /* !sun */
> 
> + error = zap_lookup(os, ZVOL_ZAP_OBJ, "size", 8, 1, &volsize);
> + if (error) {
> + ASSERT(error == 0);
> + dmu_objset_disown(os, zvol_tag);
> + mutex_exit(&spa_namespace_lock);
> + return (error);
> + }
> +
>  DROP_GIANT();
>  g_topology_lock();
>  zv = zvol_geom_create(name);
> + zv->zv_volsize = volsize;
> + zv->zv_provider->mediasize = zv->zv_volsize;
> +
> #endif /* !sun */
> 
>  (void) strlcpy(zv->zv_name, name, MAXPATHLEN);
> @@ -554,6 +566,7 @@ zvol_create_minor(const char *name)
>  error = dmu_object_info(os, ZVOL_OBJ, &doi);
>  ASSERT(error == 0);
>  zv->zv_volblocksize = doi.doi_data_block_size;
> + zv->zv_provider->stripesize = zv->zv_volblocksize;
> 
>  if (spa_writeable(dmu_objset_spa(os))) {
>  if (zil_replay_disable)

Do you know what the effect of the volblocksize change
will have on a volume who's disk block size changes?
e.g. via a quirk for a 4k disk being added

I ask as we've testing a patch here which changes ashift to
be based on stripesize instead of sectorsize but in its
current form it has some odd side effects on pools which
are boot pools.

Said patch is attached for reference.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.
------=_NextPart_000_02EE_01CD992F.CA115AA0
Content-Type: text/plain; format=flowed; name="zfs-ashift-fix.txt";
	reply-type=original
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="zfs-ashift-fix.txt"

Changes zfs zpool initial / desired ashift to be based off stripesize=0A=
instead of sectorsize making it compatible with drives marked with=0A=
the 4k sector size quirk.=0A=
=0A=
Without the correct min block size BIO_DELETE requests passed to=0A=
a large number of current SSD's via TRIM don't actually perform=0A=
any LBA TRIM so its vital for the correct operation of TRIM to get=0A=
the correct min block size.=0A=
=0A=
To do this we added the additional dashift (desired ashift) to=0A=
vdev_open_func_t calls. This was needed as just updating ashift to=0A=
be based off stripesize would mean that a devices reported minimum=0A=
transfer size (ashift) could increase and that in turn would cause=0A=
member devices to be unusable and hence break pools with error=0A=
ZFS-8000-5E.=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h.orig	=
2012-07-03 11:48:22.353483879 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h	=
2012-07-03 11:59:17.195442033 +0000=0A=
@@ -55,7 +55,7 @@=0A=
 /*=0A=
  * Virtual device operations=0A=
  */=0A=
-typedef int	vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t =
*ashift);=0A=
+typedef int	vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t =
*ashift, uint64_t *dashift);=0A=
 typedef void	vdev_close_func_t(vdev_t *vd);=0A=
 typedef uint64_t vdev_asize_func_t(vdev_t *vd, uint64_t psize);=0A=
 typedef int	vdev_io_start_func_t(zio_t *zio);=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c.orig	=
2012-07-03 12:53:37.716867380 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	2012-07-03 =
12:01:58.522455031 +0000=0A=
@@ -1122,6 +1122,7 @@=0A=
 	uint64_t osize =3D 0;=0A=
 	uint64_t asize, psize;=0A=
 	uint64_t ashift =3D 0;=0A=
+	uint64_t dashift =3D 0;=0A=
 =0A=
 	ASSERT(vd->vdev_open_thread =3D=3D curthread ||=0A=
 	    spa_config_held(spa, SCL_STATE_ALL, RW_WRITER) =3D=3D =
SCL_STATE_ALL);=0A=
@@ -1151,7 +1152,7 @@=0A=
 		return (ENXIO);=0A=
 	}=0A=
 =0A=
-	error =3D vd->vdev_ops->vdev_op_open(vd, &osize, &ashift);=0A=
+	error =3D vd->vdev_ops->vdev_op_open(vd, &osize, &ashift, &dashift);=0A=
 =0A=
 	/*=0A=
 	 * Reset the vdev_reopening flag so that we actually close=0A=
@@ -1251,7 +1252,7 @@=0A=
 		 * For testing purposes, a higher ashift can be requested.=0A=
 		 */=0A=
 		vd->vdev_asize =3D asize;=0A=
-		vd->vdev_ashift =3D MAX(ashift, vd->vdev_ashift);=0A=
+		vd->vdev_ashift =3D MAX(MAX(ashift, dashift), vd->vdev_ashift);=0A=
 	} else {=0A=
 		/*=0A=
 		 * Make sure the alignment requirement hasn't increased.=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c.orig	=
2012-07-03 11:49:34.103219588 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c	=
2012-07-03 12:51:44.521525471 +0000=0A=
@@ -103,7 +103,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_disk_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	spa_t *spa =3D vd->vdev_spa;=0A=
 	vdev_disk_t *dvd;=0A=
@@ -284,7 +284,7 @@=0A=
 	}=0A=
 =0A=
 	/*=0A=
-	 * Determine the device's minimum transfer size.=0A=
+	 * Determine the device's minimum and desired transfer size.=0A=
 	 * If the ioctl isn't supported, assume DEV_BSIZE.=0A=
 	 */=0A=
 	if (ldi_ioctl(dvd->vd_lh, DKIOCGMEDIAINFOEXT, (intptr_t)&dkmext,=0A=
@@ -292,6 +292,7 @@=0A=
 		dkmext.dki_pbsize =3D DEV_BSIZE;=0A=
 =0A=
 	*ashift =3D highbit(MAX(dkmext.dki_pbsize, SPA_MINBLOCKSIZE)) - 1;=0A=
+	*dashift =3D *ashift;=0A=
 =0A=
 	/*=0A=
 	 * Clear the nowritecache bit, so that on a vdev_reopen() we will=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c.orig	=
2012-07-03 11:48:42.314740333 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c	=
2012-07-03 11:57:22.579381320 +0000=0A=
@@ -47,7 +47,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	vdev_file_t *vf;=0A=
 	vnode_t *vp;=0A=
@@ -127,6 +127,7 @@=0A=
 =0A=
 	*psize =3D vattr.va_size;=0A=
 	*ashift =3D SPA_MINBLOCKSHIFT;=0A=
+	*dashift =3D SPA_MINBLOCKSHIFT;=0A=
 =0A=
 	return (0);=0A=
 }=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c.orig	=
2012-07-03 12:50:50.158161825 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	=
2012-07-03 12:52:01.408085155 +0000=0A=
@@ -416,7 +416,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	struct g_provider *pp;=0A=
 	struct g_consumer *cp;=0A=
@@ -502,9 +502,10 @@=0A=
 	*psize =3D pp->mediasize;=0A=
 =0A=
 	/*=0A=
-	 * Determine the device's minimum transfer size.=0A=
+	 * Determine the device's minimum and desired transfer size.=0A=
 	 */=0A=
 	*ashift =3D highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;=0A=
+	*dashift =3D highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;=0A=
 =0A=
 	/*=0A=
 	 * Clear the nowritecache settings, so that on a vdev_reopen()=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c.orig	=
2012-07-03 11:49:22.342245151 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c	=
2012-07-03 11:58:02.161948585 +0000=0A=
@@ -127,7 +127,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_mirror_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A=
+vdev_mirror_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, =
uint64_t *dashift)=0A=
 {=0A=
 	int numerrors =3D 0;=0A=
 	int lasterror =3D 0;=0A=
@@ -150,6 +150,7 @@=0A=
 =0A=
 		*asize =3D MIN(*asize - 1, cvd->vdev_asize - 1) + 1;=0A=
 		*ashift =3D MAX(*ashift, cvd->vdev_ashift);=0A=
+		*dashift =3D *ashift;=0A=
 	}=0A=
 =0A=
 	if (numerrors =3D=3D vd->vdev_children) {=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c.orig	=
2012-07-03 11:49:10.545275865 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c	=
2012-07-03 11:58:07.670470640 +0000=0A=
@@ -40,7 +40,7 @@=0A=
 =0A=
 /* ARGSUSED */=0A=
 static int=0A=
-vdev_missing_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)=0A=
+vdev_missing_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift, =
uint64_t *dashift)=0A=
 {=0A=
 	/*=0A=
 	 * Really this should just fail.  But then the root vdev will be in the=0A=
@@ -50,6 +50,7 @@=0A=
 	 */=0A=
 	*psize =3D 0;=0A=
 	*ashift =3D 0;=0A=
+	*dashift =3D 0;=0A=
 	return (0);=0A=
 }=0A=
 =0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c.orig	=
2012-07-03 11:49:03.675875505 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c	=
2012-07-03 11:58:15.334806334 +0000=0A=
@@ -1447,7 +1447,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_raidz_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A=
+vdev_raidz_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	vdev_t *cvd;=0A=
 	uint64_t nparity =3D vd->vdev_nparity;=0A=
@@ -1476,6 +1476,7 @@=0A=
 =0A=
 		*asize =3D MIN(*asize - 1, cvd->vdev_asize - 1) + 1;=0A=
 		*ashift =3D MAX(*ashift, cvd->vdev_ashift);=0A=
+		*dashift =3D *ashift;=0A=
 	}=0A=
 =0A=
 	*asize *=3D vd->vdev_children;=0A=
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c.orig	=
2012-07-03 11:49:27.901760380 +0000=0A=
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c	=
2012-07-03 11:58:19.704427068 +0000=0A=
@@ -50,7 +50,7 @@=0A=
 }=0A=
 =0A=
 static int=0A=
-vdev_root_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift)=0A=
+vdev_root_open(vdev_t *vd, uint64_t *asize, uint64_t *ashift, uint64_t =
*dashift)=0A=
 {=0A=
 	int lasterror =3D 0;=0A=
 	int numerrors =3D 0;=0A=
@@ -78,6 +78,7 @@=0A=
 =0A=
 	*asize =3D 0;=0A=
 	*ashift =3D 0;=0A=
+	*dashift =3D 0;=0A=
 =0A=
 	return (0);=0A=
 }=0A=

------=_NextPart_000_02EE_01CD992F.CA115AA0--




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?80F518854AE34A759D9441AE1A60D2DC>