From owner-freebsd-fs@FreeBSD.ORG Wed Dec 8 22:53:57 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8B982106566B; Wed, 8 Dec 2010 22:53:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 6AB298FC13; Wed, 8 Dec 2010 22:53:56 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id oB8MrqC8020995 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 9 Dec 2010 00:53:52 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id oB8MrqNc043217; Thu, 9 Dec 2010 00:53:52 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id oB8MrqC9043216; Thu, 9 Dec 2010 00:53:52 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 9 Dec 2010 00:53:52 +0200 From: Kostik Belousov To: Kirk McKusick Message-ID: <20101208225352.GK33073@deviant.kiev.zoral.com.ua> References: <20101208214909.GH33073@deviant.kiev.zoral.com.ua> <201012082223.oB8MNqGV020879@chez.mckusick.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DfnuYBTqzt7sVGu3" Content-Disposition: inline In-Reply-To: <201012082223.oB8MNqGV020879@chez.mckusick.com> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_20, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, pjd@freebsd.org, Oliver Fromme Subject: Re: TRIM support for UFS? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Dec 2010 22:53:57 -0000 --DfnuYBTqzt7sVGu3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote: > > Date: Wed, 8 Dec 2010 23:49:09 +0200 > > From: Kostik Belousov > > To: Julian Elischer > > Cc: freebsd-fs@freebsd.org, pjd@freebsd.org, > > Oliver Fromme , > > Kirk McKusick > > Subject: Re: TRIM support for UFS? > >=20 > > On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote: > > > > > > From: Kirk McKusick > > > Date: Wed, 21 May 2008 13:19:18 -0700 > > > To: Poul-Henning Kamp > > > Subject: UFS and BIO_DELETE > > > X-URL: http://WWW.McKusick.COM/ > > > Reply-To: Kirk McKusick > > > > > > I enclose below my proposed patch to add BIO_DELETE to UFS > > > (goes at the end of ffs_blkfree). As I have no way to test > > > it, I am wondering if you could let me know if it works. > > > > > > Also, I am thinking of only enabling it for filesystems mounted > > > with a new flag requesting the behavior since the geteblk is a > > > rather expensive call for the usual no-op case. > > > > > > I did look at just allocating a `struct bio' as a local variable > > > and using that, but it looked like I needed to also come up with a > > > `producer' and/or `consumer' if I wanted to pass it to GEOM directly, > > > so in the end I went with this more expensive solution. If there > > > is an easy way to just pass a bio structure to GEOM, I would much > > > prefer that approach. > > > > > > ~Kirk > > > > > > > > > *** ffs_alloc.c Wed May 21 20:11:04 2008 > > > --- ffs_alloc.c.new Wed May 21 20:10:50 2008 > > > *************** > > > *** 1945,1950 **** > > > --- 1945,1962 ---- > > > ACTIVECLEAR(fs, cg); > > > UFS_UNLOCK(ump); > > > bdwrite(bp); > > > + /* > > > + * Request that the block be cleared. > > > + */ > > > + bp =3D geteblk(size); > > > + bp->b_iocmd =3D BIO_DELETE; > > > + bp->b_vp =3D devvp; > > > + bp->b_blkno =3D fsbtodb(fs, bno); > > > + bp->b_offset =3D dbtob(bp->b_blkno); > > > + bp->b_iooffset =3D bp->b_offset; > > > + bp->b_bcount =3D size; > > > + BUF_KERNPROC(bp); > > > + BO_STRATEGY(&devvp->v_bufobj, bp); > > > } > > >=20 > > > #ifdef INVARIANTS > > > > > The UFS devvp contains the pointer to struct g_consumer in > > devpp->v_bufobj->bo_private, so g_alloc_bio() and g_io_request() can be > > used to start BIO_DELETE command without fake bufer allocation. > > g_io_strategy() may be used as the sample. >=20 > Thanks for that update to moderize the patch! >=20 > > But, I have a question about the patch. The bitmap block in ffs_blkfree= () > > is handled with delayed write, by bdwrite() call. I think that it is > > quite possible for BIO_DELETE to be executed before the bitmap block > > is written and on-disk bit is set in bitmap. Would not this allow > > for the blocks to be deleted before the bitmap is updated, > > and potentially before on-disk pointers are cleared that points to > > the blocks ? SU just rolls back the unfinished dependencies on write, > > but BIO_DELETE cannot. >=20 > It is safe to do the BIO_DELETE here because at this point we have > progressed through the file delete to the point where the inode that > claimed this block has been updated on the disk with a NULL pointer. > The only step that remains is to update the on-disk bitmap to reflect > that the block is available. If the bitmap write is not done before > a system crash the only action that will be taken with respect to > the freed block is that either background fsck will restore it to > the bitmap (SU) or the journal will restore it to the bitmap (SUJ). > There is no case where either SU or SUJ will put it back into the > file once we have reached this point. Thanks for the explanation. On the other hand, can my scenario be real for async mounts ? >=20 > > Also, shouldn't the BIO_DELETE only be issued for real devices, > > and not the snapshots ? >=20 > They should also be issued for snapshots. The only time that snapshots > give up blocks is when they are deleted. The blocks that they give up > at that time are all copies of blocks that were later changed in the > filesystem. With the snapshot gone there will be no remaining references > to those blocks and they will be made available for reallocation. As > such, it would be helpful if they had been TRIMMED. >=20 > > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c > > index b740bbb..4ff57ae 100644 > > --- a/sys/ufs/ffs/ffs_alloc.c > > +++ b/sys/ufs/ffs/ffs_alloc.c > > @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); > >=20 > > #include > >=20 > > +#include > > + > > #include > > #include > > #include > > @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, deph= d) > > u_int cg; > > u_int8_t *blksfree; > > struct cdev *dev; > > + struct bio *bip; > >=20 > > cg =3D dtog(fs, bno); > > if (devvp->v_type =3D=3D VREG) { > > @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dep= hd) > > softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, > > numfrags(fs, size), dephd); > > bdwrite(bp); > > + > > + if (devvp->v_type !=3D VREG) { > > + bip =3D g_alloc_bio(); > > + bip->bio_cmd =3D BIO_DELETE; > > + bip->bio_offset =3D dbtob(fsbtodb(fs, bno)); > > + bip->bio_done =3D g_destroy_bio; > > + bip->bio_length =3D size; > > + g_io_request(bip, > > + (struct g_consumer *)devvp->v_bufobj.bo_private); > > + } > > } > >=20 > > #ifdef INVARIANTS >=20 > The above patch looks good though I would do it unconditionally > (e.g., also for snapshots). It seems sensible to make it conditional > on a sysctl variable so that folks could experiment with it more > easily. And I would leave it off by default as non-SSD disks are > unlikely to benefit from it. If it does prove useful for SSD disks > then I would make it conditional on a filesystem flag so that it > is possible to enable it on a filesystem-by-filesystem basis (e.g., > on for filesystems on the SSD and off for the rest). I think it is better to have a flag in superblock instead of the mount option.=20 diff --git a/sbin/dumpfs/dumpfs.c b/sbin/dumpfs/dumpfs.c index 38c05f6..fa562dc 100644 --- a/sbin/dumpfs/dumpfs.c +++ b/sbin/dumpfs/dumpfs.c @@ -253,9 +253,11 @@ dumpfs(const char *name) printf("fs_flags expanded "); if (fsflags & FS_NFS4ACLS) printf("nfsv4acls "); + if (fsflags & FS_TRIM) + printf("trim "); fsflags &=3D ~(FS_UNCLEAN | FS_DOSOFTDEP | FS_NEEDSFSCK | FS_INDEXDIRS | FS_ACLS | FS_MULTILABEL | FS_GJOURNAL | FS_FLAGS_UPDATED | - FS_NFS4ACLS | FS_SUJ); + FS_NFS4ACLS | FS_SUJ | FS_TRIM); if (fsflags !=3D 0) printf("unknown flags (%#x)", fsflags); putchar('\n'); diff --git a/sbin/tunefs/tunefs.c b/sbin/tunefs/tunefs.c index b2ea602..0ed713d 100644 --- a/sbin/tunefs/tunefs.c +++ b/sbin/tunefs/tunefs.c @@ -82,11 +82,13 @@ int main(int argc, char *argv[]) { char *avalue, *jvalue, *Jvalue, *Lvalue, *lvalue, *Nvalue, *nvalue; + char *tvalue; const char *special, *on; const char *name; int active; int Aflag, aflag, eflag, evalue, fflag, fvalue, jflag, Jflag, Lflag; int lflag, mflag, mvalue, Nflag, nflag, oflag, ovalue, pflag, sflag; + int tflag; int svalue, Sflag, Svalue; int ch, found_arg, i; const char *chg[2]; @@ -101,7 +103,8 @@ main(int argc, char *argv[]) evalue =3D fvalue =3D mvalue =3D ovalue =3D svalue =3D Svalue =3D 0; active =3D 0; found_arg =3D 0; /* At least one arg is required. */ - while ((ch =3D getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:")) !=3D -= 1) + while ((ch =3D getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:t:")) + !=3D -1) switch (ch) { =20 case 'A': @@ -268,6 +271,18 @@ main(int argc, char *argv[]) Sflag =3D 1; break; =20 + case 't': + found_arg =3D 1; + name =3D "trim"; + tvalue =3D optarg; + if (strcmp(tvalue, "enable") !=3D 0 && + strcmp(tvalue, "disable") !=3D 0) { + errx(10, "bad %s (options are %s)", + name, "`enable' or `disable'"); + } + tflag =3D 1; + break; + default: usage(); } @@ -493,6 +508,24 @@ main(int argc, char *argv[]) sblock.fs_avgfpdir =3D svalue; } } + if (tflag) { + name =3D "issue TRIM to the disk"; + if (strcmp(tvalue, "enable") =3D=3D 0) { + if (sblock.fs_flags & FS_TRIM) + warnx("%s remains unchanged as enabled", name); + else { + sblock.fs_flags |=3D FS_TRIM; + warnx("%s set", name); + } + } else if (strcmp(tvalue, "disable") =3D=3D 0) { + if ((~sblock.fs_flags & FS_TRIM) =3D=3D FS_TRIM) + warnx("%s remains unchanged as disabled", name); + else { + sblock.fs_flags &=3D ~FS_TRIM; + warnx("%s cleared", name); + } + } + } =20 if (sbwrite(&disk, Aflag) =3D=3D -1) goto err; @@ -1011,12 +1044,13 @@ out: void usage(void) { - fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n", + fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n%s\n", "usage: tunefs [-A] [-a enable | disable] [-e maxbpg] [-f avgfilesize]", " [-J enable | disable] [-j enable | disable]",=20 " [-L volname] [-l enable | disable] [-m minfree]", " [-N enable | disable] [-n enable | disable]", -" [-o space | time] [-p] [-s avgfpdir] special | filesystem"); +" [-o space | time] [-p] [-s avgfpdir] [-t enable | disable]", +" special | filesystem"); exit(2); } =20 @@ -1035,6 +1069,8 @@ printfs(void) (sblock.fs_flags & FS_SUJ)? "enabled" : "disabled"); warnx("gjournal: (-J) %s", (sblock.fs_flags & FS_GJOURNAL)? "enabled" : "disabled"); + warnx("trim: (-p) %s",=20 + (sblock.fs_flags & FS_TRIM)? "enabled" : "disabled"); warnx("maximum blocks per file in a cylinder group: (-e) %d", sblock.fs_maxbpg); warnx("average file size: (-f) %d", diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c index b740bbb..ed39aa3 100644 --- a/sys/ufs/ffs/ffs_alloc.c +++ b/sys/ufs/ffs/ffs_alloc.c @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); =20 #include =20 +#include + #include #include #include @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) u_int cg; u_int8_t *blksfree; struct cdev *dev; + struct bio *bip; =20 cg =3D dtog(fs, bno); if (devvp->v_type =3D=3D VREG) { @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, numfrags(fs, size), dephd); bdwrite(bp); + + if (fs->fs_flags & FS_TRIM) { + bip =3D g_alloc_bio(); + bip->bio_cmd =3D BIO_DELETE; + bip->bio_offset =3D dbtob(fsbtodb(fs, bno)); + bip->bio_done =3D g_destroy_bio; + bip->bio_length =3D size; + g_io_request(bip, + (struct g_consumer *)devvp->v_bufobj.bo_private); + } } =20 #ifdef INVARIANTS diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h index ba98ed3..13d9ede 100644 --- a/sys/ufs/ffs/fs.h +++ b/sys/ufs/ffs/fs.h @@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) =3D=3D 1376); #define FS_FLAGS_UPDATED 0x0080 /* flags have been moved to new location */ #define FS_NFS4ACLS 0x0100 /* file system has NFSv4 ACLs enabled */ #define FS_INDEXDIRS 0x0200 /* kernel supports indexed directories */ +#define FS_TRIM 0x0400 /* issue BIO_DELETE for deleted blocks */ =20 /* * Macros to access bits in the fs_active array. --DfnuYBTqzt7sVGu3 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk0ADH8ACgkQC3+MBN1Mb4hV0gCglghMNOcFLRx0OWWGMIFNtzZU g18AoJEJFYVO8ggUEbTTLLNHbrb4bfyF =NflN -----END PGP SIGNATURE----- --DfnuYBTqzt7sVGu3--