Date: Wed, 08 Dec 2010 14:23:52 -0800 From: Kirk McKusick <mckusick@mckusick.com> To: Kostik Belousov <kostikbel@gmail.com> Cc: freebsd-fs@freebsd.org, pjd@freebsd.org, Oliver Fromme <olli@lurza.secnetix.de> Subject: Re: TRIM support for UFS? Message-ID: <201012082223.oB8MNqGV020879@chez.mckusick.com> In-Reply-To: <20101208214909.GH33073@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
> Date: Wed, 8 Dec 2010 23:49:09 +0200 > From: Kostik Belousov <kostikbel@gmail.com> > To: Julian Elischer <julian@freebsd.org> > Cc: freebsd-fs@freebsd.org, pjd@freebsd.org, > Oliver Fromme <olli@lurza.secnetix.de>, > Kirk McKusick <mckusick@mckusick.com> > Subject: Re: TRIM support for UFS? > > On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote: > > > > From: Kirk McKusick <mckusick@chez.mckusick.com> > > Date: Wed, 21 May 2008 13:19:18 -0700 > > To: Poul-Henning Kamp <phk@critter.freebsd.dk> > > Subject: UFS and BIO_DELETE > > X-URL: http://WWW.McKusick.COM/ > > Reply-To: Kirk McKusick <mckusick@McKusick.COM> > > > > I enclose below my proposed patch to add BIO_DELETE to UFS > > (goes at the end of ffs_blkfree). As I have no way to test > > it, I am wondering if you could let me know if it works. > > > > Also, I am thinking of only enabling it for filesystems mounted > > with a new flag requesting the behavior since the geteblk is a > > rather expensive call for the usual no-op case. > > > > I did look at just allocating a `struct bio' as a local variable > > and using that, but it looked like I needed to also come up with a > > `producer' and/or `consumer' if I wanted to pass it to GEOM directly, > > so in the end I went with this more expensive solution. If there > > is an easy way to just pass a bio structure to GEOM, I would much > > prefer that approach. > > > > ~Kirk > > > > > > *** ffs_alloc.c Wed May 21 20:11:04 2008 > > --- ffs_alloc.c.new Wed May 21 20:10:50 2008 > > *************** > > *** 1945,1950 **** > > --- 1945,1962 ---- > > ACTIVECLEAR(fs, cg); > > UFS_UNLOCK(ump); > > bdwrite(bp); > > + /* > > + * Request that the block be cleared. > > + */ > > + bp = geteblk(size); > > + bp->b_iocmd = BIO_DELETE; > > + bp->b_vp = devvp; > > + bp->b_blkno = fsbtodb(fs, bno); > > + bp->b_offset = dbtob(bp->b_blkno); > > + bp->b_iooffset = bp->b_offset; > > + bp->b_bcount = size; > > + BUF_KERNPROC(bp); > > + BO_STRATEGY(&devvp->v_bufobj, bp); > > } > > > > #ifdef INVARIANTS > > > The UFS devvp contains the pointer to struct g_consumer in > devpp->v_bufobj->bo_private, so g_alloc_bio() and g_io_request() can be > used to start BIO_DELETE command without fake bufer allocation. > g_io_strategy() may be used as the sample. Thanks for that update to moderize the patch! > But, I have a question about the patch. The bitmap block in ffs_blkfree() > is handled with delayed write, by bdwrite() call. I think that it is > quite possible for BIO_DELETE to be executed before the bitmap block > is written and on-disk bit is set in bitmap. Would not this allow > for the blocks to be deleted before the bitmap is updated, > and potentially before on-disk pointers are cleared that points to > the blocks ? SU just rolls back the unfinished dependencies on write, > but BIO_DELETE cannot. It is safe to do the BIO_DELETE here because at this point we have progressed through the file delete to the point where the inode that claimed this block has been updated on the disk with a NULL pointer. The only step that remains is to update the on-disk bitmap to reflect that the block is available. If the bitmap write is not done before a system crash the only action that will be taken with respect to the freed block is that either background fsck will restore it to the bitmap (SU) or the journal will restore it to the bitmap (SUJ). There is no case where either SU or SUJ will put it back into the file once we have reached this point. > Also, shouldn't the BIO_DELETE only be issued for real devices, > and not the snapshots ? They should also be issued for snapshots. The only time that snapshots give up blocks is when they are deleted. The blocks that they give up at that time are all copies of blocks that were later changed in the filesystem. With the snapshot gone there will be no remaining references to those blocks and they will be made available for reallocation. As such, it would be helpful if they had been TRIMMED. > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c > index b740bbb..4ff57ae 100644 > --- a/sys/ufs/ffs/ffs_alloc.c > +++ b/sys/ufs/ffs/ffs_alloc.c > @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); > > #include <security/audit/audit.h> > > +#include <geom/geom.h> > + > #include <ufs/ufs/dir.h> > #include <ufs/ufs/extattr.h> > #include <ufs/ufs/quota.h> > @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) > u_int cg; > u_int8_t *blksfree; > struct cdev *dev; > + struct bio *bip; > > cg = dtog(fs, bno); > if (devvp->v_type == VREG) { > @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) > softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, > numfrags(fs, size), dephd); > bdwrite(bp); > + > + if (devvp->v_type != VREG) { > + bip = g_alloc_bio(); > + bip->bio_cmd = BIO_DELETE; > + bip->bio_offset = dbtob(fsbtodb(fs, bno)); > + bip->bio_done = g_destroy_bio; > + bip->bio_length = size; > + g_io_request(bip, > + (struct g_consumer *)devvp->v_bufobj.bo_private); > + } > } > > #ifdef INVARIANTS The above patch looks good though I would do it unconditionally (e.g., also for snapshots). It seems sensible to make it conditional on a sysctl variable so that folks could experiment with it more easily. And I would leave it off by default as non-SSD disks are unlikely to benefit from it. If it does prove useful for SSD disks then I would make it conditional on a filesystem flag so that it is possible to enable it on a filesystem-by-filesystem basis (e.g., on for filesystems on the SSD and off for the rest). Kirk McKusick
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201012082223.oB8MNqGV020879>