Date:      Tue, 13 Nov 2018 15:52:39 -0700
From:      Alan Somers <asomers@freebsd.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        freebsd-arch@freebsd.org, freebsd-fs <freebsd-fs@freebsd.org>,  FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: Hole-punching, TRIM, etc
Message-ID:  <CAOtMX2gjsC_ZbhpcuYsm=Ep-Pregr8wOVC8DYbkQ_93a%2BifD_Q@mail.gmail.com>
In-Reply-To: <CANCZdfp5UDcH-SLDVvvhkB0dTnhuP0tZ8YT0tUJkF8egAZgYuA@mail.gmail.com>
References:  <CAOtMX2jgb_Pf9-MqirM=xihVpyRmAGZKx2VRnvA_1Fx6kMYXXg@mail.gmail.com> <CANCZdfp5UDcH-SLDVvvhkB0dTnhuP0tZ8YT0tUJkF8egAZgYuA@mail.gmail.com>

On Tue, Nov 13, 2018 at 3:51 PM Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Tue, Nov 13, 2018 at 3:10 PM Alan Somers <asomers@freebsd.org> wrote:
>
>> Hole-punching has been discussed on these lists before[1].  It basically
>> means to turn a dense file into a sparse file by deallocating storage for
>> some of the blocks in the middle.  There's no standard API for it.  Linux
>> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>>
>> A related concept is telling a block device that some blocks are no longer
>> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
>> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
>> basically the same thing, and it's analogous to hole-punching for regular
>> files.  They are also all inaccessible from FreeBSD's userland except by
>> using pass(4), which is inconvenient and protocol-specific.
>>
>> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
>> but it's totally undocumented and doesn't work on regular files.
>>
>> I propose adding support for all of these things using the fcntl(2) API.
>> Using the same syntax that Solaris defined, you would be able to punch a
>> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports
>> it (though FreeBSD's port never did, and the code was deleted in r303763).
>> Here's what I would do:
>>
>> 1) Add the F_FREESP command to fcntl(2).
>> 2) Add a .fo_space field to struct fileops.
>> 3) Add a devfs_space method that implements .fo_space.
>> 4) Add a .d_space field to struct cdevsw.
>> 5) Add a g_dev_space method for GEOM that implements .d_space using
>> BIO_DELETE.
>> 6) Add a VOP_SPACE vop.
>> 7) Implement VOP_SPACE for tmpfs.
>> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>>
>> The greatest beneficiaries of this work would be type 2 hypervisors like
>> QEMU and VirtualBox with guests that use TRIM, and userland filesystems
>> such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
>> using SPDK would also benefit.  The last item, aio_freesp(2), may seem
>> unnecessary but it would really benefit my application.
>>
>> Questions, objections, flames?
>>
>
> So the fcntl would deallocate blocks from a filesystem only. The
> filesystem may issue BIO_DELETE as a result, but that's up to the
> filesystem, correct?
>

Correct.
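
For illustration, here is a minimal sketch of what punching a hole in a
regular file looks like from userland today. It assumes Linux's
fallocate(2) where available and falls back to the Solaris-style
fcntl(F_FREESP) call only if the platform headers define it. The helper
name punch_hole is hypothetical, and nothing here is the proposed FreeBSD
implementation.

```c
#define _GNU_SOURCE		/* for fallocate(2) and FALLOC_FL_* on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * punch_hole (hypothetical helper): deallocate the byte range
 * [off, off + len) of a regular file without changing its size.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
punch_hole(int fd, off_t off, off_t len)
{
	assert(off >= 0 && len > 0);
#if defined(__linux__) && defined(FALLOC_FL_PUNCH_HOLE)
	/* Linux: PUNCH_HOLE must be combined with KEEP_SIZE. */
	return (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
	    off, len));
#elif defined(F_FREESP)
	/* Solaris-style syntax: the range is described by a struct flock. */
	struct flock fl = { 0 };

	fl.l_whence = SEEK_SET;
	fl.l_start = off;
	fl.l_len = len;
	return (fcntl(fd, F_FREESP, &fl));
#else
	/* No hole-punching interface known on this platform. */
	(void)fd;
	errno = ENOSYS;
	return (-1);
#endif
}
```

Whether the filesystem then issues a discard to the underlying device is
its own decision. The caller only sees that the range reads back as zeros
and that the file size is unchanged.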


>
> On a raw device it would be translated into a BIO_DELETE command directly,
> correct?
>

Correct, modulo edge cases.
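
For comparison, this is roughly how the Linux BLKDISCARD ioctl mentioned
earlier in the thread is driven from userland. It is a sketch of the
existing Linux interface, not of the proposed fcntl(F_FREESP) path. The
helper name discard_range is hypothetical, and the up-front S_ISBLK check
is my own addition so that a non-device fd fails with a clear error.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#ifdef __linux__
#include <linux/fs.h>		/* BLKDISCARD */
#endif

/*
 * discard_range (hypothetical helper): ask a block device to discard
 * the byte range [off, off + len).  Returns 0 on success, -1 with
 * errno set on failure.  Data in the range is lost.
 */
static int
discard_range(int fd, uint64_t off, uint64_t len)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return (-1);
	if (!S_ISBLK(st.st_mode)) {
		/* BLKDISCARD only applies to block devices. */
		errno = ENOTBLK;
		return (-1);
	}
#ifdef BLKDISCARD
	/* The ioctl argument is a { start, length } pair in bytes. */
	uint64_t range[2] = { off, len };

	return (ioctl(fd, BLKDISCARD, range));
#else
	(void)off;
	(void)len;
	errno = ENOSYS;
	return (-1);
#endif
}
```

Actually running this needs a real block device and sufficient privilege,
and it destroys the data in the range, so treat it as a reading aid rather
than something to paste into a shell.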


>
> Warner
>


