Date: Tue, 13 May 2025 16:51:16 +1000
From: "Rob Norris" <robn@despairlabs.com>
To: "Rick Macklem" <rick.macklem@gmail.com>, Aurélien Couderc <aurelien.couderc2002@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Sparse file support in FreeBSD NFSv4.2 server
Message-ID: <3dfc035d-f825-432f-8a91-3da0abe10185@app.fastmail.com>
In-Reply-To: <CAM5tNy5xPvcpy2mrtDg7OtBqqwSNUUJ44Ern8sAsR-wWRcqzzw@mail.gmail.com>
References: <CA+1jF5pek05PwWid0M=A=qczZ0pfRzU1=psVczoEZ4N=y6Jj5A@mail.gmail.com>
 <CAM5tNy5T6feq9eD_9EFQ2p4yVcsQtUzfBPrmthPt_rJOc7Wiiw@mail.gmail.com>
 <CA+1jF5q9SBLCGghY5kPdZOfkgk5EKMf98SmHuyTzJwXOrStnJQ@mail.gmail.com>
 <CAM5tNy5xPvcpy2mrtDg7OtBqqwSNUUJ44Ern8sAsR-wWRcqzzw@mail.gmail.com>

On Tue, 13 May 2025, at 12:38 AM, Rick Macklem wrote:
> > > > - NFSv4.2 operation "ALLOCATE", to allocate disk space
> > > Will never happen for ZFS because it is basically impossible. I am not a ZFS
> > > guy, but that is what I have been told. UFS can do it, so it can be enabled if
> > > all your exports are UFS file systems.
> >
> > Solaris has fcntl(F_ALLOCSP,...), so this should work on ZFS.
> Well, I'm not a ZFS guy, but here is what I understood from the ZFS
> folk w.r.t. this:
> - When you write data to a file, new blocks are allocated for the data
>   bytes, even if there is already old data written to those bytes. As such,
>   it is "impossible" to guarantee that a write will not reply ENOSPC/EDQUOT.
>   One responder did think it was possible, but listed several major changes
>   that would be required to make this possible on ZFS. (So "impossible" might
>   really be "too difficult to ever be implemented".)

I _am_ an OpenZFS guy, and can say yes, this is correct. Creating a sparse
region is easy; the tricky bit is the guarantee that future changes in that
region will never run out of space.

Without having looked at it, I can see a way to do it by creating some
object-specific operation that "writes" but is accounted to the dataset's
"reservation" rather than "used". Easy to say, difficult to do. I suspect the
hardest part is figuring out the best way to keep a set of reserved ranges on
each object.

Incidentally, I think the same machinery is necessary to get a properly
compliant implementation of posix_fallocate(2), which has the same guarantee.

Cheers,
Rob.
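
To make the accounting idea above concrete, here is a toy model, not OpenZFS code. Every name in it (ToyDataset, allocate, write, the "used" and "reserved" counters) is hypothetical. It assumes fixed-size blocks, no snapshots, and that a write inside a reserved range can recycle the space already charged to that range, which real copy-on-write does not give you for free. It only shows why an ordinary overwrite can hit ENOSPC on a full copy-on-write dataset while writes into a per-object reserved range cannot.

```python
import errno


class ToyDataset:
    """Toy space accounting for a copy-on-write dataset (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total blocks in the dataset
        self.used = 0             # blocks charged to ordinary writes
        self.reserved = 0         # blocks charged to preallocated ranges
        self.written = {}         # object name -> set of written block numbers
        self.ranges = {}          # object name -> set of reserved block numbers

    def free(self):
        return self.capacity - self.used - self.reserved

    def _need(self, nblocks):
        if nblocks > self.free():
            raise OSError(errno.ENOSPC, "no space left in toy dataset")

    def allocate(self, obj, start, count):
        """Charge blocks [start, start + count) of obj to the reservation."""
        held = self.ranges.setdefault(obj, set())
        done = self.written.setdefault(obj, set())
        new = set(range(start, start + count)) - held
        # Only blocks that are not yet written need fresh space.
        self._need(len(new - done))
        # Blocks already written move their charge from "used" to "reserved".
        self.used -= len(new & done)
        self.reserved += len(new)
        held |= new

    def write(self, obj, block):
        done = self.written.setdefault(obj, set())
        if block in self.ranges.get(obj, set()):
            # Space was charged up front, so this write cannot run out.
            done.add(block)
            return
        # Copy-on-write: a new block is allocated even for an overwrite,
        # and the old copy is only released afterwards.
        self._need(1)
        if block not in done:
            self.used += 1
            done.add(block)


ds = ToyDataset(capacity=4)
ds.allocate("a", 0, 2)   # reserve two blocks for object "a"
ds.write("b", 0)
ds.write("b", 1)         # the dataset is now full

try:
    ds.write("b", 0)     # plain overwrite still needs a free block
    overwrite_error = None
except OSError as e:
    overwrite_error = e.errno

ds.write("a", 0)         # writes inside the reserved range succeed
ds.write("a", 1)
ds.write("a", 0)         # and so do overwrites of that range
```

The per-object `ranges` set stands in for the "set of reserved ranges on each object"; how to store and update that efficiently is exactly the part the toy skips.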
