Date:      Tue, 13 May 2025 16:51:16 +1000
From:      "Rob Norris" <robn@despairlabs.com>
To:        "Rick Macklem" <rick.macklem@gmail.com>, =?UTF-8?Q?Aur=C3=A9lien_Couderc?= <aurelien.couderc2002@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Sparse file support in FreeBSD NFSv4.2 server
Message-ID:  <3dfc035d-f825-432f-8a91-3da0abe10185@app.fastmail.com>
In-Reply-To:  <CAM5tNy5xPvcpy2mrtDg7OtBqqwSNUUJ44Ern8sAsR-wWRcqzzw@mail.gmail.com>
References:   <CA%2B1jF5pek05PwWid0M=A=qczZ0pfRzU1=psVczoEZ4N=y6Jj5A@mail.gmail.com> <CAM5tNy5T6feq9eD_9EFQ2p4yVcsQtUzfBPrmthPt_rJOc7Wiiw@mail.gmail.com> <CA%2B1jF5q9SBLCGghY5kPdZOfkgk5EKMf98SmHuyTzJwXOrStnJQ@mail.gmail.com> <CAM5tNy5xPvcpy2mrtDg7OtBqqwSNUUJ44Ern8sAsR-wWRcqzzw@mail.gmail.com>


On Tue, 13 May 2025, at 12:38 AM, Rick Macklem wrote:
> > > > - NFSv4.2 operation "ALLOCATE", to allocate disk space
> > > Will never happen for ZFS because it is basically impossible. I am not a ZFS
> > > guy, but that is what I have been told. UFS can do it, so it can be enabled if
> > > all your exports are UFS file systems.
> >
> > Solaris has fcntl(F_ALLOCSP,...), so this should work on ZFS.
> Well, I'm not a ZFS guy, but here is what I understood from the ZFS
> folk w.r.t. this:
> - When you write data to a file, new blocks are allocated for the data
>   bytes, even if there is already old data written to those bytes. As
>   such, it is "impossible" to guarantee that a write will not reply
>   ENOSPACE/EQUOTA.
>   One responder did think it was possible, but listed several major changes
>   that would be required to make this possible on ZFS. (So "impossible" might
>   really be "too difficult to ever be implemented".)

I _am_ an OpenZFS guy, and can say yes, this is correct. Creating a sparse region is easy; the tricky bit is guaranteeing that future changes in that region will never run out of space.

Without having looked at it, I can see a way to do it by creating some object-specific operation to "write" but have it accounted to a dataset's "reservation" rather than "used". Easy to say, difficult to do. I suspect the hardest part is figuring out the best way to keep a set of reserved ranges on each object.
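
To make the "set of reserved ranges" idea concrete, here is a toy sketch in Python. It is not OpenZFS code and says nothing about how this would be stored on disk or accounted against a dataset; the class and method names are made up for illustration. It only shows the bookkeeping: keep half-open byte ranges sorted and merged, and ask how much of a write would land outside any reservation.

```python
class ReservedRanges:
    """Hypothetical per-object set of reserved byte ranges, half-open [start, end)."""

    def __init__(self):
        # Sorted list of non-overlapping, non-adjacent (start, end) tuples.
        self._ranges = []

    def reserve(self, offset, length):
        """Reserve [offset, offset + length), merging with touching ranges."""
        if length <= 0:
            raise ValueError("length must be positive")
        start, end = offset, offset + length
        merged = []
        for s, e in self._ranges:
            if e < start or s > end:
                # Disjoint and not adjacent: keep as-is.
                merged.append((s, e))
            else:
                # Overlapping or adjacent: absorb into the new range.
                start, end = min(start, s), max(end, e)
        merged.append((start, end))
        merged.sort()
        self._ranges = merged

    def unreserved_bytes(self, offset, length):
        """Count bytes of [offset, offset + length) not covered by a reservation."""
        start, end = offset, offset + length
        covered = 0
        for s, e in self._ranges:
            lo, hi = max(s, start), min(e, end)
            if lo < hi:
                covered += hi - lo
        return (end - start) - covered

    def ranges(self):
        """Return a copy of the current reserved ranges."""
        return list(self._ranges)
```

A write wholly inside a reserved range would report zero unreserved bytes, and in this model only the unreserved remainder would need a fresh space check.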

Incidentally, I think the same machinery is necessary to get a properly compliant implementation of posix_fallocate(2), which has the same guarantee.
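
For anyone who has not used it, this is roughly what the caller's side of that guarantee looks like, sketched with Python's os.posix_fallocate() wrapper. The helper name and file mode are my own choices for illustration. On success, POSIX says later writes to that range must not fail for lack of space; whether a given file system can honour that is exactly the question above.

```python
import os


def preallocate(path, size):
    """Create `path` and ask the file system to back its first `size` bytes.

    Returns the resulting file size. Raises OSError if the file system
    cannot or will not make the allocation.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Offset 0, length `size`; the file is extended if it is shorter.
        os.posix_fallocate(fd, 0, size)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

A caller that depends on the guarantee should treat an OSError here as "no promise was made" and not fall back to assuming the space is there.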

Cheers,
Rob.

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3dfc035d-f825-432f-8a91-3da0abe10185>