Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 2 May 2011 03:52:47 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        rmacklem@freebsd.org, fs@freebsd.org
Subject:   Re: newnfs client and statfs
Message-ID:  <20110502022441.H2013@besplex.bde.org>
In-Reply-To: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
References:  <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 1 May 2011, Rick Macklem wrote:

>>
>> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
>> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
>> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
>>
>> The conversion for f_bavail still has sign extension bugs. f_bavail
>> can be negative on the server. A non-broken (FreeBSD) server passes
>> us this negative value as a uint64_t value with the top bit set.
>
> Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on
> the wire (sf_abytes) as uint64_t. Therefore a negative value can't
> be represented safely and non-FreeBSD clients/servers would be
> confused by cheating and putting the negative value on the wire.
> (I see you mention this further down.)

But it can be represented.  FreeBSD servers always put it on the wire
(if the server file system has a negative value) until the old nfs
server broke it.  I can only find a few FreeBSD clients that aren't
confused by this:
- most or all clients work for the v2 case, because v2 doesn't need
   to scale for f_bavail, and copying the 31st (unsigned) bit to
   the 31st (signed) bit mostly works (except everthing breaks once
   the absolute values exceed 2**31-1 or 2**32.1).
- most clients are broken for the v3 case.  For negative f_bavail,
   sign-extension/overflow bugs in the scaling give a value of about
   2**54.  Assigning this to a 32-bit f_bavail gives unobvious garbage;
   assigning this to a 64-bit f_bavail gives obvious garbage.
- my FreeBSD-~5.2 v3 client handles negative f_bavail correctly (by
   scaling a signed value).  It doesn't fix f_ffree.  (I see negative
   f_bavail quite often but never run into the reserve for f_ffree.)

BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and
v4?) cases work?  I can only see it in clients.  Servers seem to start
with block counts and never convert to byte counts.

> The new server is broken in that it does not
> check for a negative value. It seems that the best approach for the
> server would be to send a 0 when f_bavail < 0. What else can you do

Hrmph.  It is servers that check and send a 0 when f_bavail < 0 that
are broken.

> without "cheating" and representing the value in a way that would be
> non-interoperable with non-BSD NFS clients?

I don't know.  See the NetBSD client for some ideas.  Note that for blocks
there are 2 "free" fields, f_bfree and f_bavail, while for files there is
only 1 (f_ffree).  You would think that the redundancy for blocks would
allow passing a negative value as the difference of 2 nonnegative ones,
but I couldn't make this work.  For FreeBSD clients that can handle this,
is it possible to negotiate the handling with the server?

> I agree the above is broken for the case where the high order bit
> of sf_abytes is set. How about the following code?
>
>  sbp->f_bavail = (sfp->f_abytes & OFF_MAX) / NFS_FABLKSIZE;

Doesn't work at all.  The byte count is typically a small negative
value, say -512.  This should be scaled to -1.  But
sbp->f_bavail = (uint64_t)-512 = 0xfffffffffffffe00.  Discarding just
1 top bit from this makes little difference to it.  No amount of
discarding top bits works correctly,  The value must be negated:

 	sbp->f_bavail = (int64_t)sbp->f_bavail / NFS_FABLKSIZE;

See the scaling function in vfs_syscalls.c for a worse method.  (1
technical difference: it wants to handle all fields using the same
unsigned max count, so it uses the negative of f_bavail and needs a
little more code for this.  1 unportability: it scales all the
fields by right shifting, but right shifting of negative values is
not guaranteed to handle the sign buit the same as division by a
power of 2.)


> (Yea, I see later in the message that you don't think
> OFF_MAX is the appropriate
> way to represent the largest positive value that can be stored
> in int64_t. As you'll see below, I don't know the correct way to
> express this constant and would be happy to hear how to do it?
> See below for more on this.)
>
> ...
>
> Other than that, the RFCs specify sf_tfiles as uint64_t and
> "struct statfs" has f_files as a uint64_t. So, unless there are plans to
> make it signed on FreeBSD, I don't see a problem here?

The problem is for f_bavail and f_ffree in statfs.  These are
intentionally signed to support ffs putting negative values in
them.  I think the protocol specifies uint64_t for sf_abytes and
sf_ffiles, so there is a minor theoretical problem even if negative
values aren't supported by nfs.  (sf_abytes might be 2**64-512.
Perhaps this is actually physically possible using a sparse mapping.
After scaling by NFS_FABLKSIZE, we can represent this value despite
using a signed type, but we have to know that it really is large
unsigned and not negative.  sf_ffiles might be 2**64-1, but this is
physically impossible.)

>> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);
>>
>> Any masking here is logically wrong, and in practice just destroys the
>> sign bit, as described above for the 0x7fffffff mask with old 32 bit
>> systems. Masking with OFF_MAX has additional logic errors. OFF_MAX
>> is the maximum value for an off_t, but none of the types here has
>> anything to do with off_t.
>
> Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is
> no sign bit. The problem is that it could be a larger positive value
> than FreeBSD supports. All I wanted this code to do is make it the

Everything is 64 bits, so there is no problem in practice.  The signed
type for f_ffiles gives a problem in theory -- it can only represent
63-bit unsigned value, but the wire has 64.  But more than 2**63-1
files is physically impossible, so there is no problem in practice.

You can either assume this, or write a maze of code to handle various
combinations of type sizes.  I prefer to not write actual code for
this.  A couple of assertions that the sizes are still 64 bits should
be enough.

> largest positive value that will fit in int64_t. (I used OFF_MAX
> because you suggested in a previous email that that was preferable
> to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see anything
> like INT64_MAX, UINT64_MAX in FreeBSD's limits.h)

These are in <sys/systm.h> via standard namespace pollution -- see another
reply.

> Would
>
>   if (sfp->sf_ffiles > UINT64_MAX)
>       sbp->f_ffree = INT64_MAX;
>   else
>       sbp->f_ffree = sfp->sf_ffiles;

s/ffree/ffiles/, and a few other fixes from a later reply (s/UINT64_MAX/
INT64_MAX).

INT64_MAX is currently the correct limit, but breaks automatically if
someone changes the type of f_ffree.  I have some fancy macros (never
fully implemented in actual code) to determine the limits from the
types (sizeof(sbp->f_ffree) gives the number of bits provided it is
a fixed-width type...).

> - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
>  far as I can see. How do I express these constants? Do I have to
>  convert 0x7ffffffffffffff to decimal and use that?

Avoid them if possible.  You should only need them if you clamp the
values.

>> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
>> % sbp->f_bsize = (int32_t)sfp->sf_bsize;
>> % sbp->f_blocks = (int32_t)sfp->sf_blocks;
>>
>> I think this is just the v2 case. The old nfs client uses essentially
>> the same bogus casts. No casts should be used (clamping should be
>> used), but if we use casts it may be possible to use non-bogus ones.
>> I think these are just no casts for the unsigned fields but int32_t
>> for the signed ones. The v2 protocol is limited to 32 bits, and we
>> can easily represent any 32-bit value since we have 64-bit fields for
>> the lvalues. We just need to be careful with the sign bit (in the
>> 31th bit of an unsigned value in the sfp fields), but can keep the
>> 31th bit as an unsigned bit without problems now that the statfs
>> fields
>> are 64 bits. Casting for the unsigned fields now just breaks the value
>> unnecessarily if the protocol manages to pass the 31th bit as a value
>> bit for such fields.
>>
> Ok, I could take the casts off. I think the effect would be that, for the
> case where sf_bavail has its high order (bit 31) set, it will be seen as
> a larger positive value. (sf_bavail is u_int32_t) This would be correct
> per the RFCs, since RFC1094 defines the fields as uint32_t. Now, if
> servers were "cheating" and putting the negative values in the field on
> the wire, it will change the semantics a bit.

The v2 case is much closer to hitting the limits, since we can now
easily have a server with >= 2**32 512-blocks and someone might want
to use the v2 protocol for it.  ino_t is still 32 bits so file counts
can't exceed the v2 protocol limits yet, but that will change soon.

> I'll admit I tend to feel that the safest thing is to just leave it
> the way it is, since no one is complaining about the semantics and I'd
> rather not "break" anything by fixing the semantics to agree with thr RFC.
>
>> Servers should pay even more attention to unrepresentable bits than to
>> sign bis, but pay considerably less. Both the old and the new nfs
>> server
>> blindly truncate f_bfree, etc., to 32 bits in the v2 case (except the
>> old
>> nfs server corrupts negative f_bavail to 0).
>
> As above, I have to disagree with this. If the RFCs say it can't be
> negative, then sending negative values as 0 is all that can be done,
> as far as I can see. (I think the old server got this case correct
> and the new server needs to be fixed.)

The corruption also involves sending positive values as 0.  E.g., 4G on
the server becomes 0 on the client after blind truncation to 32 bits.
Clamping to UINT32_MAX or INT32_MAX would reduce problems from this.
Only the server can do the clamping, since the client has no way to tell
whether 0 really means 0.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110502022441.H2013>