Date: Mon, 2 May 2011 03:52:47 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Rick Macklem <rmacklem@uoguelph.ca> Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs Message-ID: <20110502022441.H2013@besplex.bde.org> In-Reply-To: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> References: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 1 May 2011, Rick Macklem wrote: >> >> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE; >> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE; >> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; >> >> The conversion for f_bavail still has sign extension bugs. f_bavail >> can be negative on the server. A non-broken (FreeBSD) server passes >> us this negative value as a uint64_t value with the top bit set. > > Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on > the wire (sf_abytes) as uint64_t. Therefore a negative value can't > be represented safely and non-FreeBSD clients/servers would be > confused by cheating and putting the negative value on the wire. > (I see you mention this further down.) But it can be represented. FreeBSD servers always put it on the wire (if the server file system has a negative value) until the old nfs server broke it. I can only find a few FreeBSD clients that aren't confused by this: - most or all clients work for the v2 case, because v2 doesn't need to scale for f_bavail, and copying the 31st (unsigned) bit to the 31st (signed) bit mostly works (except everthing breaks once the absolute values exceed 2**31-1 or 2**32.1). - most clients are broken for the v3 case. For negative f_bavail, sign-extension/overflow bugs in the scaling give a value of about 2**54. Assigning this to a 32-bit f_bavail gives unobvious garbage; assigning this to a 64-bit f_bavail gives obvious garbage. - my FreeBSD-~5.2 v3 client handles negative f_bavail correctly (by scaling a signed value). It doesn't fix f_ffree. (I see negative f_bavail quite often but never run into the reserve for f_ffree.) BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and v4?) cases work? I can only see it in clients. Servers seem to start with block counts and never convert to byte counts. > The new server is broken in that it does not > check for a negative value. It seems that the best approach for the > server would be to send a 0 when f_bavail < 0. What else can you do Hrmph. It is servers that check and send a 0 when f_bavail < 0 that are broken. > without "cheating" and representing the value in a way that would be > non-interoperable with non-BSD NFS clients? I don't know. See the NetBSD client for some ideas. Note that for blocks there are 2 "free" fields, f_bfree and f_bavail, while for files there is only 1 (f_ffree). You would think that the redundancy for blocks would allow passing a negative value as the difference of 2 nonnegative ones, but I couldn't make this work. For FreeBSD clients that can handle this, is it possible to negotiate the handling with the server? > I agree the above is broken for the case where the high order bit > of sf_abytes is set. How about the following code? > > sbp->f_bavail = (sfp->f_abytes & OFF_MAX) / NFS_FABLKSIZE; Doesn't work at all. The byte count is typically a small negative value, say -512. This should be scaled to -1. But sbp->f_bavail = (uint64_t)-512 = 0xfffffffffffffe00. Discarding just 1 top bit from this makes little difference to it. No amount of discarding top bits works correctly, The value must be negated: sbp->f_bavail = (int64_t)sbp->f_bavail / NFS_FABLKSIZE; See the scaling function in vfs_syscalls.c for a worse method. (1 technical difference: it wants to handle all fields using the same unsigned max count, so it uses the negative of f_bavail and needs a little more code for this. 1 unportability: it scales all the fields by right shifting, but right shifting of negative values is not guaranteed to handle the sign buit the same as division by a power of 2.) > (Yea, I see later in the message that you don't think > OFF_MAX is the appropriate > way to represent the largest positive value that can be stored > in int64_t. As you'll see below, I don't know the correct way to > express this constant and would be happy to hear how to do it? > See below for more on this.) > > ... > > Other than that, the RFCs specify sf_tfiles as uint64_t and > "struct statfs" has f_files as a uint64_t. So, unless there are plans to > make it signed on FreeBSD, I don't see a problem here? The problem is for f_bavail and f_ffree in statfs. These are intentionally signed to support ffs putting negative values in them. I think the protocol specifies uint64_t for sf_abytes and sf_ffiles, so there is a minor theoretical problem even if negative values aren't supported by nfs. (sf_abytes might be 2**64-512. Perhaps this is actually physically possible using a sparse mapping. After scaling by NFS_FABLKSIZE, we can represent this value despite using a signed type, but we have to know that it really is large unsigned and not negative. sf_ffiles might be 2**64-1, but this is physically impossible.) >> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX); >> >> Any masking here is logically wrong, and in practice just destroys the >> sign bit, as described above for the 0x7fffffff mask with old 32 bit >> systems. Masking with OFF_MAX has additional logic errors. OFF_MAX >> is the maximum value for an off_t, but none of the types here has >> anything to do with off_t. > > Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is > no sign bit. The problem is that it could be a larger positive value > than FreeBSD supports. All I wanted this code to do is make it the Everything is 64 bits, so there is no problem in practice. The signed type for f_ffiles gives a problem in theory -- it can only represent 63-bit unsigned value, but the wire has 64. But more than 2**63-1 files is physically impossible, so there is no problem in practice. You can either assume this, or write a maze of code to handle various combinations of type sizes. I prefer to not write actual code for this. A couple of assertions that the sizes are still 64 bits should be enough. > largest positive value that will fit in int64_t. (I used OFF_MAX > because you suggested in a previous email that that was preferable > to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see anything > like INT64_MAX, UINT64_MAX in FreeBSD's limits.h) These are in <sys/systm.h> via standard namespace pollution -- see another reply. > Would > > if (sfp->sf_ffiles > UINT64_MAX) > sbp->f_ffree = INT64_MAX; > else > sbp->f_ffree = sfp->sf_ffiles; s/ffree/ffiles/, and a few other fixes from a later reply (s/UINT64_MAX/ INT64_MAX). INT64_MAX is currently the correct limit, but breaks automatically if someone changes the type of f_ffree. I have some fancy macros (never fully implemented in actual code) to determine the limits from the types (sizeof(sbp->f_ffree) gives the number of bits provided it is a fixed-width type...). > - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as > far as I can see. How do I express these constants? Do I have to > convert 0x7ffffffffffffff to decimal and use that? Avoid them if possible. You should only need them if you clamp the values. >> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) { >> % sbp->f_bsize = (int32_t)sfp->sf_bsize; >> % sbp->f_blocks = (int32_t)sfp->sf_blocks; >> >> I think this is just the v2 case. The old nfs client uses essentially >> the same bogus casts. No casts should be used (clamping should be >> used), but if we use casts it may be possible to use non-bogus ones. >> I think these are just no casts for the unsigned fields but int32_t >> for the signed ones. The v2 protocol is limited to 32 bits, and we >> can easily represent any 32-bit value since we have 64-bit fields for >> the lvalues. We just need to be careful with the sign bit (in the >> 31th bit of an unsigned value in the sfp fields), but can keep the >> 31th bit as an unsigned bit without problems now that the statfs >> fields >> are 64 bits. Casting for the unsigned fields now just breaks the value >> unnecessarily if the protocol manages to pass the 31th bit as a value >> bit for such fields. >> > Ok, I could take the casts off. I think the effect would be that, for the > case where sf_bavail has its high order (bit 31) set, it will be seen as > a larger positive value. (sf_bavail is u_int32_t) This would be correct > per the RFCs, since RFC1094 defines the fields as uint32_t. Now, if > servers were "cheating" and putting the negative values in the field on > the wire, it will change the semantics a bit. The v2 case is much closer to hitting the limits, since we can now easily have a server with >= 2**32 512-blocks and someone might want to use the v2 protocol for it. ino_t is still 32 bits so file counts can't exceed the v2 protocol limits yet, but that will change soon. > I'll admit I tend to feel that the safest thing is to just leave it > the way it is, since no one is complaining about the semantics and I'd > rather not "break" anything by fixing the semantics to agree with thr RFC. > >> Servers should pay even more attention to unrepresentable bits than to >> sign bis, but pay considerably less. Both the old and the new nfs >> server >> blindly truncate f_bfree, etc., to 32 bits in the v2 case (except the >> old >> nfs server corrupts negative f_bavail to 0). > > As above, I have to disagree with this. If the RFCs say it can't be > negative, then sending negative values as 0 is all that can be done, > as far as I can see. (I think the old server got this case correct > and the new server needs to be fixed.) The corruption also involves sending positive values as 0. E.g., 4G on the server becomes 0 on the client after blind truncation to 32 bits. Clamping to UINT32_MAX or INT32_MAX would reduce problems from this. Only the server can do the clamping, since the client has no way to tell whether 0 really means 0. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110502022441.H2013>