From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 16:46:19 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B8189106566B;
	Mon,  2 May 2011 16:46:19 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 552898FC13;
	Mon,  2 May 2011 16:46:18 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p42GkGvE021696
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 3 May 2011 02:46:16 +1000
Date: Tue, 3 May 2011 02:46:16 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110503020940.N2001@besplex.bde.org>
References: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 16:46:19 -0000

On Sun, 1 May 2011, Rick Macklem wrote:

>>[negative f_bavail and f_ffree]
> Well my concern isn't w.r.t. FreeBSD clients, but other ones. I'll
> start a discussion on freebsd-fs@ about whether a FreeBSD server
> should "cheat" and put negative values (which other clients will
> think are large positive values) on the wire or try and conform
> strictly to the RFC.
>
>> BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and
>> v4?) cases work? I can only see it in clients. Servers seem to start
>> with block counts and never convert to byte counts.
>
> It must be somewhere, since they are uint64_t byte counts on the wire,
> except for NFSv2, which used block counts of the block size provided
> in the same response.

I'm not sure how I missed this.  The multiplications are there.  They
have the usual (potential) overflow bugs and the usual (actual) sign
extension bugs.  See below.

I now see how to fix most of the overflow problems for the v2 case very
easily by scaling on the server, so that clients see only small values.
This should be portable.

Here is the new nfs server code for this:

 	if (nd->nd_flag & ND_NFSV2) {
 		NFSM_BUILD(tl, u_int32_t *, NFSX_V2STATFS);
 		*tl++ = txdr_unsigned(NFS_V2MAXDATA);
 		*tl++ = txdr_unsigned(sf->f_bsize);
 		*tl++ = txdr_unsigned(sf->f_blocks);
 		*tl++ = txdr_unsigned(sf->f_bfree);
 		*tl = txdr_unsigned(sf->f_bavail);

This just reads server fs values from struct statfs, blindly truncates
them, and puts them on the wire.  With just 1 more line -- a call to
statfs_scale_blocks(sf, UINT32_MAX), it can adjust sf so that all the
values fit on the wire.  Or more safely, it can get values that fit
in the 32-bit longs on old FreeBSD clients and in the bogusly-cast-to-int32_t
values in current FreeBSD clients by calling statfs_scale_blocks(sf,
INT32_MAX).  This should be portable.  The adjustments may scale the
block size from 16384 to a very large value, but clients should already
be able to handle the "any" value.  Already, the v2 block size value is
rarely NFS_FABLKSIZE and rarely what it used to be since it is under
server control.  It used to be usually 4096 for ffs, but it is now usually
16384 for ffs, and can easily be 65536 for ffs.  Above 65536 there might
be more problems but 65536 works up to 128 TB with an int32_t max (2**31-1
blocks times 2**16 bsize = 2**47 - 2**16).  Even ffs's default block size
of 16K works up to 32 TB.  File systems of size >= 32TB are still rare
and are more rarely used with v2, so perhaps the overflows haven't occurred
for anyone yet.

 	} else {
 		NFSM_BUILD(tl, u_int32_t *, NFSX_V3STATFS);
 		tval = (u_quad_t)sf->f_blocks;
 		tval *= (u_quad_t)sf->f_bsize;

This of course does the inverse of the scaling done by the client.

This could be written as:
 		tval = sf->f_blocks * sf->f_bsize;
The casts have no effect, since eveything is already 64 bits.  You may
as well assume this, since you have to assume lots about the types and
values for this code to work at all.  For example, suppose
sf->f_blocks >= 2**63 (so that full 64-bitness including no space for
a sign bit is actually needed for a block count).  Then multiplying
by a 64-bit sf->f_blocks may overflow, and you need to do a uint128_t
multiplication to avoid overflow.  This is difficult since uint128_t
ins not supported in C on any arch in FreeBSD.  Then the 128-bit values
won't fit on the wire, and the need to be scaled as in the v2 case.
But the v3 case doesn't seem to pass f_bsize, so it can't do this scaling
and would need to clamp.

 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_bfree;
 		tval *= (u_quad_t)sf->f_bsize;
 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_bavail;
 		tval *= (u_quad_t)sf->f_bsize;

The type errors are more serious for this signed field.  Suppose
sf->f_bsize is -1.  Then tval is initially 0xFFFFFFFFFFFFFFFF.
Suppose sf->f_bsize is 16K.  Then the multiplication overflows.
IIRC, the the result is implementation-defined (not undefined for
unsigned's) and is normally (uint64_t)-16K = 0xFFFFFFFFFFFFC000.
This is the right value for passing -16K as a large unsigned value.
Careful code would generate this value without using an overflowing
multiplication.

 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_files;
 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_ffree;
 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_ffree;
 		txdr_hyper(tval, tl); tl += 2;
 		*tl = 0;
 	}

> I'll try and make my Solaris10 box get to -ve frees and then see what
> it puts on the wire. After that, I'll start a discussion on freebsd-fs@
> about how they think a FreeBSD server should behave when f_bavail and/or
> f_ffree are negative.

The result on Solaris would be interesting.  Does Solaris still support
ffs?  You said later that you couldn't get it to generate negative values.

Bruce