From owner-freebsd-fs@FreeBSD.ORG Mon May 2 16:46:19 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B8189106566B; Mon, 2 May 2011 16:46:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 552898FC13; Mon, 2 May 2011 16:46:18 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p42GkGvE021696 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 02:46:16 +1000 Date: Tue, 3 May 2011 02:46:16 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110503020940.N2001@besplex.bde.org> References: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 16:46:19 -0000 On Sun, 1 May 2011, Rick Macklem wrote: >>[negative f_bavail and f_ffree] > Well my concern isn't w.r.t. FreeBSD clients, but other ones. I'll > start a discussion on freebsd-fs@ about whether a FreeBSD server > should "cheat" and put negative values (which other clients will > think are large positive values) on the wire or try and conform > strictly to the RFC. > >> BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and >> v4?) cases work? I can only see it in clients. Servers seem to start >> with block counts and never convert to byte counts. > > It must be somewhere, since they are uint64_t byte counts on the wire, > except for NFSv2, which used block counts of the block size provided > in the same response. I'm not sure how I missed this. The multiplications are there. They have the usual (potential) overflow bugs and the usual (actual) sign extension bugs. See below. I now see how to fix most of the overflow problems for the v2 case very easily by scaling on the server, so that clients see only small values. This should be portable. Here is the new nfs server code for this: if (nd->nd_flag & ND_NFSV2) { NFSM_BUILD(tl, u_int32_t *, NFSX_V2STATFS); *tl++ = txdr_unsigned(NFS_V2MAXDATA); *tl++ = txdr_unsigned(sf->f_bsize); *tl++ = txdr_unsigned(sf->f_blocks); *tl++ = txdr_unsigned(sf->f_bfree); *tl = txdr_unsigned(sf->f_bavail); This just reads server fs values from struct statfs, blindly truncates them, and puts them on the wire. With just 1 more line -- a call to statfs_scale_blocks(sf, UINT32_MAX), it can adjust sf so that all the values fit on the wire. Or more safely, it can get values that fit in the 32-bit longs on old FreeBSD clients and in the bogusly-cast-to-int32_t values in current FreeBSD clients by calling statfs_scale_blocks(sf, INT32_MAX). This should be portable. The adjustments may scale the block size from 16384 to a very large value, but clients should already be able to handle the "any" value. Already, the v2 block size value is rarely NFS_FABLKSIZE and rarely what it used to be since it is under server control. It used to be usually 4096 for ffs, but it is now usually 16384 for ffs, and can easily be 65536 for ffs. Above 65536 there might be more problems but 65536 works up to 128 TB with an int32_t max (2**31-1 blocks times 2**16 bsize = 2**47 - 2**16). Even ffs's default block size of 16K works up to 32 TB. File systems of size >= 32TB are still rare and are more rarely used with v2, so perhaps the overflows haven't occurred for anyone yet. } else { NFSM_BUILD(tl, u_int32_t *, NFSX_V3STATFS); tval = (u_quad_t)sf->f_blocks; tval *= (u_quad_t)sf->f_bsize; This of course does the inverse of the scaling done by the client. This could be written as: tval = sf->f_blocks * sf->f_bsize; The casts have no effect, since eveything is already 64 bits. You may as well assume this, since you have to assume lots about the types and values for this code to work at all. For example, suppose sf->f_blocks >= 2**63 (so that full 64-bitness including no space for a sign bit is actually needed for a block count). Then multiplying by a 64-bit sf->f_blocks may overflow, and you need to do a uint128_t multiplication to avoid overflow. This is difficult since uint128_t ins not supported in C on any arch in FreeBSD. Then the 128-bit values won't fit on the wire, and the need to be scaled as in the v2 case. But the v3 case doesn't seem to pass f_bsize, so it can't do this scaling and would need to clamp. txdr_hyper(tval, tl); tl += 2; tval = (u_quad_t)sf->f_bfree; tval *= (u_quad_t)sf->f_bsize; txdr_hyper(tval, tl); tl += 2; tval = (u_quad_t)sf->f_bavail; tval *= (u_quad_t)sf->f_bsize; The type errors are more serious for this signed field. Suppose sf->f_bsize is -1. Then tval is initially 0xFFFFFFFFFFFFFFFF. Suppose sf->f_bsize is 16K. Then the multiplication overflows. IIRC, the the result is implementation-defined (not undefined for unsigned's) and is normally (uint64_t)-16K = 0xFFFFFFFFFFFFC000. This is the right value for passing -16K as a large unsigned value. Careful code would generate this value without using an overflowing multiplication. txdr_hyper(tval, tl); tl += 2; tval = (u_quad_t)sf->f_files; txdr_hyper(tval, tl); tl += 2; tval = (u_quad_t)sf->f_ffree; txdr_hyper(tval, tl); tl += 2; tval = (u_quad_t)sf->f_ffree; txdr_hyper(tval, tl); tl += 2; *tl = 0; } > I'll try and make my Solaris10 box get to -ve frees and then see what > it puts on the wire. After that, I'll start a discussion on freebsd-fs@ > about how they think a FreeBSD server should behave when f_bavail and/or > f_ffree are negative. The result on Solaris would be interesting. Does Solaris still support ffs? You said later that you couldn't get it to generate negative values. Bruce