Date: Fri, 10 Dec 2004 11:25:42 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Palle Girgensohn <girgen@freebsd.org>
Cc: freebsd-amd64@freebsd.org
Subject: Re: amd64/74811: df, nfs mount, negative Avail -> 32/64-bit confusion
Message-ID: <20041210092445.Y28756@delplex.bde.org>
In-Reply-To: <200412071156.iB7BuDHE077345@rambutan.pingpong.net>
References: <200412071156.iB7BuDHE077345@rambutan.pingpong.net>
On Tue, 7 Dec 2004, Palle Girgensohn wrote:

> >Description:
>
> using FreeBSD 5.3 amd64 as nfs client
> FreeBSD 4.10 i386 as nfs server

The combination of client and server is critical for demonstrating this
bug. Broken servers don't implement negative avail counts. FreeBSD-5's
server was broken in rev.1.140 of nfs_serv.c to "fix" the problem
reported in this PR. FreeBSD-4's server remains unbroken.

> when a disk is filled up over 100%, Avail becomes negative on the
> server, but hugely positive on the 64-bit platform. Not very
> surprising, but still a bug... :)
>
> 4.10 i386 server:
> Filesystem  1K-blocks     Used   Avail Capacity  Mounted on
> /dev/da6s1f  17388202 16153532 -156386   101%    /dumps/0
>
> 5.3 amd64 client:
> Filesystem     1K-blocks     Used             Avail Capacity  Mounted on
> banan:/dumps/0  17388202 16153532 18014398509325598     0%    /mnt

This is caused by sign extension/overflow bugs in nfs_vfsops.c. From
the version in FreeBSD-5.3 (rev.1.158):

%     u_quad_t tquad;
%     ...
%     tquad = fxdr_hyper(&sfp->sf_abytes);
%     if (((long)(tquad / bsize) > LONG_MAX) ||
           ^^^^^^^^^^^^^^^^^^^^^
%         ((long)(tquad / bsize) < LONG_MIN))
           ^^^^^^^^^^^^^^^^^^^^^
%         continue;
%     sbp->f_bavail = tquad / bsize;
%                     ^^^^^^^^^^^^^

-156386 1K-blocks is passed by the server as
(uint64_t)(-156386 * 1024) = (2**64 - 156386 * 1024). It needs to be
converted back to a signed quantity before dividing it by bsize, but
this is not done. tquad is still (2**64 - 156386 * 1024). bsize is
always 512 in FreeBSD-5.3 (*). The division gives the wrong value
(2**55 - 156386 * 2). This is passed back to userland. It is a block
count in 512-blocks, so df divides it by 2 to convert to 1K-blocks.
The final value printed is (2**54 - 156386) = 18014398509325598.

The magic number 18014398509325598 is easy to recognize. 2**64 is
1844..., so huge values starting with the digits 18 are often
misrepresentations of small negative values converted to uint64_t.
Here the value is 1801... instead of 1844..., and on closer examination
has 3 fewer digits. It is just the corresponding 1844... value divided
by 2**10 = 1024 to convert to 1K-blocks. Applications like df could
recognize such magic numbers (not so) similarly and fix them up, but
shouldn't have to.

(*) Other aspects of this bug include the code that doubles bsize
actually being executed in some versions of FreeBSD on some machines,
including -current on i386's. It is broken and gives a kernel panic
for division by bsize = 0 for about half of all possible values for
negative available space, including all values that are likely to
occur (small negative ones). See PR 56606 for more details of older
aspects of this bug suite. There are many newer ones.

Bruce
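
The arithmetic in the message above can be checked with a small
standalone C sketch. The function names wire_abytes, bavail_unsigned
and bavail_signed are made up for illustration and do not appear in
nfs_vfsops.c. bavail_signed only shows the "convert back to signed
before dividing" step that the message describes; it is an assumption
about one possible shape of a fix, not the change that was committed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * What ends up on the wire: a signed byte count reinterpreted as an
 * unsigned 64-bit value, so -N becomes 2**64 - N.
 */
uint64_t wire_abytes(int64_t avail_bytes)
{
    return (uint64_t)avail_bytes;
}

/*
 * The division as done in the quoted client code: tquad is still
 * unsigned, so a small negative count becomes a huge positive one.
 */
int64_t bavail_unsigned(uint64_t tquad, uint64_t bsize)
{
    assert(bsize > 0);          /* bsize == 0 would be a division by zero */
    return (int64_t)(tquad / bsize);
}

/*
 * Hypothetical correction: convert back to a signed quantity first,
 * then divide.  The unsigned-to-signed conversion of an out-of-range
 * value is implementation-defined before C23; this assumes the usual
 * two's complement behaviour.
 */
int64_t bavail_signed(uint64_t tquad, int64_t bsize)
{
    assert(bsize > 0);
    return (int64_t)tquad / bsize;
}
```

With tquad = wire_abytes(-156386LL * 1024) and bsize = 512, halving the
result of bavail_unsigned (as df does to get 1K-blocks) gives
18014398509325598, while halving the result of bavail_signed gives
-156386.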