From owner-freebsd-arch@FreeBSD.ORG Thu Nov 6 03:44:51 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F277416A4CF; Thu, 6 Nov 2003 03:44:50 -0800 (PST) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id D7A6F43FE3; Thu, 6 Nov 2003 03:44:48 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id WAA31720; Thu, 6 Nov 2003 22:44:34 +1100 Date: Thu, 6 Nov 2003 22:44:33 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Kirk McKusick In-Reply-To: <200311060504.hA654feN034044@beastie.mckusick.com> Message-ID: <20031106211046.J7380@gamplex.bde.org> References: <200311060504.hA654feN034044@beastie.mckusick.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: Robert Watson cc: arch@freebsd.org cc: Peter Wemm Subject: Re: >0x7fffffff blocksize filesystem reporting X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Nov 2003 11:44:51 -0000 On Wed, 5 Nov 2003, Kirk McKusick wrote: > I have gone back and resurrected the changes for the updated statfs > structure that were discussed on arch some months ago. Because they > ... > Index: sys/sys/mount.h > =================================================================== > RCS file: /usr/ncvs/src/sys/sys/mount.h,v > retrieving revision 1.148 > diff -c -r1.148 mount.h > *** sys/sys/mount.h 1 Jul 2003 17:40:23 -0000 1.148 > --- sys/sys/mount.h 5 Nov 2003 05:10:54 -0000 > *************** > *** 63,75 **** > /* > * filesystem statistics > */ > > ! #define MFSNAMELEN 16 /* length of fs type name, including null */ > ! #define MNAMELEN (88 - 2 * sizeof(long)) /* size of on/from name bufs */ > > /* XXX getfsstat.2 is out of date with write and read counter changes here. */ > /* XXX statfs.2 is out of date with read counter changes here. */ > ! struct statfs { > long f_spare2; /* placeholder */ > long f_bsize; /* fundamental filesystem block size */ > long f_iosize; /* optimal transfer block size */ > --- 63,103 ---- > /* > * filesystem statistics > */ > + #define MFSNAMELEN 16 /* length of type name including null */ > + #define MNAMELEN 80 /* size of on/from name bufs */ As pointed out by tjr, there are buffer overflows from bcopy() MNAMELEN bytes. This is because the above MNAMELEN Is smaller than the old MNAMELEN on systems with sizeof(long) > 4. It should be either the historical MNAMELEN of 90 or 88, or larger than that since we have room for expansion, or no larger than the previous smallest MNAMELEN for all machines, depending on whether the overflow bug is fixed by truncation on conversion or avoided by truncation when the name is copied in. > + #define STATFS_VERSION 0x20030518 /* current version number */ Style bug (space instead of tab). > + struct statfs { > + u_int32_t f_version; /* structure version number */ > + u_int32_t f_type; /* type of filesystem */ > + u_int64_t f_flags; /* copy of mount exported flags */ > + u_int64_t f_bsize; /* filesystem fragment size */ > + u_int64_t f_iosize; /* optimal transfer block size */ > + u_int64_t f_blocks; /* total data blocks in filesystem */ > + u_int64_t f_bfree; /* free blocks in filesystem */ > + int64_t f_bavail; /* free blocks avail to non-superuser */ > + u_int64_t f_files; /* total file nodes in filesystem */ > + int64_t f_ffree; /* free nodes avail to non-superuser */ > + u_int64_t f_syncwrites; /* count of sync writes since mount */ > + u_int64_t f_asyncwrites; /* count of async writes since mount */ > + u_int64_t f_syncreads; /* count of sync reads since mount */ > + u_int64_t f_asyncreads; /* count of async reads since mount */ > + u_int64_t f_spare[10]; /* unused spare */ > + u_int32_t f_namemax; /* maximum filename length */ I disklike all these unsigned types, and to a lesser extent, typedefed types and some of the 64-bit types. Unsigned types limit possibilties for trapping overflow if overflow actually occurs. They should not be used to get an extra bit of precision unless the extra bit is very important. The old code got this right by using only signed types. f_bavail and f_ffree need to go negative so they are correctly signed types, but this means that the extra bit for the 64-bit unsigned block and node counts cannot actually be used (since values that use it would cause overflow if most of the blocks or nodes are free). Typedefed types are difficult to print, as shown by printf format errors in most of the formats changed in this patch. But forwards compatibility at the source level requires them. Why 64-bit types for f_bsize and f_iosize? If unsigned fixed-width types are used, then they should be named in a standard way (uintN_t, not u_intN_t). There are also some standard POSIX types for block counts, etc. These are used in POSIX's variant of statfs(). E.g., there is blkcnt_t, which is a signed integral type not yet implemented in FreeBSD. In the XSI extension there is also fsblkcnt_t, which is an _unsigned_ integral type implemented as uint64_t in FreeBSD. Its unsignedness is apparently derived from the old XSI type of unsigned long for block counts. > + uid_t f_owner; /* user that mounted the filesystem */ > + fsid_t f_fsid; /* filesystem id */ > + char f_charspare[76]; /* spare string space */ > + char f_fstypename[MFSNAMELEN]; /* filesystem type name */ > + char f_mntfromname[MNAMELEN]; /* mounted filesystem */ > + char f_mntonname[MNAMELEN]; /* directory on which mounted */ > + }; f_charspare would be better at the end. If explicit padding is needed after f_fsid_t, then it should be in terms of fsid_t spares. Similarly for spares after f_namemax and f_owner. These are only packed because uid_t happens to be 32-bit. fsid_t is 2*32-bit so it doesn't need padding either. 76 is an odd amount of padding. It gives 252 bytes of char arrays altogether and a struct size of 452 on i386's. 452 is 4 mod 8, so there are 4 bytes of unnamed padding at the end of the struct on most 64-bit arches (if not internal padding). > Index: sys/kern/vfs_syscalls.c > =================================================================== > RCS file: /usr/ncvs/src/sys/kern/vfs_syscalls.c,v > retrieving revision 1.332 > diff -c -r1.332 vfs_syscalls.c > *** sys/kern/vfs_syscalls.c 19 Oct 2003 20:41:07 -0000 1.332 > --- sys/kern/vfs_syscalls.c 5 Nov 2003 05:10:54 -0000 > ... > + /* > + * Convert a new format statfs structure to an old format statfs structure. > + */ > + static void > + cvtstatfs(td, nsp, osp) > + struct thread *td; > + struct statfs *nsp; > + struct ostatfs *osp; > + { > + > + bzero(osp, sizeof(*osp)); > + osp->f_bsize = nsp->f_bsize; > + osp->f_iosize = nsp->f_iosize; > + osp->f_blocks = nsp->f_blocks; > + osp->f_bfree = nsp->f_bfree; > + osp->f_bavail = nsp->f_bavail; > + osp->f_files = nsp->f_files; > + osp->f_ffree = nsp->f_ffree; tjr suggested setting the values to LONG_MAX instead of blindly truncating them I think POSIX has some methods for dealing with such overflows (from LFS). I thing they would reduce to returning -1/EOVERFLOW here, which is not very useful. tjr also suggested using a fake block size. This is already done in nfs, but the implementation is broken (nfs3 has similar 32-bit limits in the client-server interface). > Index: sys/kern/vfs_bio.c > =================================================================== > RCS file: /usr/ncvs/src/sys/kern/vfs_bio.c,v > retrieving revision 1.420 > diff -c -r1.420 vfs_bio.c > *** sys/kern/vfs_bio.c 4 Nov 2003 06:30:00 -0000 1.420 > --- sys/kern/vfs_bio.c 5 Nov 2003 05:10:54 -0000 > *************** > *** 3239,3245 **** > (int) m->pindex, (int)(foff >> 32), > (int) foff & 0xffffffff, resid, i); > if (!vn_isdisk(vp, NULL)) > ! printf(" iosize: %ld, lblkno: %jd, flags: 0x%x, npages: %d\n", > bp->b_vp->v_mount->mnt_stat.f_iosize, > (intmax_t) bp->b_lblkno, > bp->b_flags, bp->b_npages); > --- 3239,3245 ---- > (int) m->pindex, (int)(foff >> 32), > (int) foff & 0xffffffff, resid, i); > if (!vn_isdisk(vp, NULL)) > ! printf(" iosize: %jd, lblkno: %jd, flags: 0x%x, npages: %d\n", > bp->b_vp->v_mount->mnt_stat.f_iosize, > (intmax_t) bp->b_lblkno, > bp->b_flags, bp->b_npages); Example of a printf format error. The long was easy to print using %ld, but now there is a a u_int64_t. Using %jd gives a sign mismatch on all machines and a size mismatch on machines with sizeof(u_int64_t) != sizeof(intmax_t). > Index: bin/df/df.c > =================================================================== > RCS file: /usr/ncvs/src/bin/df/df.c,v > retrieving revision 1.51 > diff -c -r1.51 df.c > *** bin/df/df.c 13 Sep 2003 20:46:58 -0000 1.51 > --- bin/df/df.c 5 Nov 2003 19:22:11 -0000 > ... This has many more examples of printf format errors. The old code has many related bugs. Bruce