Date: Fri, 3 Dec 2004 11:10:28 GMT From: Bruce Evans <bde@zeta.org.au> To: freebsd-bugs@FreeBSD.org Subject: Re: bin/74567: [patch] [2TB] du doesn't handle sizes >1TB Message-ID: <200412031110.iB3BASfZ047144@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
The following reply was made to PR bin/74567; it has been noted by GNATS. From: Bruce Evans <bde@zeta.org.au> To: Sergey Salnikov <serg@www1.citforum.ru> Cc: freebsd-gnats-submit@freebsd.org Subject: Re: bin/74567: [patch] [2TB] du doesn't handle sizes >1TB Date: Fri, 3 Dec 2004 22:04:25 +1100 (EST) On Thu, 2 Dec 2004, Sergey Salnikov wrote: > >It seems like off_t would be more appropriate. > Really I simply looked at the "prthumanval(int64_t bytes)" line and > thought that using the same typedef everywhere would be the right > thing. But off_t is better for that, you're right. No, off_t is worse for that. off_t is a signed integer integer type suitable for representing file offsets. It must be signed so that it can represent negative offsets and have a range twice as large as is needed for file sizes. Using it to represent file sizes would be bogus since file sizes aren't offsets but would always work since off_t must be able to represent the offset from the beginning of a file to the end of the same file. Using it to represent sums of file sizes sizs would be more bogus since sums may be much larger than their terms; in particular, it would break on systems where off_t is 32 bits but file systems are larger than 2^31-1 bytes. Using it in du would be even more bogus since du needs to represent sums of block counts, not sums of file sizes. POSIX.1-2001 requires the existence of a signed integer type blkcnt_t which is "Used for file block counts". It must be signed for backwards compatibility. This is still not implemented in FreeBSD. It is fuzzily specified in POSIX. The spec doesn't require it to be usable for block counts, but it requires st_blocks in struct stat to have type blkcnt_t so blkcnt_t needs to be able to represent all file block counts for stat() to actually work. It is fairly clear that it is not intended to to work for more than single files. POSIX.1-2001 requires the existence of an unsigned integer type fsblkcnt_t which is "Used for file system block counts". This is implemented in FreeBSD but not used in the main place where something like it is needed (statfs()). It must be unsigned for backwards compatibility with SYSV, but this makes it unsuitable for use in FreeBSD since file system block counts can be negative (for f_bavail). statvfs() is unsuitable for the same reason. Some BSDs have deprecated statfs() in favour of statvfs() despite the latter's unsuitability, but statfs() is still the primary interface that reports block counts in FreeBSD. statfs() avoids some bugs by not using fsblkcnt_t, but has its own sign extension bugs from excessive use of unsigned types. Using fsblkcnt_t in du would be bogus since it is not intended to work for more than single file systems. Using a type related to block counts for fts_number would be bogus because fts_number needs to represent generic values. fts_number needs to have a signed arithmetic type since generic values may be negative. Using long for it was almost correct until the C standard broke the promise that long is the largest integer type. Using double would have been better, at least on machines with IEEE floating point so that exact representability of all integers between about -2^53 and 2^53 is guaranteed. Now that the C standard promises that intmax_t is the largest signed integer type, intmax_t is almost the correct type for fts_number. Using double would still be safer since it has a much larger range (though less precision for large values). intmax_t goes up to about 2^63 which should be enough for anyone, but 2^53 should also be enough for anyone. The reason to use doubles is that bugs may cause preposterously large values (> 2^63) and floating point won't hide the bugs by truncating mod 2^63. In theory, overflow can be detected for signed integer arithmetic, but no one ever turns on the overflow detection and excessive use of unsigned types defeats it. > A new patch follows. > ... > > --- du-2TB-patch-v2 begins here --- > --- du.c.orig Wed Jul 28 20:03:12 2004 > +++ du.c Thu Dec 2 00:54:34 2004 > @@ -72,7 +72,7 @@ > > static int linkchk(FTSENT *); > static void usage(void); > -void prthumanval(int64_t); > +void prthumanval(off_t); Using int64_t in prthumanval() is as correct as possible, since prthumanva() is just a wrapper for humanize_number() and humanize_number() takes an int64_t due to previous poor choice of types. > [... style bugs deleted ...] > - (void) printf("%ld\t%s\n", > - howmany(p->fts_number, blocksize), > + (void) printf("%lld\t%s\n", > + howmany(*(off_t *)p->fts_pointer, > blocksize), > p->fts_path); %lld is for printing long long's. Using of for printing foo_t's is logically wrong and potentially gives runtime errors. This bug is detected at compile time on FreeBSD's 64-bit arches because both int64_t and off_t are plain long (!= long long) on these arches. Another reason to always replace long by intmax_t in changes of this type is that by getting it right the first time you don't have to change lots of printfs several times. FreeBSD already has too many %qd's and %lld's from intermediate changes before intmax_t became standard. In many cases, compile-time detection of the type being printed being different from quad_t or long long, respectively, is defeated by casting the arg to match the format. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200412031110.iB3BASfZ047144>