Date: Fri, 30 Jul 2010 16:24:59 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: jilles@FreeBSD.org Cc: freebsd-bugs@FreeBSD.org, andyf@speednet.com.au Subject: Re: bin/54784: find(1) -ls wastes space Message-ID: <20100730144702.M2522@delplex.bde.org> In-Reply-To: <201007292327.o6TNRHAi045973@freefall.freebsd.org> References: <201007292327.o6TNRHAi045973@freefall.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 29 Jul 2010 jilles@FreeBSD.org wrote: > Synopsis: find(1) -ls wastes space > > State-Changed-From-To: open->closed > State-Changed-By: jilles > State-Changed-When: Thu Jul 29 23:24:55 UTC 2010 > State-Changed-Why: > I don't think there is a good way to fix this. > Iterating over all users with getpwent() may be very slow or may not work > at all in some environments. > I don't see another way to find the column widths. Buffering all output > would be rather annoying as well. It certainly gives unreadable output. And didn't someone want to bloat UT_NAMESIZE to 256 or so, so that only users with 400-column terminals could read the output? Related bugs: - find(1) claims that the format is identical to that produced by ls -dgils. Actually, the formats differ greatly in whitespace -- the latter produces readable output. ls -Rdigls has the same problem as find -ls. ls has variable column widths for many of the fields including ids. I think it doesn't calculate the maximum in advance, but keeps using the previous maximum. - UT_LINESIZE and <utmp.h> no longer exist, but: - the not-unused header <sys/param.h> says that MAXLOGNAME should be UT_LINESIZE+1 refers to <utmp.h> for this. - the not-unused header <stdio.h> says that L_cuserid is UT_LINESIZE + 1 and refers to <utmp.h> for this. - there are some style bugs in these dead comments. - ut_user[] in struct utmpx has size 32 bytes (max name length 31?), but MAXLOGNAME is still 17. - find -ls still uses MAXLOGNAME, so its format hasn't been properly bloated. - there are related formatting problems for printing the inode number. find -ls uses the fixed format %6lu and a bogus cast to u_long to ensure breakage if ino_t is larger than u_long. ls uses the variable format %*lu. %6lu was more than enough when inodes were 16-bit and/or file systems were small, but hasn't been enough for many years. It messes up the alignment of the columns if there is a mixture of large and small inode numbers, and you almost might as well print ids using %s and accept the misalignment from this too. - find -ls has many other printf format errors and format-printf errors: % void % printlong(char *name, char *accpath, struct stat *sb) % { % char modep[15]; % % (void)printf("%6lu %8"PRId64" ", (u_long) sb->st_ino, sb->st_blocks); - format-printf errors: - PRI* shouldn't exist, but is used - space after cast - printf format errors: - PRI* cannot be used here (without bogusly casting st_blocks to match it). st_blocks has type blkcnt_t, which might differ from int64_t. blkcnt_t is actually still int64_t, and it would not be unreasonable to depend on this implementation to avoid using the PRI* mistake, but not to use it. % (void)strmode(sb->st_mode, modep); % (void)printf("%s %3u %-*s %-*s ", modep, sb->st_nlink, MAXLOGNAME - 1, % user_from_uid(sb->st_uid, 0), MAXLOGNAME - 1, % group_from_gid(sb->st_gid, 0)); Minor problems with fixed-width format for st_nlink (st_nlink can be as much as 32767, and when it is >= 1000 it is misformatted). Honest chumminess with the implementation for st_nlink. nlink_t happens to be uint16_t so it promotes to u_int. This still uses MAXLOGNAME. Since MAXLOGNAME is not even the maximum, misaligned comumns result if ut_user[] is actually used. Here instead of using %s format to get misalignment often or the current format to get misalignment sometimes, we should probably use a debloated non-maximum like 8 to get misalignment sometimes. % % if (S_ISCHR(sb->st_mode) || S_ISBLK(sb->st_mode)) % (void)printf("%3d, %3d ", major(sb->st_rdev), % minor(sb->st_rdev)); The fixed width of 3 for device numbers will mostly work now, but it used to cause lots of misalignment in ls output due to sparse encodings in 32-bit minor numbers. % else % (void)printf("%8"PRId64" ", sb->st_size); This use of PRI* is not even not wrong, as above. % printtime(sb->st_mtime); I fear finding too many format errors to look at the sub-functions. % (void)printf("%s", name); % if (S_ISLNK(sb->st_mode)) % printlink(accpath); % (void)putchar('\n'); % } ls doesn't use the PRI* mistake, but it uses the humanize_number() mistake and associated bogus casts and printf format errors. E.g., in printsize(), for the scientificized case it blindly casts the size (which is an off_t) to int64_t to match humanize_number()'s broken API, and in the decimal case it uses %*jd format with 2 errors (first it bogusly casts the field width (which has the bogus type size_t) to the bogus type u_int (the '*' takes an int), and then it doesn't cast the size to intmax_t to match %jd. ls uses a sub-function for printing device numbers to reduce the problems with large and negative device numbers. The find -ls output is much futher from having the same format as ls -digls in this cases. Perhaps similarly in cases involving extensions like likeACLs. Certainly similarly when ls -digls is forced to be colorized by an environment variable :-). I didn't know that find -ls existed, and would use xargs :-). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100730144702.M2522>