From owner-svn-src-head@freebsd.org Thu Dec 1 00:19:23 2016 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D9DDFC5D8D6; Thu, 1 Dec 2016 00:19:23 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 617EC13C1; Thu, 1 Dec 2016 00:19:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-100-78.carlnfd1.nsw.optusnet.com.au [110.21.100.78]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 60B4410420A1; Thu, 1 Dec 2016 11:19:20 +1100 (AEDT) Date: Thu, 1 Dec 2016 11:19:19 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Eric van Gyzen cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r309331 - head/usr.bin/locale In-Reply-To: <201611301834.uAUIYfQs075427@repo.freebsd.org> Message-ID: <20161201082459.T1285@besplex.bde.org> References: <201611301834.uAUIYfQs075427@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=BKLDlBYG c=1 sm=1 tr=0 a=uGjuzT6u7JdBDS7kH8taPg==:117 a=uGjuzT6u7JdBDS7kH8taPg==:17 a=kj9zAlcOel0A:10 a=jWiCfzBZR5YcepJ51wAA:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Dec 2016 00:19:24 -0000 On Wed, 30 Nov 2016, Eric van Gyzen wrote: > Log: > Include limits.h for CHAR_MAX > > This was needed on stable/10. Apparently, sys/param.h supplies CHAR_MAX > on head. Include limits.h anyway, for consistency, and because C says so. sys/param.actually supplies CHAR_MAX (undocumented namespace pollution) in all versions of FreeBSD. The function that uses this has many bugs, but most are only style bugs due to various magic. X char * X format_grouping(const char *binary) X { X static char rval[64]; X const char *cp; X size_t len; X X rval[0] = '\0'; X for (cp = binary; *cp != '\0'; ++cp) { X char group[sizeof("127;")]; This hard-codes CHAR_MAX is 127. This works accidentally on POSIX >= 2100 since POSIX now specifies 8-bit chars, and even if chars are unsigned so that CHAR_MAX is 255, that fits in the same space as 127. X snprintf(group, sizeof(group), "%hhd;", *cp); X len = strlcat(rval, group, sizeof(rval)); X if (len >= sizeof(rval)) { X len = sizeof(rval) - 1; X break; X } I don't like the error handling. It is too careful, yet not careful enough to be correct. Use of snprintf() with no error handling instead of sprintf() has no effect unless there are bugs, but there are bugs. The first few are: - if chars are signed, then *cp may be negative. This is an invalid format, but we should check it. Negative chars are converted to garbage starting with a minus sign. group[] is too small to hold large ones. We defend against buffer overruns by using snprintf(), but don't check for errors. Truncation gives further garbage. - %hhd is a bogus format if chars are signed. Then hh in it has no effect - %hhd is a broken format if chars are unsigned. It says to convert the arg to signed char. On non-exotic machines with CHAR_MAX = 255, this corrupts values between 128 and 255 to negative ones. Large values are unlikely/physically impossible, but CHAR_MAX is a magic sentinel value which I think we want to print as itself (users also need to know what CHAR_MAX is to decode it). - on exotic machines, CHAR_MAX can be UINT_MAX. Then char promotes to u_int and %d format would give undefined behaviour if the value exceeds INT_MAX. %hhd is no better for avoiding the undefined behviour, and when it works normally it corrupts large values as for the non-exotic case. X if (*cp == CHAR_MAX) { X break; X } X } X X /* Remove the trailing ';'. */ X rval[len - 1] = '\0'; This writes outside of rval when binary is the null string. Possible for at least invalid formats. len should be initialized to 0 in this case, but it actually uninitialized, so compilers will warn at high WARNS even if this case is unreachable. X X return (rval); X } Untested fixes and cleanups: Y diff -u2 locale.c~ locale.c Y --- locale.c~ 2016-11-25 08:26:48.000000000 +0000 Y +++ locale.c 2016-12-01 00:09:23.603179000 +0000 Y @@ -495,27 +495,27 @@ Y static char rval[64]; Y const char *cp; Y - size_t len; Y + size_t roff; Y + int len; Y Y rval[0] = '\0'; Y + roff = 0; Y for (cp = binary; *cp != '\0'; ++cp) { Y - char group[sizeof("127;")]; Y - snprintf(group, sizeof(group), "%hhd;", *cp); Y - len = strlcat(rval, group, sizeof(rval)); Y - if (len >= sizeof(rval)) { Y - len = sizeof(rval) - 1; Y - break; Y - } Y - if (*cp == CHAR_MAX) { Y - break; Y - } Y + if (*cp < 0) Y + break; /* garbage input */ Y + len = snprintf(&rval[roff], sizeof(rval) - roff, "%u;", *cp); Y + if (len < 0 || len >= sizeof(rval) - roff) Y + break; /* insufficient space for output */ Y + roff += len; Y + if (*cp == CHAR_MAX) Y + break; /* special termination */ Y } Y Y - /* Remove the trailing ';'. */ Y - rval[len - 1] = '\0'; Y + /* Truncate at the last successfully snprintf()ed semicolon. */ Y + if (roff != 0) Y + rval[roff - 1] = '\0'; Y Y - return (rval); Y + return (&rval[0]); Y } Y Y - Y /* Y * keyword value lookup helper (via localeconv()) It was much easier to just use snprintf(). The error handling can be further simplified by checking if rval[] is large enough up front. It needs to have size strlen("255;") * strlen(binary) + 1, where 255 is POSIX CHAR_MAX hard-coded. Bruce