Date: Fri, 8 Sep 2000 09:57:53 -0700 (PDT) From: "Aaron D. Gifford" <agifford@infowest.com> To: <freebsd-security@freebsd.org> Subject: Re: How to stop problems from printf Message-ID: <20000908165753.80E7E37B42C@hub.freebsd.org>
Okay, after the "How to stop problems from printf" discussion on
freebsd-security, in particular the example using gettext(), I
thought I'd see if there wasn't something simple that might
work. The below code is a result. Here's an example use:
int main(int argc, char **argv)
{
    if (argc > 1) {
        printf(safe_fmtstr("usage: %s filename",
                           gettext("usage: %s filename"),
                           0),
               argv[0]);
        exit(0);
    }
    printf("normal execution proceeds...");
    return 0;
}
I haven't done much testing yet. There are no guarantees.
There's more commentary/documentation in the code below.
Aaron out.
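
As a rough illustration of the idea only (this is NOT the code from safe_fmtstr.c below, and the names conv_signature and same_conversions are made up for this sketch), the comparison boils down to throwing away everything that is not part of a '%' conversion and checking that what remains is identical in both strings. The sketch assumes a conversion ends at the first character that is not a flag, digit, '*' or '.', and it uses small fixed buffers where the real code allocates:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy every character that belongs to a '%'
 * conversion (flags, width, precision, conversion character) into out
 * and skip ordinary text. out must hold at least strlen(fmt) + 1 bytes.
 */
static void conv_signature(const char *fmt, char *out)
{
    assert(fmt != NULL && out != NULL);
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;                   /* ordinary character */
        while (*fmt != '\0' && strchr("#+- 0123456789*.", *fmt) != NULL)
            *out++ = *fmt++;            /* flags, width, precision */
        if (*fmt != '\0')
            *out++ = *fmt++;            /* the conversion character */
    }
    *out = '\0';
}

/* Returns 1 if both format strings carry the same conversions, else 0. */
static int same_conversions(const char *a, const char *b)
{
    char sa[256], sb[256];

    if (strlen(a) >= sizeof sa || strlen(b) >= sizeof sb)
        return 0;       /* too long for this sketch: treat as a mismatch */
    conv_signature(a, sa);
    conv_signature(b, sb);
    return strcmp(sa, sb) == 0;
}
```

A translated string that keeps "%s" where the original had "%s" passes, while one that sneaks in an extra "%n" or changes a width does not.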
/*
* File: safe_fmtstr.c
* Version: 0.9 alpha 1
*
* Written by Aaron D. Gifford
*
* Copyright (c) 2000 Aaron D. Gifford. All rights reserved.
*
*
* You may redistribute and use in source or binary form, with or
* without modification provided that credit to the author(s)
* remains intact and that the appropriate copyright, license,
* and/or disclaimers remain intact.
*
* THIS SOFTWARE IS PROVIDED BY AARON D. GIFFORD ``AS IS'' AND ANY
* EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL AARON D. GIFFORD OR
* OTHER CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*
*
* BUG FIXES, COMMENTS, and SUGGESTIONS are always WELCOME!
*
* Please send bug fixes, suggestions, or comments to:
*
* Aaron D. Gifford
* < m e AT a a r o n g i f f o r d DOT c o m >
*
* This file should always be available from the web page
* at http://www.aarongifford.com/computers/ or on one of
* the pages linked thereon.
*
*
*
* WHAT IS IT? This file implements a method of determining if
* two different strings intended for use as format strings with the
* printf() family of functions are identical or different with regard
* to the format specifiers contained therein. This is accomplished by
* "compressing" the format specifier strings to just those characters
* that are complete format specifiers or incomplete format specifiers.
* The compressed versions of two format strings can thus be compared to
* see if they are the same.
*
* I wrote it just to see if it is possible to make things like the below
* snippet safer:
*
* printf(gettext("Some string %s with %d formatting items."), ...);
*
* Using this code, you would instead do something like:
*
* printf(safe_fmtstr("A format %s string %d",
* gettext("A format %s string %d"),
* MAX_FMT_LEN), ...);
*
* This code analyzes both passed format strings to make sure that they
* match in format specifiers. If they match and if the second passed
* format string does not exceed the maximum length, it is considered "safe"
* and will be returned by the safe_fmtstr() function.
*
* The function that does the two-pass format string compression or analysis
* is compress_fmtstr(). This function removes all non-format specifier
* characters to create a compressed representation of the format string.
* Two such compressed strings may then be compared to see if the format
* specifiers match.
*
* The other function herein, scan_fmtstr() is meant for internal use by the
* compress_fmtstr() function. It implements a finite state parser that does
* the analysis of the format string.
*
* This code makes several assumptions that may be completely wrong or even
* dangerous. Please let me know where I'm wrong and/or where I may be doing
* something foolish or insecure. I wrote this code one night just for fun
* to see how hard it would be. I based my assumptions on the FreeBSD 4.1
* PRINTF(1) man page.
*
* I don't know how safe this is. It is mostly untested. Likewise, I don't
* know how portable it is. Finally, no amount of analysis of a string can
* truly guarantee that it is safe, because the analysis code is not the same
* as the code that actually uses the analyzed parameters (in this case, the
* printf() family of functions). Because this code does not share the exact same
* parsing code with the printf() functions, slight differences may occur that
* result in insecurity.
*
* Aaron out.
*/
/* ========== START OF HEADER ========== */
/* These two function prototypes should really be in a header file */
#include <stdlib.h>
#include <string.h>
/*
* Name: safe_fmtstr
*
* Parameters:
* default_fmt The default format string
* pref_fmt A preferred format string
* fmt_len The maximum length permitted
*
* The two format strings are analyzed. If the format
* specifiers in both format strings match exactly and
* if the preferred format string does not exceed the
* maximum length permitted, it is considered safe.
*
* Return values:
*
* The pref_fmt string is returned IF it matches exactly
* the format specifiers found in default_fmt AND if the
* pref_fmt does not exceed the maximum length.
*
* If the pref_fmt string does not match (is unsafe) or
* if it exceeds the maximum length, the default_fmt string
* is returned instead.
*
* If the fmt_len parameter is zero, the maximum length
* permitted is infinity.
*
*/
char *safe_fmtstr(char *default_fmt, char *pref_fmt, size_t fmt_len);
/*
* Name: compress_fmtstr
* Parameters:
* fmt A single C-style string containing a printf()
* type format string.
*
* This function analyzes the passed printf() format string and
* returns a compressed C-style string that represents the number
* and types of format items found in the format string.
*
* Return values:
*
* On success, a non-null C-string is returned. The caller is
* responsible to free this returned string (since it was allocated
* using malloc()) when the caller is finished with it.
*
* On failure, a null value (char *)0 is returned instead.
*/
char *compress_fmtstr(char *fmt);
/* ========== END OF HEADER ========== */
#define DEBUG
#ifdef DEBUG
#include <assert.h>
#endif
/*
* Assumptions this code makes:
*
* 1. Any string that matches the following perl regular
* expression is considered a format specifier:
* m/%[#+ 0-]*[0-9*]*(\.[0-9*]*)?[diouXxfeEgGcs%]/
* 2. This code treats the '\' character as just another
* ordinary non-format specifier character.
* 3. Character sequences that partially match the above
* perl regular expression are invalid or incomplete
* format specifiers, but this code will still treat the
* sequence the same way valid and complete format
* specifiers are treated.
* 4. As shown in the above perl regular expression, this code
* assumes NO MAXIMUM length of the format specifier or the
* several subsections thereof.
*
* Please let the author know if any of these assumptions are
* invalid or dangerous.
*
* This code uses a very simple finite state parsing machine to
* count the number of format items and the types.
*
* A few quick definitions for the finite state parsing machine:
*
* FMT_START Exactly ONE occurrence of the '%' character
* FMT_FLAG ZERO or MORE of the following characters:
* '#', '+', ' ', '0', or '-'
* FMT_WIDTHSTART ZERO or ONE of the following characters:
* digits '1' through '9', or the '*' character
* FMT_WIDTH ZERO or MORE of the following characters:
* digits '0' through '9', or the '*' character
* FMT_DOT ZERO or ONE occurrences of the '.' character, marking the
* start of a precision specification or the end of FMT_WIDTH
* FMT_PREC ZERO or MORE of the following characters:
* digits '0' through '9', or the '*' character
* FMT_FORMAT Exactly ONE of the following characters:
* 'd', 'i', 'o', 'u', 'X', 'x', 'f', 'e', 'E', 'g',
* 'G', 'c', 's', or '%'
*
* Finite state parsing machine states:
*
* STATE: MEANING:
* MODE_CHAR NO formatting has yet been encountered - expecting
* FMT_START or more ordinary characters or escape
* sequences.
* MODE_START FMT_START encountered - expecting FMT_FLAG | FMT_WIDTHSTART |
* FMT_DOT | FMT_FORMAT
* MODE_FLAG FMT_FLAG encountered - expecting FMT_WIDTHSTART | FMT_DOT |
* FMT_FORMAT (per the regular expression above, further FMT_FLAG
* characters should also be accepted here)
* MODE_WIDTH FMT_WIDTH encountered - expecting FMT_WIDTH | FMT_DOT |
* FMT_FORMAT
* MODE_DOT FMT_DOT encountered - expecting FMT_PREC | FMT_FORMAT
* MODE_PREC FMT_PREC encountered - expecting FMT_PREC | FMT_FORMAT
* MODE_FORMAT FMT_FORMAT encountered - ALL DONE - this mode really
* doesn't exist since the formatting is finished at this
* point and the mode will revert back to MODE_CHAR.
*/
/* State machine modes */
#define MODE_CHAR 0
#define MODE_START 1
#define MODE_FLAG 2
#define MODE_WIDTH 3
#define MODE_DOT 4
#define MODE_PREC 5
/* Some defines for inlining parsing character comparisons */
#define FMT_START_TEST \
*ch == '%'
#define FMT_WIDTHSTART_TEST \
'1': \
case '2': \
case '3': \
case '4': \
case '5': \
case '6': \
case '7': \
case '8': \
case '9': \
case '*'
#define FMT_WIDTH_TEST \
'0': \
case '1': \
case '2': \
case '3': \
case '4': \
case '5': \
case '6': \
case '7': \
case '8': \
case '9': \
case '*'
#define FMT_PREC_TEST FMT_WIDTH_TEST
#define FMT_FLAG_TEST \
'#': \
case '+': \
case '-': \
case ' ': \
case '0'
#define FMT_DOT_TEST '.'
#define FMT_FORMAT_TEST \
'd': \
case 'i': \
case 'o': \
case 'u': \
case 'X': \
case 'x': \
case 'f': \
case 'e': \
case 'E': \
case 'g': \
case 'G': \
case 'c': \
case 's': \
case '%'
/* And some inlined action code */
#define DO_MODE_START {\
    mode = MODE_START; \
}
#define DO_MODE_FLAG {\
    if (buf != (char *)0) \
        *buf++ = *ch; \
    formatlen++; \
    mode = MODE_FLAG; \
}
#define DO_MODE_WIDTH {\
    if (buf != (char *)0) \
        *buf++ = *ch; \
    formatlen++; \
    mode = MODE_WIDTH; \
}
#define DO_MODE_DOT {\
    if (buf != (char *)0) \
        *buf++ = *ch; \
    formatlen++; \
    mode = MODE_DOT; \
}
#define DO_MODE_PREC {\
    if (buf != (char *)0) \
        *buf++ = *ch; \
    formatlen++; \
    mode = MODE_PREC; \
}
#define DO_MODE_FORMAT {\
    if (buf != (char *)0) \
        *buf++ = *ch; \
    formatlen++; \
    mode = MODE_CHAR; \
}
#define DO_INVALID_FORMAT {\
    if (buf != (char *)0) \
        *buf++ = '*'; \
    formatlen++; \
    mode = MODE_CHAR; \
}
/*
* INTERNAL USE ONLY - This function implements the finite state
* format scanning engine. The arguments are the format string
* and an optional buffer in which to write the compressed
* format representation. It is INTERNAL ONLY because it MUST
* be called once with a null buffer so as to count how many
* format items there are. Then the second time, a buffer of
* sufficient length can be passed to the routine. If a person
* were foolish enough to just pass in an arbitrary buffer, there
* is the possibility of buffer overflow. So DON'T DO IT!
*
* Returns the size of the buffer required (including space for the
* terminating byte and in some cases, some extra "work" space).
*/
/*
* Name: scan_fmtstr
* Parameters:
* fmt A C-style string containing a printf()-like
* format string.
* buf Another C-style string buffer OR NULL
*
* This function is intended for INTERNAL USE ONLY by the other
* functions in this file. It implements the finite state
* machine.
*
* Return values:
*
* The function returns the size of a C-style string buffer
* required to hold a compressed format string (including a
* byte for the C-string terminator character '\0').
*
* If called with a NULL buf parameter, the parser only returns
* the expected buffer size. If called with a NON-NULL buf
* parameter, that non-NULL buffer will contain the compressed
* string of exactly the size (including terminating '\0' char.)
* that the return value indicates.
*
* WARNING:
* If the passed buffer is too small, this function will
* overflow/overrun the buffer. That is why it is recommended
* that this function FIRST be called with a NULL buf parameter
* to determine the size of buffer required, then the function
* be called a second time (using the exact same format string)
* with a buffer of sufficient size.
*/
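size_t scan_fmtstr(char *fmt, char *buf) {
    char *ch;
    size_t formatlen = 0;
    size_t maxlen = 0;
    int mode = MODE_CHAR;

    /*
     * Shortcut if the format string is null. The compressed form is
     * then the empty string, which still needs one byte for the
     * terminating '\0', so report a size of 1 rather than 0.
     */
    if (fmt == (char *)0) {
        if (buf != (char *)0)
            *buf = (char)0;
        return 1;
    }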
/*
* Finite state machine. If the passed buffer variable buf
* is null (buf == (char *)0) then we just count the format
* specifications. If it is non-null, we assume that a
* count has already been done on this format string in the
* past and a sufficient C string has been allocated for the
* buffer so we can safely copy format items into it.
*/
    for (ch = fmt; *ch; ch++) {
        switch (mode) {
        case MODE_CHAR:
            if (FMT_START_TEST)
                DO_MODE_START;
            break;
        case MODE_START:
            switch (*ch) {
            case FMT_FLAG_TEST:
                DO_MODE_FLAG;
                break;
            case FMT_WIDTHSTART_TEST:
                DO_MODE_WIDTH;
                break;
            case FMT_DOT_TEST:
                DO_MODE_DOT;
                break;
            case FMT_FORMAT_TEST:
                DO_MODE_FORMAT;
                break;
            /* Invalid Format */
            default:
                DO_INVALID_FORMAT;
            }
            break;
        case MODE_FLAG:
            switch (*ch) {
            /* Further flags are allowed: "%-+5d" is a valid format */
            case FMT_FLAG_TEST:
                DO_MODE_FLAG;
                break;
            case FMT_WIDTHSTART_TEST:
                DO_MODE_WIDTH;
                break;
            case FMT_DOT_TEST:
                DO_MODE_DOT;
                break;
            case FMT_FORMAT_TEST:
                DO_MODE_FORMAT;
                break;
            /* Invalid Format */
            default:
                DO_INVALID_FORMAT;
            }
            break;
        case MODE_WIDTH:
            switch (*ch) {
            case FMT_WIDTH_TEST:
                DO_MODE_WIDTH;
                break;
            case FMT_DOT_TEST:
                DO_MODE_DOT;
                break;
            case FMT_FORMAT_TEST:
                DO_MODE_FORMAT;
                break;
            /* Invalid Format */
            default:
                DO_INVALID_FORMAT;
            }
            break;
        case MODE_DOT:
            switch (*ch) {
            case FMT_PREC_TEST:
                DO_MODE_PREC;
                break;
            case FMT_FORMAT_TEST:
                DO_MODE_FORMAT;
                break;
            /* Invalid Format */
            default:
                DO_INVALID_FORMAT;
            }
            break;
        case MODE_PREC:
            switch (*ch) {
            case FMT_PREC_TEST:
                DO_MODE_PREC;
                break;
            case FMT_FORMAT_TEST:
                DO_MODE_FORMAT;
                break;
            /* Invalid Format */
            default:
                DO_INVALID_FORMAT;
            }
            break;
        default:
            /* This should NEVER happen! */
#ifdef DEBUG
            assert(0);
#endif
            break;
        }
    }
    /* A format specifier left unfinished at end of string is invalid */
    if (mode != MODE_CHAR)
        DO_INVALID_FORMAT;
    /* Terminate the buffer with the C '\0' terminator character */
    if (buf != (char *)0)
        *buf = (char)0;
    /*
     * Return the size of buffer required (with space for the
     * terminating byte)
     */
    return ((maxlen > formatlen) ? maxlen : formatlen) + 1;
}
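char *compress_fmtstr(char *fmt) {
    /* Pass 1: count only, to learn the buffer size required */
    size_t len = scan_fmtstr(fmt, (char *)0);
    char *buf = (char *)malloc(len * sizeof(char));

    if (buf == (char *)0) {
        /* malloc() must have failed - return NULL */
        return buf;
    }
    /* Pass 2: fill the buffer, which is now known to be large enough */
    scan_fmtstr(fmt, buf);
    return buf;
}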
char *safe_fmtstr(char *default_fmt, char *pref_fmt, size_t fmt_len) {
    char *a, *b;

    /* A null preferred format can never be used - return the default */
    if (pref_fmt == (char *)0)
        return default_fmt;
    /* If the preferred format length exceeds the limit, return the default */
    if (fmt_len > 0 && strlen(pref_fmt) > fmt_len)
        return default_fmt;
    a = compress_fmtstr(default_fmt);
    if (a == (char *)0) {
        /* malloc() must have failed - default to original format */
        return default_fmt;
    }
    b = compress_fmtstr(pref_fmt);
    if (b == (char *)0) {
        /* malloc() must have failed - default to original format */
        free(a);
        return default_fmt;
    }
    if (strcmp(a, b)) {
        /*
         * The pref_fmt string does NOT match the default_fmt
         * string's format specifiers, and so is considered
         * UNSAFE. Return the default.
         */
        free(a);
        free(b);
        return default_fmt;
    }
    /*
     * The pref_fmt string does MATCH the default_fmt
     * string's format specifiers exactly, so it is considered
     * SAFE. Return the preferred format.
     */
    free(a);
    free(b);
    return pref_fmt;
}
#if 0
/* FOR TESTING ONLY: */
#include <stdio.h>
int main(int argc, char *argv[]) {
    char **a = argv;

    if (*a)
        a++;
    while (*a && **a) {
        /*
         * I'm deliberately not free()ing the malloc()ated strings that
         * the compress_fmtstr() function returns because this test code
         * can leak memory all it wants to. ;)
         */
        printf("FORMAT STRING '%s' compresses to '%s'\n", *a, compress_fmtstr(*a));
        if (*(a+1))
            printf("safe_fmtstr('%s', '%s', 0) = '%s'\n", *a, *(a+1), safe_fmtstr(*a, *(a+1), 0));
        a++;
    }
    return 0;
}
#endif
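
The "call once with a null buffer to measure, then again to fill" convention that scan_fmtstr() and compress_fmtstr() depend on can be shown in isolation. The following is only a sketch with made-up names (digits_only and dup_digits are hypothetical and not part of safe_fmtstr.c); it keeps just the decimal digits of a string, but the shape is the same: both passes walk the same input, and the returned size always includes room for the terminating '\0', even for a null input.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical example: copy only the decimal digits of s into buf.
 * With buf == NULL nothing is written; the return value is always the
 * buffer size required, including the '\0' terminator.
 */
static size_t digits_only(const char *s, char *buf)
{
    size_t n = 0;

    for (; s != NULL && *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            if (buf != NULL)
                *buf++ = *s;
            n++;
        }
    }
    if (buf != NULL)
        *buf = '\0';
    return n + 1;
}

/* Pass 1 measures, pass 2 fills a buffer of exactly that size. */
static char *dup_digits(const char *s)
{
    size_t len = digits_only(s, NULL);
    char *buf = malloc(len);

    if (buf != NULL) {
        size_t written = digits_only(s, buf);
        /* Both passes must agree, or the buffer size was wrong */
        assert(written == len);
        (void)written;
    }
    return buf;     /* caller frees; NULL if malloc() failed */
}
```

Because a size of at least 1 is always reported, the second pass can write its terminator without overrunning the allocation, which is the property the null-format shortcut in scan_fmtstr() also has to preserve.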
