Date:      Fri,  8 Sep 2000 09:57:53 -0700 (PDT)
From:      "Aaron D. Gifford" <agifford@infowest.com>
To:        <freebsd-security@freebsd.org>
Subject:   Re: How to stop problems from printf
Message-ID:  <20000908165753.80E7E37B42C@hub.freebsd.org>

Okay, after the "How to stop problems from printf" discussion on
freebsd-security, in particular the example using gettext(), I
thought I'd see if there wasn't something simple that might
work.  The below code is a result.  Here's an example use:

  int main(int argc, char **argv)
  {
    if (argc < 2) {
      printf(safe_fmtstr("usage: %s filename\n",
                         gettext("usage: %s filename\n"),
                         0),
             argv[0]);
      exit(1);
    }
    printf("normal execution proceeds...\n");
    return 0;
  }
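
To make the risk concrete: if a message catalog returned a translation
of the usage string that contained %d, %n, or an extra %s, printf()
would misread its arguments or read arguments that were never passed.
The following is a minimal standalone sketch of the fallback idea only.
It is not the code posted below.  The names conv_signature() and
pick_fmt() are hypothetical, the 64-byte signature buffers are an
arbitrary assumption, and the parsing is deliberately looser than a
real printf().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the "signature" of fmt into sig.  The
 * signature holds one character per '*' (each consumes an int
 * argument) and one per conversion character.  Flags, digits, and
 * dots are skipped.  Anything unrecognized after '%' becomes '?'.
 * Returns 0 on success, -1 if sig is too small.
 */
static int conv_signature(const char *fmt, char *sig, size_t siglen)
{
	size_t n = 0;
	char c;

	if (siglen == 0)
		return -1;
	while (*fmt != '\0') {
		if (*fmt++ != '%')
			continue;
		for (;;) {
			if (*fmt != '\0' &&
			    strchr("#+ -.0123456789", *fmt) != NULL) {
				fmt++;		/* flag, digit, or dot */
				continue;
			}
			if (*fmt == '*')
				c = *fmt++;	/* consumes an int argument */
			else if (*fmt != '\0' &&
			    strchr("diouXxfeEgGcs%", *fmt) != NULL)
				c = *fmt++;	/* conversion character */
			else
				c = '?';	/* unrecognized or truncated */
			if (n + 1 >= siglen)
				return -1;
			sig[n++] = c;
			if (c != '*')
				break;		/* this specifier is finished */
		}
	}
	sig[n] = '\0';
	return 0;
}

/*
 * Hypothetical wrapper: return candidate only if its signature is
 * identical to that of def; otherwise fall back to def.
 */
static const char *pick_fmt(const char *def, const char *candidate)
{
	char a[64], b[64];

	if (candidate == NULL ||
	    conv_signature(def, a, sizeof a) != 0 ||
	    conv_signature(candidate, b, sizeof b) != 0 ||
	    strcmp(a, b) != 0)
		return def;
	return candidate;
}
```

With these definitions, pick_fmt("usage: %s filename", translated)
hands back the translation only when it consumes the same arguments in
the same order as the built-in string, and the built-in string in
every other case.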

I haven't done much testing yet.  There are no guarantees.

There's more commentary/documentation in the code below.

Aaron out.


/*
 *   File:      safe_fmtstr.c
 *   Version:   0.9 alpha 1
 *
 *   Written by Aaron D. Gifford
 *
 * Copyright (c) 2000 Aaron D. Gifford.  All rights reserved.
 *
 *
 * You may redistribute and use in source or binary form, with or
 * without modification provided that credit to the author(s)
 * remains intact and that the appropriate copyright, license,
 * and/or disclaimers remain intact.
 *
 * THIS SOFTWARE IS PROVIDED BY AARON D. GIFFORD ``AS IS'' AND ANY
 * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL AARON D. GIFFORD OR
 * OTHER CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 *
 * BUG FIXES, COMMENTS, and SUGGESTIONS are always WELCOME!
 *
 *     Please send  bug fixes, suggestions, or comments to:
 *
 *     Aaron D. Gifford
 *     < m e AT a a r o n g i f f o r d DOT c o m >
 *
 *     This file should always be available from the web page
 *     at http://www.aarongifford.com/computers/ or on one of
 *     the pages linked thereon.
 *
 *
 *
 * WHAT IS IT?  This file implements a method of determining if
 * two different strings intended for use as format strings with the
 * printf() family of functions are identical or different with regard
 * to the format specifiers contained therein.  This is accomplished by
 * "compressing" the format specifier strings to just those characters
 * that are complete format specifiers or incomplete format specifiers.
 * The compressed versions of two format strings can thus be compared to
 * see if they are the same.
 *
 * I wrote it just to see if it is possible to make things like the below
 * snippet safer:
 *
 *   printf(gettext("Some string %s with %d formatting items."), ...);
 *
 * Using this code, you would instead do something like:
 *
 *   printf(safe_fmtstr("A format %s string %d",
 *			gettext("A format %s string %d"),
 *			MAX_FMT_LEN), ...);
 *
 * This code analyzes both passed format strings to make sure that they
 * match in format specifiers.  If they match and if the second passed
 * format string does not exceed the maximum length, it is considered "safe"
 * and will be returned by the safe_fmtstr() function.
 *
 * The function that does the two-pass format string compression or analysis
 * is compress_fmtstr().  This function removes all non-format specifier
 * characters to create a compressed representation of the format string.
 * Two such compressed strings may then be compared to see if the format
 * specifiers match.
 *
 * The other function herein, scan_fmtstr() is meant for internal use by the
 * compress_fmtstr() function.  It implements a finite state parser that does
 * the analysis of the format string.
 *
 * This code makes several assumptions that may be completely wrong or even
 * dangerous.  Please let me know where I'm wrong and/or where I may be doing
 * something foolish or insecure.  I wrote this code one night just for fun
 * to see how hard it would be.  I based my assumptions on the FreeBSD 4.1
 * PRINTF(1) man page.
 *
 * I don't know how safe this is.  It is mostly untested.  Likewise, I don't
 * know how portable it is.  Finally, no amount of analysis of a string can
 * truly guarantee that it is safe, because the analysis code is not the same
 * as the code that actually uses the analyzed parameters (in this case, the
 * printf() family of functions).  Because this code does not share the
 * exact same parsing code with the printf() functions, slight differences
 * may occur that result in insecurity.
 *
 * Aaron out.
 */


/* ========== START OF HEADER ========== */


/* These two function prototypes should really be in a header file */

#include <stdlib.h>
#include <string.h>

/*
 * Name:	safe_fmtstr
 *
 * Parameters:
 *   default_fmt	The default format string
 *   pref_fmt		A preferred format string
 *   fmt_len		The maximum length permitted
 *
 * The two format strings are analyzed.  If the format
 * specifiers in both format strings match exactly and
 * if the preferred format string does not exceed the
 * maximum length permitted, it is considered safe.
 * 
 * Return values:
 *
 * The pref_fmt string is returned IF it matches exactly
 * the format specifiers found in default_fmt AND if the
 * pref_fmt does not exceed the maximum length.
 *
 * If the pref_fmt string does not match (is unsafe) or
 * if it exceeds the maximum length, the default_fmt string
 * is returned instead.
 *
 * If the fmt_len parameter is zero, no maximum length is
 * enforced.
 *
 */
char *safe_fmtstr(char *default_fmt, char *pref_fmt, size_t fmt_len);

/*
 * Name:	compress_fmtstr
 * Parameters:
 *   fmt	A single C-style string containing a printf()
 *		type format string.
 *
 * This function analyzes the passed printf() format string and
 * returns a compressed C-style string that represents the number
 * and types of format items found in the format string.
 *
 * Return values:
 *
 * On success, a non-null C-string is returned.  The caller is
 * responsible to free this returned string (since it was allocated
 * using malloc()) when the caller is finished with it.
 *
 * On failure, a null value (char *)0 is returned instead.
 */
char *compress_fmtstr(char *fmt);


/* ========== END OF HEADER ========== */


#define DEBUG
#ifdef DEBUG
#include <assert.h>
#endif


/*
 * Assumptions this code makes:
 *
 *   1.  Any string that matches the following perl regular
 *       expression is considered a format specifier:
 *         m/%[#+ 0-]*[0-9*]*(\.[0-9*]*)?[diouXxfeEgGcs%]/
 *   2.  This code treats the '\' character as just another
 *       ordinary non-format specifier character.
 *   3.  Character sequences that partially match the above
 *       perl regular expression are invalid or incomplete
 *       format specifiers, but this code will still treat the
 *       sequence the same way valid and complete format
 *       specifiers are treated.
 *   4.  As shown in the above perl regular expression, this code
 *       assumes NO MAXIMUM length of the format specifier or the
 *       several subsections thereof.
 *
 * Please let the author know if any of these assumptions are
 * invalid or dangerous.
 *
 * This code uses a very simple finite state parsing machine to
 * count the number of format items and the types.
 *
 * A few quick definitions for the finite state parsing machine:
 *
 *   FMT_START		Exactly ONE occurrence of the '%' character
 *   FMT_FLAG		ZERO or MORE of the following characters:
 *			  '#', '+', ' ', '0', or '-'
 *   FMT_WIDTHSTART	ZERO or ONE of the following characters:
 *			  digits '1' through '9', or the '*' character
 *   FMT_WIDTH		ZERO or MORE of the following characters:
 *			  digits '0' through '9', or the '*' character
 *   FMT_DOT		ZERO or ONE occurrences of the '.' character, which
 *			  ends FMT_WIDTH and begins a precision specification
 *   FMT_PREC		ZERO or MORE of the following characters:
 *			  digits '0' through '9', or the '*' character
 *   FMT_FORMAT		Exactly ONE of the following characters:
 *			  'd', 'i', 'o', 'u', 'X', 'x', 'f', 'e', 'E', 'g',
 *			  'G', 'c', 's', or '%'
 * 
 * Finite state parsing machine states:
 *
 * STATE:	MEANING:
 * MODE_CHAR	NO formatting has yet been encountered - expecting
 *		  FMT_START or more ordinary characters or escape
 *		  sequences.
 * MODE_START	FMT_START encountered - expecting FMT_FLAG | FMT_WIDTHSTART |
 *		  FMT_DOT | FMT_FORMAT 
 * MODE_FLAG	FMT_FLAG encountered - expecting FMT_FLAG | FMT_WIDTHSTART |
 *		  FMT_DOT | FMT_FORMAT
 * MODE_WIDTH	FMT_WIDTH encountered - expecting FMT_WIDTH | FMT_DOT |
 *		  FMT_FORMAT
 * MODE_DOT	FMT_DOT encountered - expecting FMT_PREC | FMT_FORMAT
 * MODE_PREC	FMT_PREC encountered - expecting FMT_PREC | FMT_FORMAT
 * MODE_FORMAT	FMT_FORMAT encountered - ALL DONE - this mode really
 *		  doesn't exist since the formatting is finished at this
 *		  point and the mode will revert back to MODE_CHAR.
 */

/* State machine modes */
#define MODE_CHAR	0
#define MODE_START	1
#define MODE_FLAG	2
#define MODE_WIDTH	3
#define MODE_DOT	4
#define MODE_PREC	5

/* Some defines for inlining parsing character comparisons */
#define FMT_START_TEST \
	*ch == '%'
#define FMT_WIDTHSTART_TEST \
	     '1': \
	case '2': \
	case '3': \
	case '4': \
	case '5': \
	case '6': \
	case '7': \
	case '8': \
	case '9': \
	case '*'
#define FMT_WIDTH_TEST \
	     '0': \
	case '1': \
	case '2': \
	case '3': \
	case '4': \
	case '5': \
	case '6': \
	case '7': \
	case '8': \
	case '9': \
	case '*'
#define FMT_PREC_TEST FMT_WIDTH_TEST
#define FMT_FLAG_TEST \
	     '#': \
	case '+': \
	case '-': \
	case ' ': \
	case '0'
#define FMT_DOT_TEST '.'
#define FMT_FORMAT_TEST \
	     'd': \
	case 'i': \
	case 'o': \
	case 'u': \
	case 'X': \
	case 'x': \
	case 'f': \
	case 'e': \
	case 'E': \
	case 'g': \
	case 'G': \
	case 'c': \
	case 's': \
	case '%'

/* And some inlined action code */
#define DO_MODE_START {\
	mode = MODE_START; \
}
#define DO_MODE_FLAG {\
	if (buf != (char *)0) \
		*buf++ = *ch; \
	formatlen++; \
	mode = MODE_FLAG; \
}
#define DO_MODE_WIDTH {\
	if (buf != (char *)0) \
		*buf++ = *ch; \
	formatlen++; \
	mode = MODE_WIDTH; \
}
#define DO_MODE_DOT {\
	if (buf != (char *)0) \
		*buf++ = *ch; \
	formatlen++; \
	mode = MODE_DOT; \
}
#define DO_MODE_PREC {\
	if (buf != (char *)0) \
		*buf++ = *ch; \
	formatlen++; \
	mode = MODE_PREC; \
}
#define DO_MODE_FORMAT {\
	if (buf != (char *)0) \
		*buf++ = *ch; \
	formatlen++; \
	mode = MODE_CHAR; \
}
#define DO_INVALID_FORMAT {\
	if (buf != (char *)0) \
		*buf++ = '*'; \
	formatlen++; \
	mode = MODE_CHAR; \
}

/*
 * INTERNAL USE ONLY - This function implements the finite state
 * format scanning engine.  The arguments are the format string
 * and an optional buffer in which to write the compressed
 * format representation.  It is INTERNAL ONLY because it MUST
 * be called once with a null buffer so as to count how many
 * format items there are.  Then the second time, a buffer of
 * sufficient length can be passed to the routine.  If a person
 * were foolish enough to just pass in an arbitrary buffer, there
 * is the possibility of buffer overflow.  So DON'T DO IT!
 *
 * Returns the size of the buffer required (including space for the
 * terminating byte and in some cases, some extra "work" space).
 */
/*
 * Name:	scan_fmtstr
 * Parameters:
 *   fmt	A C-style string containing a printf()-like
 *		format string.
 *   buf	Another C-style string buffer OR NULL
 *
 * This function is intended for INTERNAL USE ONLY by the other
 * functions in this file.  It implements the finite state
 * machine.
 *
 * Return values:
 * 
 * The function returns the size of a C-style string buffer
 * required to hold a compressed format string (including a
 * byte for the C-string terminator character '\0').
 *
 * If called with a NULL buf parameter, the parser only returns
 * the expected buffer size.  If called with a NON-NULL buf
 * parameter, that non-NULL buffer will contain the compressed
 * string of exactly the size (including terminating '\0' char.)
 * that the return value indicates.
 * 
 * WARNING:
 * If the passed buffer is too small, this function will
 * overflow/overrun the buffer.  That is why it is recommended
 * that this function FIRST be called with a NULL buf parameter
 * to determine the size of buffer required, then the function
 * be called a second time (using the exact same format string)
 * with a buffer of sufficient size.
 */
size_t scan_fmtstr(char *fmt, char *buf) {
	char		*ch;
	size_t		formatlen = 0;
	size_t		maxlen = 0;
	int		mode = MODE_CHAR;
	

	/*
	 * Shortcut if the format string is null: the compressed form is
	 * the empty string, which still needs one byte for the terminator.
	 */
	if (fmt == (char *)0) {
		if (buf != (char *)0)
			*buf = (char)0;
		return 1;
	}

	/*
	 * Finite state machine.  If the passed buffer variable buf
	 * is null (buf == (char *)0) then we just count the format
	 * specifications.  If it is non-null, we assume that a
	 * count has already been done on this format string in the
	 * past and a sufficient C string has been allocated for the
	 * buffer so we can safely copy format items into it.
	 */
	for (ch = fmt; *ch; ch++) {
		switch (mode) {
			case MODE_CHAR:
				if (FMT_START_TEST)
					DO_MODE_START;
				break;
			case MODE_START:
				switch (*ch) {
					case FMT_FLAG_TEST:
						DO_MODE_FLAG;
						break;

					case FMT_WIDTHSTART_TEST:
						DO_MODE_WIDTH;
						break;
					
					case FMT_DOT_TEST:
						DO_MODE_DOT;
						break;

					case FMT_FORMAT_TEST:
						DO_MODE_FORMAT;
						break;

					/* Invalid Format */
					default:
						DO_INVALID_FORMAT;
				}
				break;
			case MODE_FLAG:
				switch (*ch) {
					/* More flags may follow the first one */
					case FMT_FLAG_TEST:
						DO_MODE_FLAG;
						break;

					case FMT_WIDTHSTART_TEST:
						DO_MODE_WIDTH;
						break;
					
					case FMT_DOT_TEST:
						DO_MODE_DOT;
						break;

					case FMT_FORMAT_TEST:
						DO_MODE_FORMAT;
						break;

					/* Invalid Format */
					default:
						DO_INVALID_FORMAT;
				}
				break;
			case MODE_WIDTH:
				switch (*ch) {
					case FMT_WIDTH_TEST:
						DO_MODE_WIDTH;
						break;
					
					case FMT_DOT_TEST:
						DO_MODE_DOT;
						break;

					case FMT_FORMAT_TEST:
						DO_MODE_FORMAT;
						break;

					/* Invalid Format */
					default:
						DO_INVALID_FORMAT;
				}
				break;
			case MODE_DOT:
				switch (*ch) {
					case FMT_PREC_TEST:
						DO_MODE_PREC;
						break;

					case FMT_FORMAT_TEST:
						DO_MODE_FORMAT;
						break;

					/* Invalid Format */
					default:
						DO_INVALID_FORMAT;
				}
				break;
			case MODE_PREC:
				switch (*ch) {
					case FMT_PREC_TEST:
						DO_MODE_PREC;
						break;

					case FMT_FORMAT_TEST:
						DO_MODE_FORMAT;
						break;

					/* Invalid Format */
					default:
						DO_INVALID_FORMAT;
				}
				break;
			default:
				/* This should NEVER happen! */
#ifdef DEBUG
				assert(0);
#endif
				break;
		}
	}
	if (mode != MODE_CHAR)
		DO_INVALID_FORMAT;

	if (buf != (char *)0)
		/* Terminate the buffer with the C '\0' terminator character */
		*buf = (char)0;

	/*
	 * Return the size of buffer required (with space for the
	 * terminating byte)
	 */
	return ((maxlen > formatlen) ? maxlen : formatlen) + 1;
}

char *compress_fmtstr(char *fmt) {
	size_t	len = scan_fmtstr(fmt, (char *)0);
	char	*buf = (char *)malloc(len * sizeof(char));

	if (buf == (char *)0) {
		/* malloc() must have failed - return NULL */
		return buf;
	}
	scan_fmtstr(fmt, buf);
	return buf;
}

char *safe_fmtstr(char *default_fmt, char *pref_fmt, size_t fmt_len) {
	char	*a, *b;

	/*
	 * If the preferred format is missing or its length exceeds the
	 * limit, return the default
	 */
	if (pref_fmt == (char *)0 ||
	    (fmt_len > 0 && strlen(pref_fmt) > fmt_len))
		return default_fmt;

	a = compress_fmtstr(default_fmt);
	if (a == (char *)0) {
		/* malloc() must have failed - default to original format */
		return default_fmt;
	}

	b = compress_fmtstr(pref_fmt);
	if (b == (char *)0) {
		/* malloc() must have failed - default to original format */
		free(a);
		return default_fmt;
	}
	if (strcmp(a, b)) {
		/*
		 * The pref_fmt string does NOT match the default_fmt
		 * string's format specifiers, and so is considered
		 * UNSAFE.  Return the default.
		 */
		free(a);
		free(b);
		return default_fmt;
	}
	/*
	 * The pref_fmt string does MATCH the default_fmt
	 * string's format specifiers exactly, so it is considered
	 * SAFE.  Return the preferred format.
	 */
	free(a);
	free(b);
	return pref_fmt;
}

#if 0
/* FOR TESTING ONLY: */

#include <stdio.h>

int main(int argc, char *argv[]) {
	char	**a = argv;

	if (*a)
		a++;
	while (*a && **a) {
		/*
		 * I'm deliberately not free()ing the malloc()ated strings that
		 * the compress_fmtstr() function returns because this test code
		 * can leak memory all it wants to.  ;)
		 */
		printf("FORMAT STRING '%s' compresses to '%s'\n",*a,compress_fmtstr(*a));
		if (*(a+1))
			printf("safe_fmtstr('%s', '%s', 0) = '%s'\n", *a, *(a+1), safe_fmtstr(*a, *(a+1), 0));
		a++;
	}
	return 0;
}

#endif
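
Assumption 1 in the source above can be tried against individual
specifiers without running the scanner.  The following is a small
sketch, assuming a POSIX system that provides <regex.h>.  The helper
name is_format_spec() is hypothetical and is not part of the posted
code.  The pattern is the one from assumption 1, rewritten as a POSIX
extended regular expression and anchored at both ends.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if s is, in its entirety, a single
 * format specifier according to assumption 1, 0 if it is not, and
 * -1 if the pattern fails to compile.
 */
static int is_format_spec(const char *s)
{
	static const char pattern[] =
	    "^%[#+ 0-]*[0-9*]*(\\.[0-9*]*)?[diouXxfeEgGcs%]$";
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, s, 0, NULL, 0);	/* 0 means a match */
	regfree(&re);
	return rc == 0;
}
```

Under this pattern "%-08.3s", "%*.*f", and "%%" are accepted, while
"%ld" and "%n" are not, since length modifiers and the n conversion are
outside the character sets listed in assumption 1.  The scanner above
records such sequences as invalid specifiers.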

