From owner-freebsd-security Fri Sep 8 9:58:12 2000
Delivered-To: freebsd-security@freebsd.org
Received: from jardan.infowest.com (jardan.infowest.com [216.190.28.251])
	by hub.freebsd.org (Postfix) with SMTP id 80E7E37B42C
	for ; Fri, 8 Sep 2000 09:57:53 -0700 (PDT)
From: "Aaron D. Gifford"
To:
Subject: Re: How to stop problems from printf
Message-Id: <20000908165753.80E7E37B42C@hub.freebsd.org>
Date: Fri, 8 Sep 2000 09:57:53 -0700 (PDT)
Sender: owner-freebsd-security@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Okay, after the "How to stop problems from printf" discussion on
freebsd-security, in particular the example using gettext(), I thought
I'd see if there wasn't something simple that might work.  The code
below is the result.  Here's an example use:

  int main(int argc, char **argv) {
    /* Print the usage message when no filename was given */
    if (argc < 2) {
      printf(safe_fmtstr("usage: %s filename",
                         gettext("usage: %s filename"), 0), argv[0]);
      exit(0);
    }
    printf("normal execution proceeds...");
    return 0;
  }

I haven't done much testing yet.  There are no guarantees.  There's more
commentary/documentation in the code below.

Aaron out.

/*
 * File: safe_fmtstr.c
 * Version: 0.9 alpha 1
 *
 * Written by Aaron D. Gifford
 *
 * Copyright (c) 2000 Aaron D. Gifford.  All rights reserved.
 *
 *
 * You may redistribute and use in source or binary form, with or
 * without modification provided that credit to the author(s)
 * remains intact and that the appropriate copyright, license,
 * and/or disclaimers remain intact.
 *
 * THIS SOFTWARE IS PROVIDED BY AARON D. GIFFORD ``AS IS'' AND ANY
 * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL AARON D. GIFFORD OR
 * OTHER CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 *
 * BUG FIXES, COMMENTS, and SUGGESTIONS are always WELCOME!
 *
 * Please send bug fixes, suggestions, or comments to:
 *
 *	Aaron D. Gifford
 *	< m e AT a a r o n g i f f o r d DOT c o m >
 *
 * This file should always be available from the web page
 * at http://www.aarongifford.com/computers/ or on one of
 * the pages linked thereon.
 *
 *
 *
 * WHAT IS IT?  This file implements a method of determining whether
 * two different strings intended for use as format strings with the
 * printf() family of functions are identical or different with regard
 * to the format specifiers contained therein.  This is accomplished by
 * "compressing" each format string down to just those characters
 * that belong to complete or incomplete format specifiers.
 * The compressed versions of two format strings can thus be compared to
 * see if they are the same.
 *
 * I wrote it just to see if it is possible to make things like the below
 * snippet safer:
 *
 *	printf(gettext("Some string %s with %d formatting items."), ...);
 *
 * Using this code, you would instead do something like:
 *
 *	printf(safe_fmtstr("A format %s string %d",
 *			   gettext("A format %s string %d"),
 *			   MAX_FMT_LEN), ...);
 *
 * This code analyzes both passed format strings to make sure that they
 * match in format specifiers.  If they match and if the second passed
 * format string does not exceed the maximum length, it is considered "safe"
 * and will be returned by the safe_fmtstr() function.
 *
 * The function that does the two-pass format string compression or analysis
 * is compress_fmtstr().  This function removes all non-format specifier
 * characters to create a compressed representation of the format string.
 * Two such compressed strings may then be compared to see if the format
 * specifiers match.
 *
 * The other function herein, scan_fmtstr(), is meant for internal use by the
 * compress_fmtstr() function.  It implements a finite state parser that does
 * the analysis of the format string.
 *
 * This code makes several assumptions that may be completely wrong or even
 * dangerous.  Please let me know where I'm wrong and/or where I may be doing
 * something foolish or insecure.  I wrote this code one night just for fun
 * to see how hard it would be.  I based my assumptions on the FreeBSD 4.1
 * PRINTF(1) man page.
 *
 * I don't know how safe this is.  It is mostly untested.  Likewise, I don't
 * know how portable it is.  Finally, no amount of analysis of a string can
 * truly guarantee that it is safe, because the analysis code is not the same
 * as the code that actually uses the analyzed parameters (in this case, the
 * printf() family of functions).  Because this code does not share the exact
 * same parsing code with the printf() functions, slight differences may occur
 * that result in insecurity.
 *
 * Aaron out.
 */

/* ========== START OF HEADER ========== */
/* These two function prototypes should really be in a header file */

#include <stdlib.h>	/* malloc(), free(), size_t */
#include <string.h>	/* strlen(), strcmp() */

/*
 * Name: safe_fmtstr
 *
 * Parameters:
 *	default_fmt	The default format string
 *	pref_fmt	A preferred format string
 *	fmt_len		The maximum length permitted
 *
 * The two format strings are analyzed.  If the format
 * specifiers in both format strings match exactly and
 * if the preferred format string does not exceed the
 * maximum length permitted, it is considered safe.
 *
 * Return values:
 *
 *	The pref_fmt string is returned IF it matches exactly
 *	the format specifiers found in default_fmt AND if the
 *	pref_fmt does not exceed the maximum length.
 *
 *	If the pref_fmt string does not match (is unsafe) or
 *	if it exceeds the maximum length, the default_fmt string
 *	is returned instead.
 *
 *	If the fmt_len parameter is zero, no maximum length
 *	is enforced.
 *
 */
char *safe_fmtstr(char *default_fmt, char *pref_fmt, size_t fmt_len);

/*
 * Name: compress_fmtstr
 * Parameters:
 *	fmt	A single C-style string containing a printf()
 *		type format string.
 *
 * This function analyzes the passed printf() format string and
 * returns a compressed C-style string that represents the number
 * and types of format items found in the format string.
 *
 * Return values:
 *
 *	On success, a non-null C-string is returned.  The caller is
 *	responsible for freeing this returned string (since it was
 *	allocated using malloc()) when the caller is finished with it.
 *
 *	On failure, a null value (char *)0 is returned instead.
 */
char *compress_fmtstr(char *fmt);

/* ========== END OF HEADER ========== */

#define DEBUG

#ifdef DEBUG
#include <assert.h>
#endif

/*
 * Assumptions this code makes:
 *
 * 1. Any string that matches the following perl regular
 *    expression is considered a format specifier:
 *	m/%[#+ 0-]*[0-9*]*(\.[0-9*]*)?[diouXxfeEgGcs%]/
 * 2. This code treats the '\' character as just another
 *    ordinary non-format specifier character.
 * 3. Character sequences that partially match the above
 *    perl regular expression are invalid or incomplete
 *    format specifiers, but this code will still treat the
 *    sequence the same way valid and complete format
 *    specifiers are treated.
 * 4. As shown in the above perl regular expression, this code
 *    assumes NO MAXIMUM length of the format specifier or the
 *    several subsections thereof.
 *
 * Please let the author know if any of these assumptions are
 * invalid or dangerous.
 *
 * This code uses a very simple finite state parsing machine to
 * count the number of format items and the types.
 *
 * A few quick definitions for the finite state parsing machine:
 *
 *   FMT_START       Exactly ONE occurrence of the '%' character
 *   FMT_FLAG        ZERO or MORE of the following characters:
 *                   '#', '+', ' ', '0', or '-'
 *   FMT_WIDTHSTART  ZERO or ONE of the following characters:
 *                   digits '1' through '9', or the '*' character
 *   FMT_WIDTH       ZERO or MORE of the following characters:
 *                   digits '0' through '9', or the '*' character
 *   FMT_DOT         ZERO or ONE occurrence of the '.' character,
 *                   which begins a precision specification or
 *                   ends FMT_WIDTH
 *   FMT_PREC        ZERO or MORE of the following characters:
 *                   digits '0' through '9', or the '*' character
 *   FMT_FORMAT      Exactly ONE of the following characters:
 *                   'd', 'i', 'o', 'u', 'X', 'x', 'f', 'e', 'E', 'g',
 *                   'G', 'c', 's', or '%'
 *
 * Finite state parsing machine states:
 *
 *   STATE:        MEANING:
 *   MODE_CHAR     NO formatting has yet been encountered - expecting
 *                 FMT_START or more ordinary characters or escape
 *                 sequences.
 *   MODE_START    FMT_START encountered - expecting FMT_FLAG |
 *                 FMT_WIDTHSTART | FMT_DOT | FMT_FORMAT
 *   MODE_FLAG     FMT_FLAG encountered - expecting FMT_WIDTHSTART |
 *                 FMT_DOT | FMT_FORMAT
 *   MODE_WIDTH    FMT_WIDTH encountered - expecting FMT_WIDTH |
 *                 FMT_DOT | FMT_FORMAT
 *   MODE_DOT      FMT_DOT encountered - expecting FMT_PREC | FMT_FORMAT
 *   MODE_PREC     FMT_PREC encountered - expecting FMT_PREC | FMT_FORMAT
 *   MODE_FORMAT   FMT_FORMAT encountered - ALL DONE - this mode really
 *                 doesn't exist since the formatting is finished at this
 *                 point and the mode will revert back to MODE_CHAR.
 */

/* State machine modes */
#define MODE_CHAR	0
#define MODE_START	1
#define MODE_FLAG	2
#define MODE_WIDTH	3
#define MODE_DOT	4
#define MODE_PREC	5

/* Some defines for inlining parsing character comparisons */
#define FMT_START_TEST \
	*ch == '%'

#define FMT_WIDTHSTART_TEST \
	'1': \
	case '2': \
	case '3': \
	case '4': \
	case '5': \
	case '6': \
	case '7': \
	case '8': \
	case '9': \
	case '*'

#define FMT_WIDTH_TEST \
	'0': \
	case '1': \
	case '2': \
	case '3': \
	case '4': \
	case '5': \
	case '6': \
	case '7': \
	case '8': \
	case '9': \
	case '*'

#define FMT_PREC_TEST	FMT_WIDTH_TEST

#define FMT_FLAG_TEST \
	'#': \
	case '+': \
	case '-': \
	case ' ': \
	case '0'

#define FMT_DOT_TEST	'.'

#define FMT_FORMAT_TEST \
	'd': \
	case 'i': \
	case 'o': \
	case 'u': \
	case 'X': \
	case 'x': \
	case 'f': \
	case 'e': \
	case 'E': \
	case 'g': \
	case 'G': \
	case 'c': \
	case 's': \
	case '%'

/* And some inlined action code */
#define DO_MODE_START {\
		mode = MODE_START; \
	}

#define DO_MODE_FLAG {\
		if (buf != (char *)0) \
			*buf++ = *ch; \
		formatlen++; \
		mode = MODE_FLAG; \
	}

#define DO_MODE_WIDTH {\
		if (buf != (char *)0) \
			*buf++ = *ch; \
		formatlen++; \
		mode = MODE_WIDTH; \
	}

#define DO_MODE_DOT {\
		if (buf != (char *)0) \
			*buf++ = *ch; \
		formatlen++; \
		mode = MODE_DOT; \
	}

#define DO_MODE_PREC {\
		if (buf != (char *)0) \
			*buf++ = *ch; \
		formatlen++; \
		mode = MODE_PREC; \
	}

#define DO_MODE_FORMAT {\
		if (buf != (char *)0) \
			*buf++ = *ch; \
		formatlen++; \
		mode = MODE_CHAR; \
	}

#define DO_INVALID_FORMAT {\
		if (buf != (char *)0) \
			*buf++ = '*'; \
		formatlen++; \
		mode = MODE_CHAR; \
	}

/*
 * INTERNAL USE ONLY - This function implements the finite state
 * format scanning engine.  The arguments are the format string
 * and an optional buffer in which to write the compressed
 * format representation.  It is INTERNAL ONLY because it MUST
 * be called once with a null buffer so as to count how many
 * format items there are.  Then the second time, a buffer of
 * sufficient length can be passed to the routine.  If a person
 * were foolish enough to just pass in an arbitrary buffer, there
 * is the possibility of buffer overflow.  So DON'T DO IT!
 *
 * Returns the size of the buffer required (including space for the
 * terminating byte and in some cases, some extra "work" space).
 */

/*
 * Name: scan_fmtstr
 * Parameters:
 *	fmt	A C-style string containing a printf()-like
 *		format string.
 *	buf	Another C-style string buffer OR NULL
 *
 * This function is intended for INTERNAL USE ONLY by the other
 * functions in this file.  It implements the finite state
 * machine.
 *
 * Return values:
 *
 *	The function returns the size of a C-style string buffer
 *	required to hold a compressed format string (including a
 *	byte for the C-string terminator character '\0').
 *
 *	If called with a NULL buf parameter, the parser only returns
 *	the expected buffer size.  If called with a NON-NULL buf
 *	parameter, that non-NULL buffer will contain the compressed
 *	string of exactly the size (including terminating '\0' char.)
 *	that the return value indicates.
 *
 * WARNING:
 *	If the passed buffer is too small, this function will
 *	overflow/overrun the buffer.  That is why it is recommended
 *	that this function FIRST be called with a NULL buf parameter
 *	to determine the size of buffer required, then the function
 *	be called a second time (using the exact same format string)
 *	with a buffer of sufficient size.
 */
size_t scan_fmtstr(char *fmt, char *buf) {
	char	*ch;
	size_t	formatlen = 0;
	size_t	maxlen = 0;
	int	mode = MODE_CHAR;

	/* Shortcut if the format string is null */
	if (fmt == (char *)0) {
		if (buf != (char *)0)
			*buf = (char)0;
		/*
		 * One byte is still needed for the terminator written
		 * above, so report 1 rather than 0.  Otherwise the
		 * caller would allocate a zero-length buffer and the
		 * second pass would write past its end.
		 */
		return 1;
	}

	/*
	 * Finite state machine.  If the passed buffer variable buf
	 * is null (buf == (char *)0) then we just count the format
	 * specifications.  If it is non-null, we assume that a
	 * count has already been done on this format string in the
	 * past and a sufficient C string has been allocated for the
	 * buffer so we can safely copy format items into it.
	 */
	for (ch = fmt; *ch; ch++) {
		switch (mode) {
		case MODE_CHAR:
			if (FMT_START_TEST)
				DO_MODE_START;
			break;
		case MODE_START:
			switch (*ch) {
			case FMT_FLAG_TEST:
				DO_MODE_FLAG;
				break;
			case FMT_WIDTHSTART_TEST:
				DO_MODE_WIDTH;
				break;
			case FMT_DOT_TEST:
				DO_MODE_DOT;
				break;
			case FMT_FORMAT_TEST:
				DO_MODE_FORMAT;
				break;
			/* Invalid Format */
			default:
				DO_INVALID_FORMAT;
			}
			break;
		case MODE_FLAG:
			switch (*ch) {
			case FMT_WIDTHSTART_TEST:
				DO_MODE_WIDTH;
				break;
			case FMT_DOT_TEST:
				DO_MODE_DOT;
				break;
			case FMT_FORMAT_TEST:
				DO_MODE_FORMAT;
				break;
			/* Invalid Format */
			default:
				DO_INVALID_FORMAT;
			}
			break;
		case MODE_WIDTH:
			switch (*ch) {
			case FMT_WIDTH_TEST:
				DO_MODE_WIDTH;
				break;
			case FMT_DOT_TEST:
				DO_MODE_DOT;
				break;
			case FMT_FORMAT_TEST:
				DO_MODE_FORMAT;
				break;
			/* Invalid Format */
			default:
				DO_INVALID_FORMAT;
			}
			break;
		case MODE_DOT:
			switch (*ch) {
			case FMT_PREC_TEST:
				DO_MODE_PREC;
				break;
			case FMT_FORMAT_TEST:
				DO_MODE_FORMAT;
				break;
			/* Invalid Format */
			default:
				DO_INVALID_FORMAT;
			}
			break;
		case MODE_PREC:
			switch (*ch) {
			case FMT_PREC_TEST:
				DO_MODE_PREC;
				break;
			case FMT_FORMAT_TEST:
				DO_MODE_FORMAT;
				break;
			/* Invalid Format */
			default:
				DO_INVALID_FORMAT;
			}
			break;
		default:
			/* This should NEVER happen! */
#ifdef DEBUG
			assert(0);
#endif
			break;
		}
	}

	/* A specifier that was still open at end of string is incomplete */
	if (mode != MODE_CHAR)
		DO_INVALID_FORMAT;

	if (buf != (char *)0)
		/* Terminate the buffer with the C '\0' terminator character */
		*buf = (char)0;

	/*
	 * Return the size of buffer required (with space for the
	 * terminating byte)
	 */
	return ((maxlen > formatlen) ? maxlen : formatlen) + 1;
}

char *compress_fmtstr(char *fmt) {
	/* First pass: count only.  Second pass: fill the buffer. */
	size_t	len = scan_fmtstr(fmt, (char *)0);
	char	*buf = (char *)malloc(len * sizeof(char));

	if (buf == (char *)0) {
		/* malloc() must have failed - return NULL */
		return buf;
	}
	scan_fmtstr(fmt, buf);
	return buf;
}

char *safe_fmtstr(char *default_fmt, char *pref_fmt, size_t fmt_len) {
	char	*a, *b;

	/* A null preferred format can never be used - return the default */
	if (pref_fmt == (char *)0)
		return default_fmt;

	/* If the preferred format length exceeds the limit, return the default */
	if (fmt_len > 0 && strlen(pref_fmt) > fmt_len)
		return default_fmt;

	a = compress_fmtstr(default_fmt);
	if (a == (char *)0) {
		/* malloc() must have failed - default to original format */
		return default_fmt;
	}
	b = compress_fmtstr(pref_fmt);
	if (b == (char *)0) {
		/* malloc() must have failed - default to original format */
		free(a);
		return default_fmt;
	}
	if (strcmp(a, b)) {
		/*
		 * The pref_fmt string does NOT match the default_fmt
		 * string's format specifiers, and so is considered
		 * UNSAFE.  Return the default.
		 */
		free(a);
		free(b);
		return default_fmt;
	}
	/*
	 * The pref_fmt string does MATCH the default_fmt
	 * string's format specifiers exactly, so it is considered
	 * SAFE.  Return the preferred format.
	 */
	free(a);
	free(b);
	return pref_fmt;
}

#if 0
/* FOR TESTING ONLY: */
#include <stdio.h>

int main(int argc, char *argv[]) {
	char	**a = argv;

	if (*a)
		a++;
	while (*a && **a) {
		/*
		 * I'm deliberately not free()ing the malloc()ated strings that
		 * the compress_fmtstr() function returns because this test code
		 * can leak memory all it wants to. ;)
		 */
		printf("FORMAT STRING '%s' compresses to '%s'\n", *a, compress_fmtstr(*a));
		if (*(a+1))
			printf("safe_fmtstr('%s', '%s', 0) = '%s'\n",
			       *a, *(a+1), safe_fmtstr(*a, *(a+1), 0));
		a++;
	}
	return 0;
}
#endif
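
For comparison, here is a more compact sketch of the same compare-the-specifiers idea. It is not the code from safe_fmtstr.c: the names fmt_signature(), pick_fmt(), and SIG_MAX are made up for illustration, and it assumes that only conversion characters and '*' width/precision fields consume arguments, so flags and literal digits are skipped rather than recorded. It also rejects anything outside the conversion set listed in the assumptions (for instance %n or a length modifier such as %ld) instead of recording a '*' marker, which means it falls back to the default string more often.

```c
#include <stddef.h>
#include <string.h>

#define SIG_MAX 64	/* arbitrary cap on signature length (an assumption) */

/* Append one character to the signature; return -1 if it would not fit. */
static int
sig_put(char *sig, size_t *n, char c)
{
	if (*n + 1 >= SIG_MAX)
		return -1;
	sig[(*n)++] = c;
	return 0;
}

/* Skip a width or precision field; each '*' consumes an int, so record it. */
static int
sig_skip_number(const char **p, char *sig, size_t *n)
{
	while ((**p >= '0' && **p <= '9') || **p == '*') {
		if (**p == '*' && sig_put(sig, n, '*') != 0)
			return -1;
		(*p)++;
	}
	return 0;
}

/*
 * Hypothetical helper: build the argument signature of fmt in sig, which
 * must hold at least SIG_MAX bytes.  Returns 0 on success, or -1 for a
 * null pointer, an over-long signature, or an unrecognized specifier.
 */
int
fmt_signature(const char *fmt, char *sig)
{
	const char *p = fmt;
	size_t n = 0;

	if (fmt == NULL || sig == NULL)
		return -1;
	while (*p != '\0') {
		if (*p++ != '%')
			continue;		/* ordinary character */
		while (*p != '\0' && strchr("#+ 0-", *p) != NULL)
			p++;			/* flags take no argument */
		if (sig_skip_number(&p, sig, &n) != 0)
			return -1;		/* width */
		if (*p == '.') {
			p++;
			if (sig_skip_number(&p, sig, &n) != 0)
				return -1;	/* precision */
		}
		if (*p == '\0' || strchr("diouXxfeEgGcs%", *p) == NULL)
			return -1;		/* %n, %ld, trailing '%', ... */
		if (*p != '%' && sig_put(sig, &n, *p) != 0)
			return -1;		/* "%%" consumes no argument */
		p++;
	}
	sig[n] = '\0';
	return 0;
}

/* Return pref_fmt only when both signatures parse and are identical. */
const char *
pick_fmt(const char *default_fmt, const char *pref_fmt)
{
	char a[SIG_MAX], b[SIG_MAX];

	if (fmt_signature(default_fmt, a) != 0 ||
	    fmt_signature(pref_fmt, b) != 0 || strcmp(a, b) != 0)
		return default_fmt;
	return pref_fmt;
}
```

A call would look like printf(pick_fmt("usage: %s filename\n", gettext("usage: %s filename\n")), argv[0]). A translation that adds, drops, or reorders a specifier makes pick_fmt() hand back the default string. The fixed-size stack buffers avoid malloc() at the cost of refusing format strings with very many specifiers.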