Date: Tue, 16 Jun 2009 18:32:43 -0500
From: Jeffrey Goldberg <jeffrey@goldmark.org>
To: Gary Kline <kline@thought.org>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: feedback, comments on this php-delimiter scrubbing program?
Message-ID: <57E07CDA-AA9E-4A8F-91BC-3BF90177CA3A@goldmark.org>
In-Reply-To: <20090616170244.GA40934@thought.org>
References: <20090616012114.GA38011@thought.org> <200906151857.45945.mel.flynn%2Bfbsd.questions@mailing.thruhere.net> <20090616153040.GA40540@thought.org> <20090616170244.GA40934@thought.org>

On Jun 16, 2009, at 12:02 PM, Gary Kline wrote:

> this works, but still gives a warning.  it's sloppy coding, but
> as a second version...

You've got some superfluous tests for EOF in some places, and you may
also be missing some.

Your approach has been to "look ahead" with an extra getc() when you
come across an interesting character.  I recommended that instead of
doing that you keep a variable "state" to keep track of where you are
(and have very recently been) instead of looking ahead.

I haven't tried your code, but I suspect that it behaves incorrectly
with input

  (1) that has a '<' as a final character
  (2) that includes things like "<<<<?"
  (3) that includes things like "??>"

There is a systematic (if a bit tedious) way to make sure that you
check every condition.  When you've worked enough on this, you can peek
at an answer which I've attached.

(For the rest of you, I know that it would be more efficient to make
the big switch on state instead of on input character, but for
pedagogical reasons I did it the other way around.  I deliberately
avoided other available tunings).

The extensive comments in the code should make it clear what is going
on.  Once you understand the concepts here it should be very easy to
write code to do similar things in the future.

-j

-- 
Jeffrey Goldberg                        http://www.goldmark.org/jeff/

Attachment: gkline.c

/* simple code to parse out stuff between <? and ?> inclusively
 * does not need to nest.
 * exits with 0 if end of input occurs outside of
 * the PHP stuff, and with 1 otherwise
 *
 * Copyright 2009 Jeffrey Goldberg <jeffrey@goldmark.org>
 * free to use and distribute under the BSD license
 */

/* Special cases need to be considered:
 *  (1) input ends with "<" as the last character
 *  (2) Input contains "<<<<?"
 *  (3) Input contains "???>"
 *  and many more
 *
 * Overall strategy is to recognize that we are interested in five
 * types of characters and four different "states"
 * The character types are
 *   1. "<"
 *   2. "?"
 *   3. ">"
 *   4. "EOF" (yes, it's not a character, but let me call it that)
 *   5. any other character
 *
 * The states are for keeping track of what we have found already,
 * there are 4 of them
 *
 *  (1) Outside PHP with nothing special going on
 *  (2) We were outside PHP but found a '<'
 *  (3) We are inside the PHP
 *  (4) We are inside the PHP and have found a '?'
 *
 * It's easy to forget, but important to remember, that in each state
 * we might encounter any character.  This gives us 5 X 4 different
 * combinations, which each require their own behavior.  The way to
 * make sure that you take care of all 20 possibilities is to
 * explicitly spell them out.
 */

#include <stdlib.h>
#include <stdio.h>

/* define the various values for state
 * We could use "enum state {...}", but I'm old fashioned.
 */
#define OUTSIDE   (1)  // in normal text
#define AFTER_LT  (2)  // found "<" looking for "?"
#define INSIDE    (3)  // between <? and ?>
#define AFTER_Q   (4)  // was INSIDE and after "?"

int main(void)
{
    int c;                /* because of EOF we need to make this an integer */
    unsigned char state;

    state = OUTSIDE;      /* set starting state */

    /* Loop until the EOF case exits.  Using the value of getchar() as
     * the loop condition would stop early on a NUL byte in the input.
     */
    for (;;) {
        c = getchar();
        switch (c) {
        case '<':
            if (state == OUTSIDE)
                state = AFTER_LT;
            else if (state == AFTER_LT)
                putchar('<');     /* print previous '<' */
            else if (state == INSIDE)
                ;                 /* stay in state, don't print */
            else if (state == AFTER_Q)
                state = INSIDE;
            break;
        case '?':
            if (state == AFTER_LT)
                state = INSIDE;
            else if (state == INSIDE)
                state = AFTER_Q;
            else if (state == AFTER_Q)
                ;                 /* stay in same state, don't print */
            else if (state == OUTSIDE)
                putchar(c);       /* and stay in same state */
            break;
        case '>':
            if (state == AFTER_Q)
                state = OUTSIDE;
            else if (state == AFTER_LT) {
                putchar('<');
                putchar(c);
                state = OUTSIDE;
            }
            else if (state == INSIDE)
                ;                 /* same state, don't print */
            else if (state == OUTSIDE)
                putchar(c);
            break;
        case EOF:
            if (state == OUTSIDE)
                exit(0);
            else if (state == AFTER_LT) {
                putchar('<');
                exit(0);
            }
            else
                exit(1);
        default:                  /* for normal characters */
            if (state == OUTSIDE)
                putchar(c);
            else if (state == INSIDE)
                ;                 /* same state, don't print */
            else if (state == AFTER_LT) {
                putchar('<');
                putchar(c);
                state = OUTSIDE;
            }
            else if (state == AFTER_Q)
                state = INSIDE;
        }
    }
    exit(3); /* this should never be reached, but it eliminates a gcc warning */
}