Date: Mon, 23 Aug 2010 14:42:56 +0100 From: chrisk-freebsd@list.mightyreason.com To: Bob Bishop <rb@gid.co.uk> Cc: freebsd-current@freebsd.org, gabor@freebsd.org Subject: Re: grep and Regular expression correctness Message-ID: <4C727AE0.8090709@list.mightyreason.com> In-Reply-To: <1179D92F-9AB7-43A4-8943-7DA58F5E234D@gid.co.uk> References: <4C7260CC.9070407@list.mightyreason.com> <1179D92F-9AB7-43A4-8943-7DA58F5E234D@gid.co.uk>
next in thread | previous in thread | raw e-mail | index | archive | help
On 23/08/2010 13:45, Bob Bishop wrote: > On 23 Aug 2010, at 12:51, chrisk-freebsd@list.mightyreason.com wrote: >> [...]The hardest part of POSIX regular expressions is in picking out the >> correct captured subexpressions. This makes programs like "sed" >> vulnerable to the bad regex.h engine that comes with the operating system. >> >> Luckily grep does not need the captured subexpressions, and thus does >> not need the complexity that comes from the ideas behind TRE. [etc] > I think grep does potentially need the captured subexpressions, for eg: > > \([abc]\)99\1 > > matching eg b99b That would be "basic POSIX" instead of "extended POSIX" regular expressions. Making basic regular expressions correct is something I have never attempted. Correctness is what I would like to emphasize. GNU grep defines the regex.h header for POSIX regular expressions but does not try to deliver the correct POSIX answer. Getting the wrong answer quickly is not a virtue as debugging prematurely optimized code is too hard. http://www2.research.att.com/~gsf/testregex/re-interpretation.html has the best explanation of basic and extended POSIX regular expressions. And last I checked the AT&T code was nearly correct but exponentially slow. I hope that if "grep" in *BSD and thus OSX gets worked on that it gets more correct, not less. I hope that someday the regex.h engine in *BSD and thus OSX gets fixed, but I am not holding my breath for that. All I have written is a correct (and efficient) "extended POSIX" matching engine in Haskell and OCaml. Cheers, Chris Kuklewicz
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4C727AE0.8090709>