Date: Sun, 08 May 2011 18:49:38 -0700
From: Bakul Shah <bakul@bitblocks.com>
To: Gabor Kovesdan <gabor@kovesdan.org>
Cc: "Pedro F. Giffuni" <giffunip@yahoo.com>, hackers@FreeBSD.org, Brooks Davis <brooks@freebsd.org>
Subject: Re: [RFC] Replacing our regex implementation
Message-ID: <20110509014938.EE292B827@mail.bitblocks.com>
In-Reply-To: Your message of "Mon, 09 May 2011 02:37:10 BST." <4DC74546.1060902@kovesdan.org>
References: <4DC7356C.20905@kovesdan.org> <20110509011709.5455CB827@mail.bitblocks.com> <4DC74546.1060902@kovesdan.org>
On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan <gabor@kovesdan.org> wrote:
> On 09-05-2011 02:17, Bakul Shah wrote:
> > As per the following URLs, re2 is much faster than TRE (on the
> > benchmarks they ran):
> >
> > http://lh3lh3.users.sourceforge.net/reb.shtml
> > http://sljit.sourceforge.net/regex_perf.html
> >
> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
> > POSIX API. Both have BSD copyright. Is it worth considering
> > making re2 POSIX compliant?
> Is it wchar-clean and is it actively maintained? C++ is quite
> anticipated for the base system and I'm not very skilled in it so atm I
> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into
> libc? According to POSIX, the regex code has to be there. But let's see
> what others say... If we happen to use re2 later, my extensions that I
> talked about in points 2 and 3 would still be useful.
>
> Anyway, according to some earlier vague measures, TRE seems to be slower
> in small matching tasks but scales well. These tests seem to compare
> only short runs with the same regex. It should be seen how they compare
> e.g. if you grep the whole ports tree with the same pattern. If the
> matching scales well once the pattern is compiled, that's more important
> than the overall result for such short tasks, imho.

re2 is certainly maintained. I don't know about wchar cleanliness. See
http://code.google.com/p/re2/

Also check out Russ Cox's excellent articles on implementing it:
http://swtch.com/~rsc/regexp/
and this:
http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html

C++ may be an impediment to putting it into libc, but one can
certainly put a C interface on a C++ library.
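Putting a C interface on a C++ library usually comes down to an opaque handle plus `extern "C"` functions that never let C++ exceptions escape into C callers. The sketch that follows shows that pattern under stated assumptions: the `cre_*` functions and the `cre_t` type are hypothetical names made up for illustration, and `std::regex` stands in for re2 so the example builds without re2 installed. It is not re2's API and not a POSIX `regcomp`/`regexec` implementation.

```cpp
/* ---- cre.h: hypothetical C-callable interface, includable from C ---- */
#ifdef __cplusplus
extern "C" {
#endif

typedef struct cre cre_t;                          /* opaque to C callers */
cre_t *cre_compile(const char *pattern);           /* NULL on a bad pattern */
int cre_match(const cre_t *re, const char *text);  /* 1 match, 0 none, -1 error */
void cre_free(cre_t *re);

#ifdef __cplusplus
}
#endif

/* ---- cre.cpp: C++ implementation (std::regex stands in for re2) ---- */
#include <regex>

struct cre {
    std::regex re;  // compiled once, reused by every cre_match() call
};

extern "C" cre_t *cre_compile(const char *pattern) {
    try {
        // POSIX extended syntax, roughly what regcomp(REG_EXTENDED) accepts
        return new cre{std::regex(pattern, std::regex::extended)};
    } catch (...) {
        // std::regex_error or std::bad_alloc must not propagate into C code
        return nullptr;
    }
}

extern "C" int cre_match(const cre_t *re, const char *text) {
    try {
        return std::regex_search(text, re->re) ? 1 : 0;
    } catch (...) {
        return -1;  // e.g. std::regex_error if matching gets too complex
    }
}

extern "C" void cre_free(cre_t *re) {
    delete re;  // deleting nullptr is a no-op, like free(NULL)
}
```

A C program would include only the header part, call `cre_compile` once and then `cre_match` per input line, which is the compile-once, match-many case Gabor mentions. The final program would presumably still need to link against the C++ runtime, which is one reason a C++ engine is awkward inside libc itself.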