Date: Sun, 8 May 2011 21:35:04 -0500
From: Zhihao Yuan <lichray@gmail.com>
To: Bakul Shah <bakul@bitblocks.com>
Cc: Gabor Kovesdan <gabor@kovesdan.org>, "Pedro F. Giffuni" <giffunip@yahoo.com>, hackers@freebsd.org, Brooks Davis <brooks@freebsd.org>
Subject: Re: [RFC] Replacing our regex implementation
Message-ID: <BANLkTim-T4m=jUfXT_wFAv3n=H6QG2N1iQ@mail.gmail.com>
In-Reply-To: <20110509014938.EE292B827@mail.bitblocks.com>
References: <4DC7356C.20905@kovesdan.org> <20110509011709.5455CB827@mail.bitblocks.com> <4DC74546.1060902@kovesdan.org> <20110509014938.EE292B827@mail.bitblocks.com>
On Sun, May 8, 2011 at 8:49 PM, Bakul Shah <bakul@bitblocks.com> wrote:
> On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan <gabor@kovesdan.org> wrote:
>> On 09-05-2011 02:17, Bakul Shah wrote:
>> > As per the following URLs re2 is much faster than TRE (on the
>> > benchmarks they ran):
>> >
>> > http://lh3lh3.users.sourceforge.net/reb.shtml
>> > http://sljit.sourceforge.net/regex_perf.html
>> >
>> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
>> > POSIX API.  Both have BSD copyright. Is it worth considering
>> > making re2 posix compliant?
>> Is it wchar-clean and is it actively maintained? C++ is quite
>> anticipated for the base system and I'm not very skilled in it so atm I
>> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into
>> libc? According to POSIX, the regex code has to be there. But let's see
>> what others say... If we happen to use re2 later, my extensions that I
>> talked about in points 2 and 3 would still be useful.
>>
>> Anyway, according to some earlier vague measures, TRE seems to be slower
>> in small matching tasks but scales well. These tests seem to compare
>> only short runs with the same regex. It should be seen how they compare
>> e.g. if you grep the whole ports tree with the same pattern. If the
>> matching scales well once the pattern is compiled, that's more important
>> than the overall result for such short tasks, imho.
>
> re2 is certainly maintained. Don't know about wchar cleanliness.
> See
>    http://code.google.com/p/re2/
> Also check out Russ Cox's excellent articles on implementing it
>    http://swtch.com/~rsc/regexp/
> and this:
>    http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
> C++ may be an impediment for it to go into libc but one can
> certainly put a C interface on a C++ library.

1. This library accepts many popular syntaxes (PCRE, POSIX, vim, etc.),
   but it does not allow you to change the mode.
   http://code.google.com/p/re2/source/browse/re2/re2.h
2. It focuses on speed and features, not stability and standardization.
3. It is written in C++. We seldom accept C++ code in the base system,
   and do not accept it in libc.

So, as far as I am concerned, re2 is a good regex engine for some
applications, but it may not fit the requirements for a regex
implementation in libc.

> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>

-- 
Zhihao Yuan
The best way to predict the future is to invent it.
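
To illustrate the quoted remark that one can put a C interface on a C++ library, here is a minimal sketch. It is only an assumption about how such a wrapper might look: the names xre_t, xre_comp, xre_exec and xre_free are hypothetical, and std::regex stands in for the C++ engine, so none of this is re2's actual API.

```cpp
#include <regex>

// Hypothetical wrapper: xre_t, xre_comp, xre_exec and xre_free are
// illustrative names. std::regex is a stand-in engine, not re2.
struct xre_t {
    std::regex re;  // the C++ object hidden behind an opaque C handle
};

extern "C" {

// Compile a POSIX extended pattern. Returns NULL on any failure, so
// no C++ exception ever crosses the C boundary.
xre_t *xre_comp(const char *pattern) {
    if (pattern == nullptr) return nullptr;
    try {
        return new xre_t{std::regex(pattern, std::regex::extended)};
    } catch (...) {
        return nullptr;
    }
}

// Search for the compiled pattern anywhere in text.
// Returns 1 on match, 0 on no match, -1 on error.
int xre_exec(const xre_t *h, const char *text) {
    if (h == nullptr || text == nullptr) return -1;
    try {
        return std::regex_search(text, h->re) ? 1 : 0;
    } catch (...) {
        return -1;
    }
}

// Release a handle returned by xre_comp. Passing NULL is harmless.
void xre_free(xre_t *h) { delete h; }

}  // extern "C"
```

A C caller would see only an opaque `struct xre_t` and the three functions. Note that a wrapper like this still pulls in the C++ runtime, so it does not by itself answer point 3 above.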