Date:      Sun, 8 May 2011 21:35:04 -0500
From:      Zhihao Yuan <lichray@gmail.com>
To:        Bakul Shah <bakul@bitblocks.com>
Cc:        Gabor Kovesdan <gabor@kovesdan.org>, "Pedro F. Giffuni" <giffunip@yahoo.com>, hackers@freebsd.org, Brooks Davis <brooks@freebsd.org>
Subject:   Re: [RFC] Replacing our regex implementation
Message-ID:  <BANLkTim-T4m=jUfXT_wFAv3n=H6QG2N1iQ@mail.gmail.com>
In-Reply-To: <20110509014938.EE292B827@mail.bitblocks.com>
References:  <4DC7356C.20905@kovesdan.org> <20110509011709.5455CB827@mail.bitblocks.com> <4DC74546.1060902@kovesdan.org> <20110509014938.EE292B827@mail.bitblocks.com>

On Sun, May 8, 2011 at 8:49 PM, Bakul Shah <bakul@bitblocks.com> wrote:
> On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan <gabor@kovesdan.org> wrote:
>> On 09-05-2011 02:17, Bakul Shah wrote:
>> > As per the following URLs re2 is much faster than TRE (on the
>> > benchmarks they ran):
>> >
>> > http://lh3lh3.users.sourceforge.net/reb.shtml
>> > http://sljit.sourceforge.net/regex_perf.html
>> >
>> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
>> > POSIX API. Both have BSD copyright. Is it worth considering
>> > making re2 posix compliant?
>> Is it wchar-clean and is it actively maintained? C++ is quite
>> anticipated for the base system and I'm not very skilled in it so atm I
>> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into
>> libc? According to POSIX, the regex code has to be there. But let's see
>> what others say... If we happen to use re2 later, my extensions that I
>> talked about in points 2, and 3, would still be useful.
>>
>> Anyway, according to some earlier vague measures, TRE seems to be slower
>> in small matching tasks but scales well. These tests seem to compare
>> only short runs with the same regex. It should be seen how they compare
>> e.g. if you grep the whole ports tree with the same pattern. If the
>> matching scales well once the pattern is compiled, that's more important
>> than the overall result for such short tasks, imho.
>
> re2 is certainly maintained. Don't know about wchar cleanliness.
> See
>    http://code.google.com/p/re2/
> Also check out Russ Cox's excellent articles on implementing it
>    http://swtch.com/~rsc/regexp/
> and this:
>    http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
> C++ may be an impediment for it to go into libc but one can
> certainly put a C interface on a C++ library.

1. This library accepts many popular syntaxes (PCRE, POSIX, vim, etc.),
but it does not allow you to change the mode.
http://code.google.com/p/re2/source/browse/re2/re2.h

2. It focuses on speed and features, not stability and standardization.

3. It uses C++. We seldom accept C++ code in the base system, and do
not accept it in libc.

So, as far as I'm concerned, re2 is good as a regex engine in some
applications, but may not fit the requirements for a regex in libc.
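
For reference, the "C interface on a C++ library" idea quoted above could
look roughly like the sketch below. It is only an illustration under
stated assumptions: std::regex stands in for re2 so the example is
self-contained, and every shim_* name is hypothetical, not part of re2,
TRE, or libc. The point it makes is that the C caller sees only an opaque
handle and plain functions, while the shim itself must still be built as
C++ and catch every exception at the boundary.

```cpp
// Sketch only: std::regex stands in for a C++ engine such as re2.
// All shim_* names are hypothetical, not part of any real library.
#include <regex>

extern "C" {

// Opaque handle: C callers only ever hold a pointer to this type.
typedef struct shim_regex shim_regex;

// Compile a POSIX extended pattern; returns NULL on any failure.
shim_regex *shim_regcomp(const char *pattern);

// Returns 1 if the pattern matches anywhere in text, 0 otherwise.
int shim_regexec(const shim_regex *re, const char *text);

// Release a handle returned by shim_regcomp (NULL is accepted).
void shim_regfree(shim_regex *re);

}  // extern "C"

// The definition stays on the C++ side of the boundary.
struct shim_regex {
    std::regex impl;
};

extern "C" shim_regex *shim_regcomp(const char *pattern) {
    if (pattern == nullptr) return nullptr;
    try {
        return new shim_regex{std::regex(pattern, std::regex::extended)};
    } catch (...) {
        // C++ exceptions must never propagate into C callers.
        return nullptr;
    }
}

extern "C" int shim_regexec(const shim_regex *re, const char *text) {
    if (re == nullptr || text == nullptr) return 0;
    try {
        return std::regex_search(text, re->impl) ? 1 : 0;
    } catch (...) {
        return 0;
    }
}

extern "C" void shim_regfree(shim_regex *re) {
    delete re;
}
```

A C program would call shim_regcomp(), shim_regexec(), and shim_regfree()
through a header containing only the extern "C" declarations. Note that
the resulting object still links against the C++ runtime, which is
exactly the concern in point 3.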

> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>



-- 
Zhihao Yuan
The best way to predict the future is to invent it.


