Date: Tue, 24 Aug 2010 03:16:09 +0200
From: "C. P. Ghost" <cpghost@cordula.ws>
To: Gabor Kovesdan <gabor@freebsd.org>
Cc: freebsd-current@freebsd.org
Subject: Re: What to learn from the BSD grep case [Was: why GNU grep is fast]
Message-ID: <AANLkTi=ksoBptimSgnYUxp8%2BwYwOjidZ03uJyBFTTwz7@mail.gmail.com>
In-Reply-To: <4C728DE5.4060809@FreeBSD.org>
References: <201008210231.o7L2VRvI031700@ducky.net> <4C728DE5.4060809@FreeBSD.org>
On Mon, Aug 23, 2010 at 5:04 PM, Gabor Kovesdan <gabor@freebsd.org> wrote:
> 4. We really need a good regex library. From the comments, it seems there's
> no such library in the open source world. GNU libregex evidently isn't
> efficient, since GNU grep needs those workarounds that Mike kindly pointed
> out. Oniguruma was extremely slow when I checked it. PCRE supports
> Perl-style syntax with a POSIX-like API but not POSIX regex. Google RE2 is
> the same with additional egrep syntax but doesn't have support for standard
> POSIX regexes. Plan 9 regex only supports egrep syntax. It seems that TRE is
> the best choice. It is BSD-licensed, supports wchar and POSIX(ish) regexes
> and it is quite fast.

I know it's C++ and not exactly what you need, but have you
evaluated Boost::Regex? Isn't there some code that can be
retrofitted into a C lib?

http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/index.html

> I don't know the theoretical background of regex engines but I'm wondering
> if it's possible to provide an alternative API with byte-counted buffers
> and use the heuristic speedup with fixed string matching. As Mike pointed
> out, the POSIX API is quite limiting because it works on NUL-terminated
> strings and not on byte-counted buffers, so we couldn't just do it with a
> POSIX-conformant library, but it would be nice if we could implement it in
> such a library with an alternative interface.
>
> Gabor

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
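
To make the byte-counted idea above a bit more concrete, here is a rough
sketch in C of what such an alternative interface might look like on top of
a plain POSIX regex library. Everything named here is an assumption for
illustration only: find_bytes() and match_counted() are hypothetical helpers,
not part of TRE, GNU grep or any existing library, and the "must" literal is
handed in by the caller, whereas a real implementation would have to derive
it from the pattern. The copy before regexec() only exists because the POSIX
call wants a NUL-terminated string; a library with a native byte-counted
entry point would not need it.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: locate a fixed string in a byte-counted buffer.
 * The buffer does not have to be NUL-terminated. */
static const char *find_bytes(const char *buf, size_t len,
                              const char *needle, size_t nlen)
{
    if (nlen == 0)
        return buf;
    if (len < nlen)
        return NULL;
    for (size_t i = 0; i + nlen <= len; i++) {
        if (buf[i] == needle[0] && memcmp(buf + i, needle, nlen) == 0)
            return buf + i;
    }
    return NULL;
}

/* Hypothetical wrapper: returns 1 on match, 0 on no match, -1 on error.
 * "must" is a literal that every match has to contain (assumed to be
 * supplied by the caller). If it is absent, the regex engine is skipped. */
static int match_counted(const regex_t *re, const char *must,
                         const char *buf, size_t len)
{
    size_t mlen = strlen(must);

    /* Cheap fixed-string prefilter on the raw bytes. */
    if (mlen > 0 && find_bytes(buf, len, must, mlen) == NULL)
        return 0;

    /* POSIX regexec() needs a NUL-terminated string, so copy the bytes.
     * Limitation: an embedded NUL in buf would cut the copy short. */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, buf, len);
    copy[len] = '\0';

    int rc = regexec(re, copy, 0, NULL, 0);
    free(copy);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

A caller would compile the pattern once with regcomp() and then call
match_counted() per line with a pointer and a length. The linear scan in
find_bytes() is only a placeholder for a faster fixed-string search; the
point of the sketch is that most non-matching lines never reach the regex
engine at all.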