From owner-freebsd-hackers@FreeBSD.ORG Mon May 9 05:21:45 2011
To: lev@FreeBSD.org
In-reply-to: Your message of "Mon, 09 May 2011 08:30:57 +0400." <1164564191.20110509083057@serebryakov.spb.ru>
References: <4DC7356C.20905@kovesdan.org> <20110509011709.5455CB827@mail.bitblocks.com> <1164564191.20110509083057@serebryakov.spb.ru>
Date: Sun, 08 May 2011 22:21:44 -0700
From: Bakul Shah
Message-Id: <20110509052144.65DD7B827@mail.bitblocks.com>
Cc: Gabor Kovesdan, "Pedro F. Giffuni", hackers@FreeBSD.org, Brooks Davis
Subject: Re: [RFC] Replacing our regex implementation

On Mon, 09 May 2011 08:30:57 +0400 Lev Serebryakov wrote:
> Hello, Bakul.
> You wrote 9 May 2011, 5:17:09:
>
> > As per the following URLs re2 is much faster than TRE (on the
> > benchmarks they ran):
>
> > http://lh3lh3.users.sourceforge.net/reb.shtml
> > http://sljit.sourceforge.net/regex_perf.html
> re2 is much faster at price of memory. I don't remember details now,
> but I've found (simple) situations when re2 consumes a HUGE amount of
> memory (read: hundreds of megabytes).
> It work faster than tre, yes. If
> you have this memory to RE engine alone.

As per http://swtch.com/~rsc/regexp/regexp3.html RE2 requires about
10 KB per regexp, in contrast to PCRE's half a KB. This is not
excessive in this day and age. But 100s of megabytes sounds very
strange.... I'd appreciate a reference to an actual example (and I am
sure so would the author of re2).

But I do not want to defend re2 here. My intent was to just make sure
re2 was at least considered. Mainly because it was actually quite
surprising to see TRE is 10 to 45 times slower than re2!
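
A back-of-the-envelope check of the figures above, as a rough sketch only. It uses the approximate per-regexp costs quoted from the article (about 10 KB for RE2, half a KB for PCRE). The 200 MB target is an assumption standing in for "hundreds of megabytes", and the function name is made up for illustration. It counts compiled-regexp size alone and says nothing about memory used while matching.

```python
# Approximate per-regexp memory costs quoted in the message above.
RE2_BYTES_PER_REGEXP = 10 * 1024   # "about 10 KB"
PCRE_BYTES_PER_REGEXP = 512        # "half a KB"

# Assumption: 200 MB stands in for "hundreds of megabytes".
TARGET_BYTES = 200 * 1024 * 1024


def regexps_needed(target_bytes: int, bytes_per_regexp: int) -> int:
    """Number of compiled regexps whose combined size reaches target_bytes."""
    # Ceiling division without going through floating point.
    return -(-target_bytes // bytes_per_regexp)


if __name__ == "__main__":
    print("RE2 regexps to reach target: ",
          regexps_needed(TARGET_BYTES, RE2_BYTES_PER_REGEXP))
    print("PCRE regexps to reach target:",
          regexps_needed(TARGET_BYTES, PCRE_BYTES_PER_REGEXP))
```

Under these assumptions it would take on the order of twenty thousand compiled regexps at 10 KB each to reach 200 MB, which is why a "simple" case consuming that much would need some other explanation.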