From owner-freebsd-hackers@FreeBSD.ORG Mon May 9 03:03:17 2011
Date: Sun, 8 May 2011 21:35:04 -0500
From: Zhihao Yuan
To: Bakul Shah
Cc: Gabor Kovesdan, "Pedro F. Giffuni", hackers@freebsd.org, Brooks Davis
Subject: Re: [RFC] Replacing our regex implementation
In-Reply-To: <20110509014938.EE292B827@mail.bitblocks.com>
References: <4DC7356C.20905@kovesdan.org>
 <20110509011709.5455CB827@mail.bitblocks.com>
 <4DC74546.1060902@kovesdan.org>
 <20110509014938.EE292B827@mail.bitblocks.com>
List-Id: Technical Discussions relating to FreeBSD

On Sun, May 8, 2011 at 8:49 PM, Bakul Shah wrote:
> On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan wrote:
>> On 09-05-2011 02:17, Bakul Shah wrote:
>> > As per the following URLs re2 is much faster than TRE (on the
>> > benchmarks they ran):
>> >
>> > http://lh3lh3.users.sourceforge.net/reb.shtml
>> > http://sljit.sourceforge.net/regex_perf.html
>> >
>> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
>> > POSIX API.  Both have BSD copyright. Is it worth considering
>> > making re2 posix compliant?
>> Is it wchar-clean and is it actively maintained? C++ is quite
>> anticipated for the base system and I'm not very skilled in it so atm I
>> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into
>> libc? According to POSIX, the regex code has to be there. But let's see
>> what others say... If we happen to use re2 later, my extensions that I
>> talked about in points 2 and 3 would still be useful.
>>
>> Anyway, according to some earlier vague measures, TRE seems to be slower
>> in small matching tasks but scales well. These tests seem to compare
>> only short runs with the same regex. It should be seen how they compare
>> e.g. if you grep the whole ports tree with the same pattern. If the
>> matching scales well once the pattern is compiled, that's more important
>> than the overall result for such short tasks, imho.
>
> re2 is certainly maintained. Don't know about wchar cleanliness.
> See
>    http://code.google.com/p/re2/
> Also check out Russ Cox's excellent articles on implementing it
>    http://swtch.com/~rsc/regexp/
> and this:
>    http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
> C++ may be an impediment for it to go into libc but one can
> certainly put a C interface on a C++ library.

1. This lib accepts many popular grammars (PCRE, POSIX, vim, etc.),
   but it does not allow you to change the mode.
   http://code.google.com/p/re2/source/browse/re2/re2.h
2. It focuses on speed and features, not stability and
   standardization.
3. It uses C++. We seldom accept C++ code in the base system, and do
   not accept it in libc.

So, as far as I am concerned, re2 is good as a regex engine in some
applications, but may not fit the requirements for a regex
implementation in libc.

-- 
Zhihao Yuan
The best way to predict the future is to invent it.
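
For reference, the POSIX interface this thread refers to is the
regcomp()/regexec()/regfree() family from <regex.h>. The sketch below is
only an illustration of that interface, not code from TRE, re2, or
FreeBSD libc; the helper name count_matching_lines is hypothetical. It
compiles the pattern once and reuses it for every input string, which is
the "grep the whole ports tree with the same pattern" situation described
in the quoted text, where matching speed after compilation matters more
than compilation cost.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any library): compile `pattern` once
 * as a POSIX extended regular expression, then count how many of the `n`
 * strings in `lines` contain a match.
 * Returns -1 if the pattern fails to compile.
 */
int
count_matching_lines(const char *pattern, const char *lines[], size_t n)
{
	regex_t re;
	int count = 0;

	assert(pattern != NULL);
	assert(lines != NULL || n == 0);

	/* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	/* The compiled pattern is reused for every input string. */
	for (size_t i = 0; i < n; i++) {
		if (regexec(&re, lines[i], 0, NULL, 0) == 0)
			count++;
	}

	regfree(&re);
	return count;
}
```

Any replacement engine for libc would have to sit behind these three
entry points, whatever language or algorithm it uses internally.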