From: Bakul Shah
To: David Schultz
Cc: Gabor Kovesdan, "Pedro F. Giffuni", hackers@FreeBSD.ORG, Brooks Davis, Zhihao Yuan
Subject: Re: [RFC] Replacing our regex implementation
Date: Mon, 09 May 2011 17:15:18 -0700
Message-Id: <20110510001518.2C855B827@mail.bitblocks.com>
In-reply-to: Your message of "Mon, 09 May 2011 17:51:46 EDT." <20110509215146.GA18135@zim.MIT.EDU>
References: <4DC7356C.20905@kovesdan.org> <20110509011709.5455CB827@mail.bitblocks.com> <4DC74546.1060902@kovesdan.org> <20110509014938.EE292B827@mail.bitblocks.com> <20110509061334.A62EAB827@mail.bitblocks.com> <20110509215146.GA18135@zim.MIT.EDU>
List-Id: Technical Discussions relating to FreeBSD

On Mon, 09 May 2011 17:51:46 EDT David Schultz wrote:
> On Sun, May 08, 2011, Bakul Shah wrote:
> > On Sun, 08 May 2011 21:35:04 CDT Zhihao Yuan wrote:
> > > 1. This lib accepts many popular grammars (PCRE, POSIX, vim, etc.),
> > > but it does not allow you to change the mode.
> > > http://code.google.com/p/re2/source/browse/re2/re2.h
> >
> > The mode is decided when an RE2 object is instantiated so this
> > is ok. You can certainly instantiate multiple objects with
> > different options if so desired.
> >
> > > 2. It focuses on speed and features, not stability and standardization.
> >
> > Look at the open issues. Seems stable enough to me. re2 has a
> > posix only mode. It also does unicode.

s/posix only mode/posix only mode as well/

> >
> > > 3. It uses C++. We seldom accepts C++ code in base system, and does
> > > not accept it in libc.
> >
> > This is the show stopper.
>
> Use of C++ is a clear show-stopper if it introduces new runtime
> requirements, e.g., dependencies on STL or exceptions. Aside from
> that, however, I can't think of any fundamental, technical reasons
> why a component of libc couldn't be written in C++. (Perhaps the
> toolchain maintainers could name some, and they'd be the best
> authority on the matter.) You can expect some resistance
> regardless, however, so make sure the technical merits of RE2 are
> worth the trouble.

OK, I just verified that there are no additional runtime
requirements by running a simple test: I wrapped an RE2 C++ call
in a C function, compiled the wrapper with c++, compiled the
client C code with cc, and linked everything with cc. This works
(tested on x86_64, under 8.1).

I do think RE2 is very well done (see the swtch.com/~rsc/regexp
articles). It is actively maintained and has a battery of pretty
exhaustive tests. It seems TRE's author also likes re2:

    http://hackerboss.com/is-your-regex-matcher-up-to-snuff/

So if we want to consider this, it is a real possibility.

> IIRC, some of the prior discussions on using more C++ in the base
> system got derailed by tangents on multiple inheritance, operator
> overloading, misfeatures of STL, and what subset of C++ ought to
> be considered kosher in FreeBSD.
> You don't have to get involved
> in any of that because you'd only be proposing to import a
> self-contained third-party library.

Indeed; we would just use it via a C wrapper API. But I can see
someone thinking this is the camel's nose in the tent :-)
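
For readers who want to see what "a C wrapper API" around a C++ class
looks like, here is a minimal sketch of the general pattern. It is not
the test code mentioned above. Everything in it is assumed for
illustration: LiteralMatcher is a hypothetical stand-in for a C++ class
such as RE2 (it only does substring search, so the sketch builds without
the re2 library), and the re_handle_* names are invented. The point is
the shape: an opaque handle plus functions with C linkage.

```cpp
// Sketch only. LiteralMatcher is a hypothetical stand-in for a C++ regex
// class such as RE2; it does plain substring search, not regex matching.
#include <cstring>
#include <new>

class LiteralMatcher {
public:
    explicit LiteralMatcher(const char *pattern) : pattern_(pattern) {}

    // True if the pattern occurs anywhere in text.
    bool Matches(const char *text) const {
        return std::strstr(text, pattern_) != nullptr;
    }

private:
    const char *pattern_;  // borrowed; the caller keeps the string alive
};

// Everything below has C linkage, so a client compiled as C can call it.
// The names re_handle, re_handle_new, etc. are invented for this sketch.
extern "C" {

// Opaque handle: C code only ever sees a pointer to this incomplete type.
typedef struct re_handle re_handle;

// Returns NULL on a NULL pattern or on allocation failure (no exceptions).
re_handle *re_handle_new(const char *pattern) {
    if (pattern == nullptr) return nullptr;
    LiteralMatcher *m = new (std::nothrow) LiteralMatcher(pattern);
    return reinterpret_cast<re_handle *>(m);
}

// Returns 1 on a match, 0 otherwise (including NULL arguments).
int re_handle_match(const re_handle *h, const char *text) {
    if (h == nullptr || text == nullptr) return 0;
    return reinterpret_cast<const LiteralMatcher *>(h)->Matches(text) ? 1 : 0;
}

// Safe to call with NULL.
void re_handle_free(re_handle *h) {
    delete reinterpret_cast<LiteralMatcher *>(h);
}

}  // extern "C"
```

A C client would see only the typedef and the three prototypes in a
header, so no C++ types leak across the boundary. With the real library,
the stand-in class would be replaced by calls into RE2; whether that
pulls in extra runtime pieces is exactly what the test described above
was checking, and this sketch makes no claim about it.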