From owner-freebsd-hackers@FreeBSD.ORG Mon May 9 01:36:53 2011
Message-ID: <4DC74546.1060902@kovesdan.org>
Date: Mon, 09 May 2011 02:37:10 +0100
From: Gabor Kovesdan
To: Bakul Shah
Cc: "Pedro F. Giffuni", hackers@FreeBSD.org, Brooks Davis
In-Reply-To: <20110509011709.5455CB827@mail.bitblocks.com>
Subject: Re: [RFC] Replacing our regex implementation

On 09-05-2011 02:17, Bakul Shah wrote:
> As per the following URLs re2 is much faster than TRE (on the
> benchmarks they ran):
>
> http://lh3lh3.users.sourceforge.net/reb.shtml
> http://sljit.sourceforge.net/regex_perf.html
>
> re2 is in C++ & has a PCRE API, while TRE is in C & has a
> POSIX API. Both have BSD copyright. Is it worth considering
> making re2 posix compliant?

Is it wchar-clean and is it actively maintained? C++ is quite
anticipated for the base system and I'm not very skilled in it, so atm I
couldn't promise to use re2 instead of TRE. And anyway, can C++ go into
libc? According to POSIX, the regex code has to be there. But let's see
what others say... If we happen to use re2 later, the extensions that I
talked about in points 2 and 3 would still be useful.

Anyway, according to some earlier rough measurements, TRE seems to be
slower in small matching tasks but scales well. These tests seem to
compare only short runs with the same regex. It should be seen how they
compare e.g. if you grep the whole ports tree with the same pattern. If
the matching scales well once the pattern is compiled, that's more
important than the overall result for such short tasks, imho.

Gabor
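
P.S. To make the compile-once, match-many comparison concrete, here is a
minimal sketch of the kind of measurement I have in mind. It uses only the
standard POSIX calls (regcomp, regexec, regfree), so it exercises whatever
regex code libc provides. The names bench_regex and struct bench_result are
made up for illustration and are not part of TRE, re2 or libc, and clock()
is only a coarse timer, so treat the numbers as rough.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical result record: match count plus CPU time per phase. */
struct bench_result {
	size_t	matches;	/* number of lines that matched */
	double	compile_secs;	/* CPU time spent in regcomp() */
	double	match_secs;	/* CPU time spent in all regexec() calls */
	int	error;		/* regcomp() error code, 0 on success */
};

/*
 * Compile the pattern once, then run it over every line, timing the
 * two phases separately so a slow compile does not hide fast matching.
 */
struct bench_result
bench_regex(const char *pattern, const char *const *lines, size_t nlines)
{
	struct bench_result r = { 0, 0.0, 0.0, 0 };
	regex_t re;
	clock_t t0, t1;
	size_t i;

	assert(pattern != NULL);

	t0 = clock();
	r.error = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	t1 = clock();
	r.compile_secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
	if (r.error != 0)
		return (r);	/* nothing to free if compilation failed */

	t0 = clock();
	for (i = 0; i < nlines; i++) {
		if (regexec(&re, lines[i], 0, NULL, 0) == 0)
			r.matches++;
	}
	t1 = clock();
	r.match_secs = (double)(t1 - t0) / CLOCKS_PER_SEC;

	regfree(&re);
	return (r);
}
```

Feeding it a few lines versus every line of the ports tree, with the same
pattern, should show whether match_secs grows roughly linearly with the
input while compile_secs stays constant, which is the scaling question the
short benchmarks don't answer.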