From: Gabor Kovesdan
Date: Sat, 25 Jun 2011 14:51:40 +0100
To: current@FreeBSD.org
Subject: [CFT] patch to replace the regex code
Message-ID: <4E05E7EC.9000902@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

Hi Folks,

you may know that in the Summer of Code programme I'm working on replacing the old regex code with TRE, which is a BSD-licensed implementation.
It supports wide characters, is POSIX-compliant and performs well compared to most open source implementations. In practice, though, I got mixed results. With sed, in the cases that I tested, the performance was more or less the same, and in a few cases TRE finished in half the time. With grep, on the other hand, it was sometimes significantly slower than the current regex code, but grep has always been a complicated case and it has its own regex code; it was only used for testing here. I'm still working on some optimizations, but apart from grep, the current performance may already be satisfactory for normal cases. This is one thing I would ask interested testers to focus on: whether the scripts you habitually run finish sooner or later.

I've also checked POSIX compliance and found some cases where TRE is more permissive than the current implementation, but that should not be a problem.

The patch that I provide now could probably use a cleanup in the contrib area, but it's just an early patch purely for testing purposes, so please avoid nitpicking for now and only report performance and/or functional problems. There's a code slush now, so there's plenty of time to arrange this if it proves ready to go into 10-CURRENT.

Thanks to all of you who take the time to give it a try. The patch is here:

http://kovesdan.org/patches/tre-20110724.diff

Regards,
Gabor Kovesdan
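
For testers who want to compare how long a habitual command takes before and after applying the patch, a minimal timing sketch might look like the following. It is not part of the patch; the function names time_command and summarize are hypothetical, and it only measures wall-clock time of whatever command is passed in, so the same command would need to be run on both the unpatched and the patched system.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded so terminal I/O does not distort the measurement.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (minimum, median) of the measured durations."""
    return min(durations), statistics.median(durations)
```

As an illustration, summarize(time_command(["sed", "-n", "/foo.*bar/p", "big.txt"])) would give the fastest and the median run for one sed invocation; the pattern and the file name here are placeholders, not cases taken from the tests described above.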