From: Gabor Kovesdan
Date: Sun, 19 Jun 2011 15:57:00 +0100
To: soc-status@freebsd.org
Subject: regex status report #4
Message-ID: <4DFE0E3C.8000804@kovesdan.org>

Hi,

This week I tested the code further and, contrary to my earlier impressions, I found that the performance actually varies. With sed it usually performs like the old code, and in some cases it was even significantly better.
It seems that grep is just an extreme case that is very sensitive to performance. So I decided to clean up what I have so far and publish a patch for testing. If people find the out-of-the-box performance good enough, we can proceed with the first phase of replacing the regex code. It has to be tested and checked thoroughly, though, which is why I want to provide a patch as soon as possible. grep will still use the GNU regex code, so its performance will not be affected. The patch will be ready soon.

Apart from this, I've been looking at how to optimize the performance. There are a couple of ideas that could work: a simple matcher for fixed strings and simple expressions; optimizing the internals of the code; wrapping the matcher in a heuristic one that isolates the part that could possibly match and applies the heavier algorithm only to that narrower context; and so on. I have to think about which techniques should be used with TRE and then implement them. I haven't written any optimization code yet because I first want to see clearly how it should be done.

Gabor
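
To illustrate the first idea above (a simple matcher for fixed strings), here is a minimal sketch. It assumes the standard POSIX <regex.h> interface as the heavier matcher; is_fixed_pattern() and match_line() are hypothetical names used only for illustration and are not part of TRE or of the planned patch. A pattern without ERE metacharacters is matched with strstr(), and everything else falls through to regcomp()/regexec().

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if the pattern contains no ERE
 * metacharacters, so it can be treated as a plain fixed string.
 */
bool
is_fixed_pattern(const char *pat)
{
	assert(pat != NULL);
	return (strpbrk(pat, ".[]()*+?{}|^$\\") == NULL);
}

/*
 * Hypothetical wrapper: returns 1 on match, 0 on no match and
 * -1 if the pattern cannot be compiled.
 */
int
match_line(const char *pat, const char *line)
{
	regex_t re;
	int rc;

	assert(pat != NULL && line != NULL);

	/* Fast path: a fixed string needs no regex machinery at all. */
	if (is_fixed_pattern(pat))
		return (strstr(line, pat) != NULL);

	/* Slow path: fall back to the full regex matcher. */
	if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

A real implementation would compile the pattern once and reuse it across lines, and would probably use a faster substring search than strstr(); the sketch only shows where the fast-path decision could be made.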