From owner-soc-status@FreeBSD.ORG  Sun Jun 19 14:56:59 2011
Return-Path: <owner-soc-status@FreeBSD.ORG>
Delivered-To: soc-status@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 57052106568B
	for <soc-status@freebsd.org>; Sun, 19 Jun 2011 14:56:59 +0000 (UTC)
	(envelope-from gabor@kovesdan.org)
Received: from server.mypc.hu (server.mypc.hu [87.229.73.95])
	by mx1.freebsd.org (Postfix) with ESMTP id 147E78FC18
	for <soc-status@freebsd.org>; Sun, 19 Jun 2011 14:56:58 +0000 (UTC)
Received: from server.mypc.hu (localhost [127.0.0.1])
	by server.mypc.hu (Postfix) with ESMTP id 63CF514E585B
	for <soc-status@freebsd.org>; Sun, 19 Jun 2011 16:56:57 +0200 (CEST)
X-Virus-Scanned: amavisd-new at server.mypc.hu
Received: from server.mypc.hu ([127.0.0.1])
	by server.mypc.hu (server.mypc.hu [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id kgiGQ7aqsMo7 for <soc-status@freebsd.org>;
	Sun, 19 Jun 2011 16:56:55 +0200 (CEST)
Received: from [193.137.158.156] (unknown [193.137.158.156])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by server.mypc.hu (Postfix) with ESMTPSA id DC9E214E585A
	for <soc-status@freebsd.org>; Sun, 19 Jun 2011 16:56:54 +0200 (CEST)
Message-ID: <4DFE0E3C.8000804@kovesdan.org>
Date: Sun, 19 Jun 2011 15:57:00 +0100
From: Gabor Kovesdan <gabor@kovesdan.org>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; pt-PT;
	rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10
MIME-Version: 1.0
To: soc-status@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: regex status report #4
X-BeenThere: soc-status@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Summer of Code Status Reports and Discussion <soc-status.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/soc-status>,
	<mailto:soc-status-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/soc-status>
List-Post: <mailto:soc-status@freebsd.org>
List-Help: <mailto:soc-status-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/soc-status>,
	<mailto:soc-status-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 19 Jun 2011 14:56:59 -0000

Hi,

This week I tested the code further and, contrary to my earlier
impressions, found that the performance actually varies. With sed it
usually performs like the old code, and in some cases it was even
significantly better. It seems that grep is just an extreme case that
is very sensitive to regex performance. So I decided to clean up what
I have so far and publish a patch for testing. If people find the
out-of-the-box performance good enough, we can proceed with the first
phase of replacing the regex code. It has to be tested and checked
thoroughly, though, which is why I want to provide a patch as soon as
possible. grep will still use the GNU regex code, so its performance
will not be affected. The patch will be ready soon.

Apart from this, I've been looking at how to optimize the performance.
There are a couple of ideas that could work: a simple matcher for
fixed and simple expressions; optimizing the internals of the code;
wrapping the matcher in a heuristic matcher that isolates the possibly
matching part and applies the heavier algorithm only to that narrower
context; and so on. I still have to think about which techniques
should be used with TRE and then implement them. I haven't written
any optimization code yet because I first want to see clearly how it
should be done.
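
To illustrate the first and third ideas only (nothing like this has
been written yet), here is a minimal sketch in C. It assumes the POSIX
regex.h interface rather than TRE's internals. The names
is_plain_literal and prefiltered_match, and the caller-supplied
"literal" that every match is assumed to contain, are hypothetical.
It is also a simplification: it rejects whole lines cheaply instead of
isolating a narrower context within the line.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if the ERE pattern contains no
 * metacharacters, so a plain substring search could replace the
 * regex engine entirely.
 */
static bool
is_plain_literal(const char *pattern)
{
	return (strpbrk(pattern, ".[]()*+?{}|^$\\") == NULL);
}

/*
 * Hypothetical wrapper: true if `line` matches the compiled pattern
 * `re`. `literal` is a fixed substring that every match is assumed
 * to contain (NULL if none is known). Extracting such a literal from
 * the pattern is not shown here.
 */
static bool
prefiltered_match(const regex_t *re, const char *literal, const char *line)
{
	assert(re != NULL && line != NULL);

	/* Cheap rejection: without the literal no match is possible. */
	if (literal != NULL && strstr(line, literal) == NULL)
		return (false);

	/* Only now pay for the heavier regex algorithm. */
	return (regexec(re, line, 0, NULL, 0) == 0);
}
```

A caller would compile the pattern once with regcomp(), for example
with REG_EXTENDED | REG_NOSUB, and then call prefiltered_match() for
each input line. Whether this pays off depends on how selective the
literal is, which is exactly the kind of thing that needs measuring.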

Gabor