From: Gabor Kovesdan
Date: Sat, 04 Jun 2011 20:18:10 +0100
To: soc-status@freebsd.org
Subject: regex status report #2

Hi,

I've been testing the code further and found a bug in the REG_STARTEND code that I added last week. I fixed it. I looked at NetBSD's code to see whether they have any local improvements.
I've merged REG_PEND support, but they don't have anything else.

In general, I see that TRE is a mature regex implementation with good POSIX conformance, but its performance is not always satisfactory. So I've used gprof to check where the processing time is spent, and I'll continue investigating how to improve the performance. Basically, there are two ways:

- improving the TRE matching code itself
- using heuristics and shortcuts, e.g. using fixed-string matching to detect a possibly matching context, or detecting that the pattern is simple and can be handled by a faster algorithm instead of heavyweight pattern matching.

Gabor
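
To illustrate the second approach listed above, here is a minimal sketch in C of one possible shortcut: if the pattern contains no regex metacharacters, a plain substring search is enough and the regex engine can be skipped. This is not the project's code. The helper names pattern_is_literal() and match_with_shortcut() are hypothetical, and the sketch assumes POSIX extended regular expressions and uses only the standard regcomp()/regexec() interface rather than TRE internals.

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if the pattern contains no ERE
 * metacharacters, so it can only match itself literally.
 */
static bool
pattern_is_literal(const char *pattern)
{
	return (strpbrk(pattern, "\\^$.[]|()*+?{}") == NULL);
}

/*
 * Hypothetical helper: returns 1 on a match, 0 on no match and
 * -1 if the pattern fails to compile.
 */
static int
match_with_shortcut(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	/* Shortcut: a literal pattern needs only a substring search. */
	if (pattern_is_literal(pattern))
		return (strstr(text, pattern) != NULL);

	/* General case: fall back to the full regex engine. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

With these definitions, match_with_shortcut("foo", "a foo b") takes the strstr() path, while match_with_shortcut("fo+", "a fooo b") goes through regcomp() and regexec(). A real implementation would also have to account for flags such as REG_ICASE and REG_NEWLINE before choosing the shortcut.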