From owner-freebsd-hackers  Wed Jan 29 16:29:59 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id QAA02886
          for hackers-outgoing; Wed, 29 Jan 1997 16:29:59 -0800 (PST)
Received: from fog.xinside.com (fog.xinside.com [199.164.187.39])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA02877
          for <hackers@freebsd.org>; Wed, 29 Jan 1997 16:29:55 -0800 (PST)
Received: (from smap@localhost) by fog.xinside.com (8.8.3/8.7.3) id RAA13970 for <hackers@freebsd.org>; Wed, 29 Jan 1997 17:29:54 -0700 (MST)
X-Authentication-Warning: fog.xinside.com: smap set sender to <patrick@chon.xinside.com> using -f
Received: from chon.xinside.com(199.164.187.134) by fog.xinside.com via smap (V1.3)
	id sma013968; Wed Jan 29 17:29:38 1997
Received: (from patrick@localhost) by chon.xinside.com (8.7.5/8.6.12) id RAA02431; Wed, 29 Jan 1997 17:29:19 -0700 (MST)
Date: Wed, 29 Jan 1997 17:29:19 -0700 (MST)
Message-Id: <199701300029.RAA02431@chon.xinside.com>
From: Patrick Giagnocavo <support@xinside.com>
To: hackers@freebsd.org
Subject: [Fwd: freebsd performance.]
In-Reply-To: <32EFD2E2.167EB0E7@whistle.com>
References: <32EFD2E2.167EB0E7@whistle.com>
Reply-To: patrick@xinside.com
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Julian Elischer writes:

 > The posix regex library is VERY VERY slow.
 > 
 > I have a program that uses a large regex to parse some input.
 > 
 > I have a version in perl and a version in C++ using the freebsd posix
 > regex library.
 > 
 > The perl version is 100X faster that the C++ version.
 > 
 > gprof on the C++ version shows 99% of the spend in:
 > 
 >  91.53     46.17    46.17  1152366     0.04     0.04  lstep
 >   6.52     49.46     3.29    98497     0.03     0.47  lslow

I am surprised that some of our more erudite members on the list have
not jumped on this.  So, a definitely non-erudite person will.

The best book to get on the subject is the newly released 'Mastering
Regular Expressions' by Jeffrey Freidl; it is published by O'Reilly
and Associates.

Posix regex is explained in the book; as well, there are a number of
tests, gambits, and special cases that can be used to substantially
speed up searches.

There are two different 'engines' - NFA (nondeterministic finite
automaton) and DFA (deterministic finite automaton).  Perl is
'traditional NFA' according to Mr. Friedl, while POSIX leans towards
DFA-like behavior in all cases (always returns 'leftmost-longest' that
matches - perl returns I believe the first part that matches).

Also, Perl does not 'do' POSIX IIRC; so the results can actually be
different when using the same regex string.  Perl does however have
some very powerful features for its regex - covered in the book. 

Anyways, get the book - it is well written and very informative.

Another book that covers this is the 'Red Dragon' compiler book by Aho
Sethi and Ullman.  Page 113 and following in the version I have.

Maybe this fellow should post all or part of his regex strings - there
may well be a way to optimize them.  I am sure that there is a way to
speed it up to get it faster.  

100X difference is just too great to say that it is the library, IMHO.

Cordially

Patrick Giagnocavo
patrick@xinside.com

-Opinions expressed are not even mine, let alone my employers'!-