From owner-freebsd-hackers Wed Jan 29 16:29:59 1997 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id QAA02886 for hackers-outgoing; Wed, 29 Jan 1997 16:29:59 -0800 (PST) Received: from fog.xinside.com (fog.xinside.com [199.164.187.39]) by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA02877 for ; Wed, 29 Jan 1997 16:29:55 -0800 (PST) Received: (from smap@localhost) by fog.xinside.com (8.8.3/8.7.3) id RAA13970 for ; Wed, 29 Jan 1997 17:29:54 -0700 (MST) X-Authentication-Warning: fog.xinside.com: smap set sender to using -f Received: from chon.xinside.com(199.164.187.134) by fog.xinside.com via smap (V1.3) id sma013968; Wed Jan 29 17:29:38 1997 Received: (from patrick@localhost) by chon.xinside.com (8.7.5/8.6.12) id RAA02431; Wed, 29 Jan 1997 17:29:19 -0700 (MST) Date: Wed, 29 Jan 1997 17:29:19 -0700 (MST) Message-Id: <199701300029.RAA02431@chon.xinside.com> From: Patrick Giagnocavo To: hackers@freebsd.org Subject: [Fwd: freebsd performance.] In-Reply-To: <32EFD2E2.167EB0E7@whistle.com> References: <32EFD2E2.167EB0E7@whistle.com> Reply-To: patrick@xinside.com Sender: owner-hackers@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Julian Elischer writes: > The posix regex library is VERY VERY slow. > > I have a program that uses a large regex to parse some input. > > I have a version in perl and a version in C++ using the freebsd posix > regex library. > > The perl version is 100X faster that the C++ version. > > gprof on the C++ version shows 99% of the spend in: > > 91.53 46.17 46.17 1152366 0.04 0.04 lstep > 6.52 49.46 3.29 98497 0.03 0.47 lslow I am surprised that some of our more erudite members on the list have not jumped on this. So, a definitely non-erudite person will. The best book to get on the subject is the newly released 'Mastering Regular Expressions' by Jeffrey Freidl; it is published by O'Reilly and Associates. Posix regex is explained in the book; as well, there are a number of tests, gambits, and special cases that can be used to substantially speed up searches. There are two different 'engines' - NFA (nondeterministic finite automaton) and DFA (deterministic finite automaton). Perl is 'traditional NFA' according to Mr. Friedl, while POSIX leans towards DFA-like behavior in all cases (always returns 'leftmost-longest' that matches - perl returns I believe the first part that matches). Also, Perl does not 'do' POSIX IIRC; so the results can actually be different when using the same regex string. Perl does however have some very powerful features for its regex - covered in the book. Anyways, get the book - it is well written and very informative. Another book that covers this is the 'Red Dragon' compiler book by Aho Sethi and Ullman. Page 113 and following in the version I have. Maybe this fellow should post all or part of his regex strings - there may well be a way to optimize them. I am sure that there is a way to speed it up to get it faster. 100X difference is just too great to say that it is the library, IMHO. Cordially Patrick Giagnocavo patrick@xinside.com -Opinions expressed are not even mine, let alone my employers'!-