Message-Id: <201112231439.pBNEdVI7071003@svn.freebsd.org>
From: Gabor Kovesdan
Date: Fri, 23 Dec 2011 14:39:31 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r228842 - user/gabor/tre-integration/lib/libc/regex

Author: gabor
Date: Fri Dec 23 14:39:30 2011
New Revision: 228842
URL: http://svn.freebsd.org/changeset/base/228842

Log:
  - Minor rewording of some existing parts
  - Document some TRE-specific features

Modified:
  user/gabor/tre-integration/lib/libc/regex/re_format.7

Modified: user/gabor/tre-integration/lib/libc/regex/re_format.7
==============================================================================
--- user/gabor/tre-integration/lib/libc/regex/re_format.7 Fri Dec 23 13:50:33 2011 (r228841)
+++ user/gabor/tre-integration/lib/libc/regex/re_format.7 Fri Dec 23 14:39:30 2011 (r228842)
@@ -37,7 +37,7 @@
 .\" @(#)re_format.7 8.3 (Berkeley) 3/20/94
 .\" $FreeBSD$
 .\"
-.Dd October 6, 2011
+.Dd December 23, 2011
 .Dt RE_FORMAT 7
 .Os
 .Sh NAME
@@ -69,13 +69,13 @@ so this manual will describe the behavio
 instead of just reproducing the same iformation that is already
 available in the standard.
 .Pp
-An extended regular expression is one or more non-empty
+An extended regular expression is constructed from one or more non-empty
 .Em branches ,
 separated by
 .Ql \&| .
 It matches anything that matches one of the branches.
 .Pp
-A branch is one or more
+A branch consists of one or more
 .Em pieces ,
 concatenated.
 It matches a match for the first, followed by a match for the second, etc.
@@ -284,7 +284,7 @@ The reverse, matching any character that
 class, the negation operator of bracket expressions may be used:
 .Ql [^[:class:]] .
 .Pp
-In the event that a regular expression could match more than one
+In the event that a regular expression could match more than one
 substring of a given string,
 the regular expression matches the one starting earliest in the string.
 If the regular expression could match more than one substring starting
@@ -343,7 +343,77 @@ longer than 256 bytes,
 as an implementation can refuse to accept such regular expressions and remain
 POSIX-compliant.
 .Pp
+As described before,
+repetition operators and bounds are greedy by definition.
+This implementation provides non-greedy operators and bounds that
+are formed by adding an extra
+.Ql \&?
+after the repetition.
+.No e.g. Ql a*?
+will be non-greedy,
+that is,
+will match as few characters as possible.
+.Pp
+Another extension in this implementation is the set of non-standard
+anchors:
+.Bl -tag -width BBBB
+.It Ql \e<
+Beginning of a word
+.It Ql \e>
+End of a word
+.It Ql \eb
+Word boundary
+.It Ql \eB
+Non-word boundary
+.It Ql \ed
+Digit (equivalent to [[:digit:]])
+.It Ql \eD
+Non-digit (equivalent to [^[:digit:]])
+.It Ql \es
+Space (equivalent to [[:space:]])
+.It Ql \eS
+Non-space (equivalent to [^[:space:]])
+.It Ql \ew
+Word character (equivalent to [[:alnum:]])
+.It Ql \eW
+Non-word character (equivalent to [^[:alnum:]])
+.El
+.Pp
+The literal characters can also be expressed with an extended notation
+apart from real literals and escaped specials.
+It is possible to specify 8\-bit hexadecimal encoded characters
+.No e.g. \ex1B
+or wide hexadecimal encoded characters
+.No e.g. \ex{263a} .
+With this notation,
+every character can be included in a regular expression.
+Some common non\-printable characters have an escaped shorthand,
+as well:
+.Bl -tag -width BBBB
+.It Ql \ea
+Bell character (ASCII code 7)
+.It Ql \ee
+Escape character (ASCII code 27)
+.It Ql \ef
+Form\-feed character (ASCII code 12)
+.It Ql \en
+Newline character (ASCII code 10)
+.It Ql \er
+Carriage return character (ASCII code 13)
+.It Ql \et
+Horizontal tab character (ASCII code 9)
+.El
+.Pp
 Basic regular expressions differ in several respects.
+The delimiters for bounds are
+.Ql \e{
+and
+.Ql \e} ,
+with
+.Ql \&{
+and
+.Ql \&}
+by themselves ordinary characters.
 .Ql \&|
 is an ordinary character and there is no equivalent
 for its functionality.
@@ -352,23 +422,14 @@ and
 .Ql ?\&
 are ordinary characters, and their functionality
 can be expressed using bounds
-.No ( Ql {1,}
+.No ( Ql \e{1,\e}
 or
-.Ql {0,1}
+.Ql \e{0,1\e}
 respectively).
 Also note that
 .Ql x+
 in extended regular expressions is equivalent to
 .Ql xx* .
-The delimiters for bounds are
-.Ql \e{
-and
-.Ql \e} ,
-with
-.Ql \&{
-and
-.Ql \&}
-by themselves ordinary characters.
 The parentheses for nested subexpressions are
 .Ql \e(
 and
@@ -426,6 +487,8 @@ This manual was originally written by
 for an older implementation and later extended and
 tailored for TRE by
 .An Gabor Kovesdan .
+The description of TRE\-specific extensions is based on the original
+TRE documentation.
 The regex implementation comes from the TRE project
 and it was included first in
 .Fx 10-CURRENT.
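
The greedy and non-greedy behaviour documented in this change, and the
equivalences between the repetition operators and bounds, can be tried out
with a short sketch. It uses Python's re module purely as a stand-in, on the
assumption that it shares the relevant syntax (an extra ? after a repetition,
and unescaped braces for bounds as in extended regular expressions). It does
not exercise TRE or the libc regcomp() interface, and the helper names
first_match and same_language are hypothetical. TRE-only escapes such as \e,
\< and \>, and \x{263a} are not covered, because Python's re does not accept
them.

```python
import re


def first_match(pattern, text):
    """Return the leftmost match of pattern in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def same_language(p1, p2, samples):
    """True if both patterns accept exactly the same whole-string samples."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s)) for s in samples
    )


TEXT = "<b>bold</b> and <i>italic</i>"

# Greedy repetition takes as much as possible; the extra "?" takes as little.
print(first_match(r"<.*>", TEXT))
print(first_match(r"<.*?>", TEXT))

# The same rule applies to bounds.
print(first_match(r"a{2,4}", "aaaa"))
print(first_match(r"a{2,4}?", "aaaa"))

# x+ is equivalent to xx* and to the bound {1,}; x? is equivalent to {0,1}.
SAMPLES = ["", "x", "xx", "xxx", "y", "xy"]
print(same_language(r"x+", r"xx*", SAMPLES))
print(same_language(r"x+", r"x{1,}", SAMPLES))
print(same_language(r"x?", r"x{0,1}", SAMPLES))
```

In basic regular expressions the bounds in the last three checks would be
spelled with the \{ and \} delimiters described in the diff; Python has no
basic-syntax mode, so only the extended spelling is shown.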