From: Ben Kaduk
To: Gabor Kovesdan
Cc: src-committers@freebsd.org, svn-src-user@freebsd.org
Date: Wed, 14 Sep 2011 17:35:39 -0400
Subject: Re: svn commit: r225561 - user/gabor/tre-integration/lib/libc/regex

On 9/14/11, Gabor Kovesdan wrote:
> Author: gabor
> Date: Wed Sep 14 21:08:02 2011
> New Revision: 225561
> URL: http://svn.freebsd.org/changeset/base/225561
>
> Modified: user/gabor/tre-integration/lib/libc/regex/regex.3
> ==============================================================================
> --- user/gabor/tre-integration/lib/libc/regex/regex.3 Wed Sep 14 20:13:10 2011 (r225560)
> +++ user/gabor/tre-integration/lib/libc/regex/regex.3 Wed Sep 14 21:08:02 2011 (r225561)
> @@ -62,24 +96,57 @@
> .Ft void
> .Fn regfree "regex_t *preg"
> .Sh DESCRIPTION
> -These routines implement
> +These routines implement pattern matchinf of

"matching"

> .St -p1003.2
> -regular expressions
> -.Pq Do RE Dc Ns s ;
> -see
> -.Xr re_format 7 .
> +regular expressions.
> +The
> +.Xr re_format 7
> +manual can be consulted for the syntax and use of these.

s/the syntax and use of these/their syntax and usage/ is probably clearer.

> +.Pp
> The
> .Fn regcomp
> function
> -compiles an RE written as a string into an internal form,
> +compiles a regular expression written as a string into an internal form.
> +The
> +.Fn regncomp
> +function works in the very same way,
> +but takes another argument to specify the length of the pattern.
> +This function can accept patterns with NUL bytes inside because.

"can accept patterns that include NUL bytes." is probably enough.
(The trailing "because" is very odd in technical writing.)

> +The
> +.Fn regwcomp
> +and
> +.Fn regwncomp
> +functions work like the two former ones but take the pattern in
> +the wide string form.
> +.Pp
> +The
> .Fn regexec
> -matches that internal form against a string and reports results,
> -.Fn regerror
> -transforms error codes from either into human-readable messages,
> +function matches that internal form against a string and reports results.
> +The
> +.Fn regnexec
> +function works in the same way but takes another argument to specify
> +the length of the pattern,
> +allowing NUL bytes in the input string.
> +Besides,

I would probably s/Besides/Additionally/

> +for long inputs strings it is more efficient to call this function if
> +the length is already known beause it will not require the matcher to
> +calculate the length and read the input bytes one by one.
> +The
> +.Fn regwexec
> and
> +.Fn regwnexec
> +functions work like the two former ones but take the input as a
> +wide string.
> +.Pp
> +The
> +.Fn regerror
> +function transforms error codes from the above functions into
> +human-readable messages.
> +.Pp
> +The
> .Fn regfree
> -frees any dynamically-allocated storage used by the internal form
> -of an RE.
> +function frees any dynamically-allocated storage used by the internal form
> +of a regular expression.
> .Pp
> The header
> .In regex.h
> @@ -127,31 +193,26 @@ to improve readability.
> .It Dv REG_NOSPEC
> Compile with recognition of all special characters turned off.
> All characters are thus considered ordinary,
> -so the
> -.Dq RE
> -is a literal string.
> -This is an extension,
> -compatible with but not specified by
> -.St -p1003.2 ,
> -and should be used with
> -caution in software intended to be portable to other systems.
> -.Dv REG_EXTENDED
> -and
> +so the reqular expression is a literal string.
> +.It Dv REG_LITERAL
> +Synonim for

"Synonym"

> +.Dv REG_NOSPEC.
> +.It Dv REG_EXTENDED
> +may not be used together with
> .Dv REG_NOSPEC
> -may not be used
> +or
> +.Dv REG_LITERAL
> in the same call to
> .Fn regcomp .
> .It Dv REG_ICASE
> Compile for matching that ignores upper/lower case distinctions.
> -See
> -.Xr re_format 7 .
> .It Dv REG_NOSUB
> Compile for matching that need only report success or failure,
> not what was matched.
> .It Dv REG_NEWLINE
> Compile for newline-sensitive matching.
> By default, newline is a completely ordinary character with no special
> -meaning in either REs or strings.
> +meaning in either regular expressins or strings.
> With this flag,
> .Ql [^
> bracket expressions and
> @@ -170,66 +231,79 @@ The regular expression ends,
> not at the first NUL,
> but just before the character pointed to by the
> .Va re_endp
> +or
> +.Va re_wendp
> member of the structure pointed to by
> .Fa preg .
> +The former is used for the functions that take a single- or multi-byte
> +string,
> +while the second is used for those taking a wide string.
> The
> .Va re_endp
> member is of type
> -.Ft "const char *" .
> -This flag permits inclusion of NULs in the RE;
> +.Ft "const char *"
> +and the
> +.Va re_wendp
> +member is of type
> +.Ft "const wchar_t *" .
> +This flag permits inclusion of NULs in the regular expression;
> they are considered ordinary characters.
> -This is an extension,
> -compatible with but not specified by
> -.St -p1003.2 ,
> -and should be used with
> -caution in software intended to be portable to other systems.
> .El
> .Pp
> When successful,
> +the
> .Fn regcomp
> -returns 0 and fills in the structure pointed to by
> +family of functions returns
> +.Dv REG_OK
> +and fills in the structure pointed to by
> .Fa preg .
> -One member of that structure
> -(other than
> -.Va re_endp )
> -is publicized:
> +The
> .Va re_nsub ,
> -of type
> +member of the structure of type
> .Ft size_t ,
> -contains the number of parenthesized subexpressions within the RE
> -(except that the value of this member is undefined if the
> +contains the number of parenthesized subexpressions within the regular
> +expression (except when the
> .Dv REG_NOSUB
> -flag was used).
> +flag was used for the compilation of the pattern).
> If
> .Fn regcomp
> fails, it returns a non-zero error code;
> see
> -.Sx DIAGNOSTICS .
> +.Sx RETURN VALUES .
> .Pp
> The
> .Fn regexec
> -function
> -matches the compiled RE pointed to by
> +family of functions match the compiled regular expression pointed to by
> .Fa preg
> against the
> -.Fa string ,
> +.Fa string
> +(possibly having a length of
> +.Fa len
> +when using the variants that take the input length),
> subject to the flags in
> .Fa eflags ,
> -and reports results using
> +and reports match through its return value.

This is not quite grammatically correct. From just a cursory reading
of the surrounding text, I'm not sure if it should be "a match" or
"matches", though.

> +The
> .Fa nmatch ,
> .Fa pmatch ,
> -and the returned value.
> -The RE must have been compiled by a previous invocation of
> -.Fn regcomp .

I think the commas need to disappear and an 'and' between the
arguments be added?

> +arguments are also filled in to hold submatches unless the pattern was
> +compiled using the
> +.Dv REG_NOSUB
> +falg.

"flag"

> +The regular expression must have been compiled by a previous invocation of

There's an extra space here.

> +.Fn regcomp
> +or any of its alternative forms.
> The compiled form is not altered during execution of
> -.Fn regexec ,
> -so a single compiled RE can be used simultaneously by multiple threads.
> +.Fn regexec
> +or its alternatives,
> +so a single compiled regular expression can be used simultaneously by
> +multiple threads,
> +and it can be used with any variant of the
> +.Fn regexec
> +functions.
> +(I.e. a multi-byte pattern can be matched to wide string input and
> +vice versa.)
> .Pp
> -By default,
> -the NUL-terminated string pointed to by
> -.Fa string
> -is considered to be the text of an entire line, minus any terminating
> -newline.
> The
> .Fa eflags
> argument is the bitwise OR of zero or more of the following flags:
> @@ -278,22 +347,17 @@ does not imply
> .Dv REG_STARTEND
> affects only the location of the string,
> not how it is matched.
> -.El
> .Pp
> +The function indicates a match by returning
> +.Dv REG_OK ,
> +no match with
> +.Dv REG_NOMATCH ,
> +or returns an error code different from the above two values
> +if an error has occured during the execution.
> See
> -.Xr re_format 7
> -for a discussion of what is matched in situations where an RE or a
> -portion thereof could match any of several substrings of
> -.Fa string .
> -.Pp
> -Normally,
> -.Fn regexec
> -returns 0 for success and the non-zero code
> -.Dv REG_NOMATCH
> -for failure.
> -Other non-zero error codes may be returned in exceptional situations;
> -see
> -.Sx DIAGNOSTICS .
> +.Sx RETURN VALUES
> +for the detailed description of error codes.

s/the/a/ would be slightly more correct.

> +.El
> .Pp
> If
> .Dv REG_NOSUB

[...]

> -REs are anchors, not ordinary characters.
> -.Sh DIAGNOSTICS
> -Non-zero error codes from
> +thus all of them are thread-safe.
> +.Sh RETURN VALUES
> +Non-zero error codes from the
> .Fn regcomp
> and
> .Fn regexec
> +family of functions
> include the following:
> .Pp
> .Bl -tag -width REG_ECOLLATE -compact
> +.It Dv REG_OK
> +Operation successfully executed.
> +Synonim for 0,

"Synonym"

> +to provide better code readability.
> .It Dv REG_NOMATCH
> The
> .Fn regexec
> -function
> -failed to match
> +functions

I think the singular "function" may actually still be right, here,
since a single function is returning REG_NOMATCH at a time.

> +failed to match.
> .It Dv REG_BADPAT
> -invalid regular expression
> +Invalid regular expression.
> +This implementation only returns this code when the regular expression
> +passed to
> +.Fn regcomp
> +contains an illegal multibyte sequence.
> .It Dv REG_ECOLLATE
> -invalid collating element
> +Invalid collating element.
> +Returned whenever equivalence classes or multicharacter collating elements
> +are used in a bracket expression.
> +.Pq They are not supported yet.
> .It Dv REG_ECTYPE
> -invalid character class
> +Invalid character class name.
> .It Dv REG_EESCAPE
> -.Ql \e
> -applied to unescapable character
> +The last character was a backslash.
> .It Dv REG_ESUBREG
> -invalid backreference number
> +Invalid backreference number.
> .It Dv REG_EBRACK
> -brackets
> +Brackets
> .Ql "[ ]"
> -not balanced
> +not balanced.

I might do "are not balanced", to have a verb in the sentence. It
would need to happen in all the following, too, though.

Thanks for updating the man page!

-Ben Kaduk

> .It Dv REG_EPAREN
> -parentheses
> +Parentheses
> .Ql "( )"
> -not balanced
> +not balanced.
> .It Dv REG_EBRACE
> -braces
> +Braces
> .Ql "{ }"
> -not balanced
> +not balanced.
> .It Dv REG_BADBR
> -invalid repetition count(s) in
> -.Ql "{ }"
> +Invalid repetition count(s) in
> +.Ql "{ }" :
> +not a number, more than two numbers, first larger than second, or number too large.
> .It Dv REG_ERANGE
> -invalid character range in
> -.Ql "[ ]"
> +Invalid character range in
> +.Ql "[ ]" ,
> +i.e. ending point is earlier in the collating order than the starting point.
> .It Dv REG_ESPACE
> -ran out of memory
> +Out of memory.
> .It Dv REG_BADRPT
> -.Ql ?\& ,
> -.Ql *\& ,
> -or
> -.Ql +\&
> -operand invalid
> -.It Dv REG_EMPTY
> -empty (sub)expression
> -.It Dv REG_ASSERT
> -cannot happen - you found a bug
> -.It Dv REG_INVARG
> -invalid argument, e.g.\& negative-length string
> -.It Dv REG_ILLSEQ
> -illegal byte sequence (bad multibyte character)
> +Invalid use of repetition operators: two or more repetition operators have been
> +chained in an undefined way.
> .El
> .Sh SEE ALSO
> .Xr grep 1 ,