From owner-svn-src-head@freebsd.org Wed May 25 15:35:24 2016 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7796CB4AAD9; Wed, 25 May 2016 15:35:24 +0000 (UTC) (envelope-from pfg@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4F87E12F5; Wed, 25 May 2016 15:35:24 +0000 (UTC) (envelope-from pfg@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id u4PFZNxI073948; Wed, 25 May 2016 15:35:23 GMT (envelope-from pfg@FreeBSD.org) Received: (from pfg@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id u4PFZNMv073946; Wed, 25 May 2016 15:35:23 GMT (envelope-from pfg@FreeBSD.org) Message-Id: <201605251535.u4PFZNMv073946@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: pfg set sender to pfg@FreeBSD.org using -f From: "Pedro F. Giffuni" Date: Wed, 25 May 2016 15:35:23 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r300683 - head/lib/libc/regex X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 May 2016 15:35:24 -0000 Author: pfg Date: Wed May 25 15:35:23 2016 New Revision: 300683 URL: https://svnweb.freebsd.org/changeset/base/300683 Log: libc: regexec(3) adjustment. Change the behavior of when REG_STARTEND is combined with REG_NOTBOL. From the original posting[1]: "Enable the assumption that pmatch[0].rm_so is a continuation offset to a string and allows us to do a proper assessment of the character in regards to it's word position ('^' or '\<'), without risking going into unallocated memory." This change makes us similar to how glibc handles REG_STARTEND | REG_NOTBOL, and is closely related to a soon-to-land fix to sed. Special thanks to Martijn van Duren and Ingo Schwarze for working out some consistent behaviour. Differential Revision: https://reviews.freebsd.org/D6257 Taken from: openbsd-tech 2016-05-24 [1] (Martijn van Duren) Relnotes: yes MFC after: 1 month Modified: head/lib/libc/regex/engine.c head/lib/libc/regex/regex.3 Modified: head/lib/libc/regex/engine.c ============================================================================== --- head/lib/libc/regex/engine.c Wed May 25 15:10:07 2016 (r300682) +++ head/lib/libc/regex/engine.c Wed May 25 15:35:23 2016 (r300683) @@ -786,7 +786,7 @@ fast( struct match *m, ASSIGN(fresh, st); SP("start", st, *p); coldp = NULL; - if (start == m->beginp) + if (start == m->offp || (start == m->beginp && !(m->eflags®_NOTBOL))) c = OUT; else { /* @@ -891,7 +891,7 @@ slow( struct match *m, SP("sstart", st, *p); st = step(m->g, startst, stopst, st, NOTHING, st); matchp = NULL; - if (start == m->beginp) + if (start == m->offp || (start == m->beginp && !(m->eflags®_NOTBOL))) c = OUT; else { /* Modified: head/lib/libc/regex/regex.3 ============================================================================== --- head/lib/libc/regex/regex.3 Wed May 25 15:10:07 2016 (r300682) +++ head/lib/libc/regex/regex.3 Wed May 25 15:35:23 2016 (r300683) @@ -32,7 +32,7 @@ .\" @(#)regex.3 8.4 (Berkeley) 3/20/94 .\" $FreeBSD$ .\" -.Dd August 17, 2005 +.Dd May 25, 2016 .Dt REGEX 3 .Os .Sh NAME @@ -235,11 +235,16 @@ The argument is the bitwise OR of zero or more of the following flags: .Bl -tag -width REG_STARTEND .It Dv REG_NOTBOL -The first character of -the string -is not the beginning of a line, so the -.Ql ^\& -anchor should not match before it. +The first character of the string is treated as the continuation +of a line. +This means that the anchors +.Ql ^\& , +.Ql [[:<:]] , +and +.Ql \e< +do not match before it; but see +.Dv REG_STARTEND +below. This does not affect the behavior of newlines under .Dv REG_NEWLINE . .It Dv REG_NOTEOL @@ -247,19 +252,16 @@ The NUL terminating the string does not end a line, so the .Ql $\& -anchor should not match before it. +anchor does not match before it. This does not affect the behavior of newlines under .Dv REG_NEWLINE . .It Dv REG_STARTEND The string is considered to start at -.Fa string -+ -.Fa pmatch Ns [0]. Ns Va rm_so -and to have a terminating NUL located at -.Fa string -+ -.Fa pmatch Ns [0]. Ns Va rm_eo -(there need not actually be a NUL at that location), +.Fa string No + +.Fa pmatch Ns [0]. Ns Fa rm_so +and to end before the byte located at +.Fa string No + +.Fa pmatch Ns [0]. Ns Fa rm_eo , regardless of the value of .Fa nmatch . See below for the definition of @@ -271,13 +273,37 @@ compatible with but not specified by .St -p1003.2 , and should be used with caution in software intended to be portable to other systems. -Note that a non-zero -.Va rm_so -does not imply -.Dv REG_NOTBOL ; -.Dv REG_STARTEND -affects only the location of the string, -not how it is matched. +.Pp +Without +.Dv REG_NOTBOL , +the position +.Fa rm_so +is considered the beginning of a line, such that +.Ql ^ +matches before it, and the beginning of a word if there is a word +character at this position, such that +.Ql [[:<:]] +and +.Ql \e< +match before it. +.Pp +With +.Dv REG_NOTBOL , +the character at position +.Fa rm_so +is treated as the continuation of a line, and if +.Fa rm_so +is greater than 0, the preceding character is taken into consideration. +If the preceding character is a newline and the regular expression was compiled +with +.Dv REG_NEWLINE , +.Ql ^ +matches before the string; if the preceding character is not a word character +but the string starts with a word character, +.Ql [[:<:]] +and +.Ql \e< +match before the string. .El .Pp See