Date:      Wed, 25 May 2016 15:35:23 +0000 (UTC)
From:      "Pedro F. Giffuni" <pfg@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r300683 - head/lib/libc/regex
Message-ID:  <201605251535.u4PFZNMv073946@repo.freebsd.org>

Author: pfg
Date: Wed May 25 15:35:23 2016
New Revision: 300683
URL: https://svnweb.freebsd.org/changeset/base/300683

Log:
  libc: regexec(3) adjustment.
  
  Change the behavior when REG_STARTEND is combined with REG_NOTBOL.
  
  From the original posting[1]:
  
  "Enable the assumption that pmatch[0].rm_so is a continuation offset
  to a string and allows us to do a proper assessment of the character
  in regards to its word position ('^' or '\<'), without risking going
  into unallocated memory."
  
  This change makes us similar to how glibc handles REG_STARTEND |
  REG_NOTBOL, and is closely related to a soon-to-land fix to sed.
  
  Special thanks to Martijn van Duren and Ingo Schwarze for working
  out some consistent behaviour.
  
  Differential Revision:	https://reviews.freebsd.org/D6257
  Taken from:	openbsd-tech 2016-05-24 [1]  (Martijn van Duren)
  Relnotes:	yes
  MFC after:	1 month

Modified:
  head/lib/libc/regex/engine.c
  head/lib/libc/regex/regex.3

Modified: head/lib/libc/regex/engine.c
==============================================================================
--- head/lib/libc/regex/engine.c	Wed May 25 15:10:07 2016	(r300682)
+++ head/lib/libc/regex/engine.c	Wed May 25 15:35:23 2016	(r300683)
@@ -786,7 +786,7 @@ fast(	struct match *m,
 	ASSIGN(fresh, st);
 	SP("start", st, *p);
 	coldp = NULL;
-	if (start == m->beginp)
+	if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
 		c = OUT;
 	else {
 		/*
@@ -891,7 +891,7 @@ slow(	struct match *m,
 	SP("sstart", st, *p);
 	st = step(m->g, startst, stopst, st, NOTHING, st);
 	matchp = NULL;
-	if (start == m->beginp)
+	if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
 		c = OUT;
 	else {
 		/*

Modified: head/lib/libc/regex/regex.3
==============================================================================
--- head/lib/libc/regex/regex.3	Wed May 25 15:10:07 2016	(r300682)
+++ head/lib/libc/regex/regex.3	Wed May 25 15:35:23 2016	(r300683)
@@ -32,7 +32,7 @@
 .\"	@(#)regex.3	8.4 (Berkeley) 3/20/94
 .\" $FreeBSD$
 .\"
-.Dd August 17, 2005
+.Dd May 25, 2016
 .Dt REGEX 3
 .Os
 .Sh NAME
@@ -235,11 +235,16 @@ The
 argument is the bitwise OR of zero or more of the following flags:
 .Bl -tag -width REG_STARTEND
 .It Dv REG_NOTBOL
-The first character of
-the string
-is not the beginning of a line, so the
-.Ql ^\&
-anchor should not match before it.
+The first character of the string is treated as the continuation
+of a line.
+This means that the anchors
+.Ql ^\& ,
+.Ql [[:<:]] ,
+and
+.Ql \e<
+do not match before it; but see
+.Dv REG_STARTEND
+below.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_NOTEOL
@@ -247,19 +252,16 @@ The NUL terminating
 the string
 does not end a line, so the
 .Ql $\&
-anchor should not match before it.
+anchor does not match before it.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_STARTEND
 The string is considered to start at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_so
-and to have a terminating NUL located at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_eo
-(there need not actually be a NUL at that location),
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_so
+and to end before the byte located at
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_eo ,
 regardless of the value of
 .Fa nmatch .
 See below for the definition of
@@ -271,13 +273,37 @@ compatible with but not specified by
 .St -p1003.2 ,
 and should be used with
 caution in software intended to be portable to other systems.
-Note that a non-zero
-.Va rm_so
-does not imply
-.Dv REG_NOTBOL ;
-.Dv REG_STARTEND
-affects only the location of the string,
-not how it is matched.
+.Pp
+Without
+.Dv REG_NOTBOL ,
+the position
+.Fa rm_so
+is considered the beginning of a line, such that
+.Ql ^
+matches before it, and the beginning of a word if there is a word
+character at this position, such that
+.Ql [[:<:]]
+and
+.Ql \e<
+match before it.
+.Pp
+With
+.Dv REG_NOTBOL ,
+the character at position
+.Fa rm_so
+is treated as the continuation of a line, and if
+.Fa rm_so
+is greater than 0, the preceding character is taken into consideration.
+If the preceding character is a newline and the regular expression was compiled
+with
+.Dv REG_NEWLINE ,
+.Ql ^
+matches before the string; if the preceding character is not a word character
+but the string starts with a word character,
+.Ql [[:<:]]
+and
+.Ql \e<
+match before the string.
 .El
 .Pp
 See
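
The REG_STARTEND | REG_NOTBOL semantics described in the regex.3 change
above can be exercised with a small wrapper around regexec(3). This is
only a sketch: match_range() is a hypothetical helper name, not part of
libc, and REG_STARTEND itself is an extension that is not specified by
POSIX, so it may be missing on other systems.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): match `pattern` against the
 * bytes string[so..eo) using REG_STARTEND, OR-ing in any extra eflags
 * such as REG_NOTBOL.
 *
 * Returns the regexec() result (0 on a match, REG_NOMATCH otherwise),
 * or -1 if the pattern does not compile.  On a match, m->rm_so and
 * m->rm_eo are offsets from `string`, not from string + so.
 */
static int
match_range(const char *pattern, int cflags, const char *string,
    size_t so, size_t eo, int eflags, regmatch_t *m)
{
	regex_t re;
	int rc;

	assert(so <= eo);
	if (regcomp(&re, pattern, cflags) != 0)
		return (-1);

	/* With REG_STARTEND, pmatch[0] is an input as well as an output. */
	m->rm_so = (regoff_t)so;
	m->rm_eo = (regoff_t)eo;
	rc = regexec(&re, string, 1, m, eflags | REG_STARTEND);

	regfree(&re);
	return (rc);
}
```

Going by the new manual text, calling this helper with REG_NOTBOL on
"foo bar" at offsets 4..7 should reject "^bar" (the preceding character
is a space, not a newline) but accept "\\<bar" (a non-word character
precedes a word character). On "foo\nbar" with a pattern compiled with
REG_NEWLINE, "^bar" should match at the same offsets.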


