Date:      Fri, 07 Apr 2017 04:05:01 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 166861] bsdgrep(1)/sed(1): bsdgrep -E and sed handle invalid {} constructs strangely
Message-ID:  <bug-166861-8-pLCxY0ammm@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-166861-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-166861-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166861

Kyle Evans <bsdports@kyle-evans.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bsdports@kyle-evans.net

--- Comment #2 from Kyle Evans <bsdports@kyle-evans.net> ---
(In reply to dubiousjim from comment #1)

Summary of work needed at the bottom, feel free to skip ahead and only look
back for intermediate results/notes.

Some relevant notes:

As of GNU grep 2.27, GNU sed 4.3 on Debian, and BSD grep @ r316566-ish:

(1) and (2): behavior between the two seems to match
(3)
FreeBSD:
$ echo "a{1,2,3}b" | sed -r "s/{/_/"
a_1,2,3}b
$ echo "a{1,2,3}b" | sed -r "s/}/_/"
a{1,2,3_b

Debian:
$ echo "a{1,2,3}b" | sed -r "s/{/_/"
# Error, invalid preceding expression
# Whoops
$ echo "a{1,2,3}b" | sed -r "s/a{/_/"
# Error, unmatched \{
$ echo "a{1,2,3}b" | sed -r "s/}/_/"
a{1,2,3_b

We do have a test case for this at lib/libc/regex/grot/tests:205 where { is
explicitly meant to be a literal match in both BREs and EREs. We have no case
expressing } being a literal match.

FreeBSD:
$ echo "a{1,2,3}b" | sed "s/\}/_/"
# Error, parentheses not balanced

Debian:
$ echo "a{1,2,3}b" | sed "s/\}/_/"
a{1,2,3_b
# Ah, also prefer GNU behavior

This one, it's worth noting, has no test either. It does have the obvious test
for the other side, \{ alone, but no \}.

(4)
FreeBSD:
$ echo "a{1,2,3}b" | sed -r "s/{}/_/"
a{1,2,3}b

Debian:
$ echo "a{1,2,3}b" | sed -r "s/{}/_/"
# Error, invalid preceding expression
# Whoops
$ echo "a{1,2,3}b" | sed -r "s/a{}/_/"
# Error, invalid content
# Reasonable

This one is .... technically correct behavior. Technically, according to
re_format(7), the following "}" is *not* a digit, and therefore this is not a
bounds statement. I think this is really not correct, though. Letting {} take a
literal interpretation leaves us too much room for error getting in if a digit
was expected by the pattern-creator, and I would prefer the GNU approach on
this matter.

We'll probably want to update re_format(7) to be more explicit in this matter,
as well as add a corresponding test case.
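
As a rough illustration of the re_format(7) rule discussed above (a "{" only
begins a bound when the next character is a digit), here is a minimal sketch.
brace_kind is a hypothetical helper made up for this note, not code from libc;
it only looks at the first character after the "{" and does not validate the
rest of the bound.

```shell
#!/bin/sh
# Hypothetical helper, not taken from libc: classify the text that follows
# a "{" using the re_format(7) rule that a bound must start with a digit.
brace_kind() {
    case "$1" in
        [0-9]*) echo bound ;;    # e.g. "1,2}" starts a bound
        *)      echo literal ;;  # e.g. "}" or "x}" leaves "{" ordinary
    esac
}

brace_kind '1,2}'   # prints "bound"
brace_kind '}'      # prints "literal"
```

Under this rule the {} in case (4) is classified as literal, which is the
behavior questioned above.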

(5)
FreeBSD:
$ echo "a{1,2,3}b" | sed -r "s/)/_/"
a{1,2,3}b
$ echo "a{1,2,3}b" | sed "s/\)/_/"
# Error, parentheses not balanced

This is clearly covered in tests:54 (silenced, though) and with slight anger
expressed in the context around it. I lean towards taking the GNU/sane approach
on this one and making this work as one probably expects nowadays.


===== Summary of work needed

(3)
Problem: { in ERE uses literal interpretation
Needed: { throw error
Needed: Fix test case at tests:205 to separate out BRE and ERE cases and adjust
ERE case to meet expectations

Problem: \} in BRE throws an error
Needed: \} match literal


(4)
Problem: {} in ERE uses literal interpretation
Needed: {} throw error
Needed: Consider re_format(7) update to explicitly note {} as illegal
Needed: Test case


(5)
Problem: ) in ERE uses literal interpretation
Needed: ) throw error
Needed: Adjust test cases (tests:54)
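
To re-check the cases above on a given system, something like the following
sketch may help. probe is a hypothetical helper written for this note; it
feeds the same input used throughout this comment to the local sed and reports
whether the script was accepted, judging only by sed's exit status. It uses
sed -E, which both GNU sed and FreeBSD sed treat the same as the -r used above.

```shell
#!/bin/sh
# Hypothetical helper: run one sed script against the sample input and
# report whether the local sed accepted it (exit status 0) or rejected it.
probe() {
    mode=$1      # "bre" or "ere"
    script=$2    # sed script to try, e.g. 's/{/_/'
    flag=
    [ "$mode" = ere ] && flag=-E
    # $flag is intentionally unquoted so an empty value adds no argument.
    if out=$(printf '%s\n' 'a{1,2,3}b' | sed $flag "$script" 2>/dev/null); then
        printf '%s %s: accepted, output %s\n' "$mode" "$script" "$out"
    else
        printf '%s %s: rejected\n' "$mode" "$script"
    fi
}

# Cases (3), (4) and (5); results are expected to differ between systems.
probe ere 's/{/_/'
probe bre 's/\}/_/'
probe ere 's/{}/_/'
probe ere 's/)/_/'
probe bre 's/\)/_/'
```

Comparing the output on FreeBSD and Debian should reproduce the differences
listed above, assuming the sed versions match the ones noted at the top.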


I think that sums it up -- I'll take a look at these things in the next week or
so.

-- 
You are receiving this mail because:
You are the assignee for the bug.


