Date:      Sun, 7 Jun 2020 09:40:57 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Kyle Evans <kevans@freebsd.org>
Cc:        "Rodney W. Grimes" <rgrimes@freebsd.org>, src-committers <src-committers@freebsd.org>,  svn-src-all <svn-src-all@freebsd.org>, svn-src-head <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r361884 - in head/usr.bin/sed: . tests
Message-ID:  <CANCZdfrO%2B1ibwPCijr3Zq0YO80qEurMfC9p1fPRA0xGASvUucg@mail.gmail.com>
In-Reply-To: <CACNAnaFXpq-o_sOppAupFMN3aZo04LTGhRwXdJnnztB4i7XW3w@mail.gmail.com>
References:  <202006070432.0574Wc1L063319@repo.freebsd.org> <202006071331.057DV4Vo040383@gndrsh.dnsmgr.net> <CACNAnaFXpq-o_sOppAupFMN3aZo04LTGhRwXdJnnztB4i7XW3w@mail.gmail.com>

On Sun, Jun 7, 2020, 8:04 AM Kyle Evans <kevans@freebsd.org> wrote:

> On Sun, Jun 7, 2020 at 8:31 AM Rodney W. Grimes
> <freebsd@gndrsh.dnsmgr.net> wrote:
> >
> > > Author: kevans
> > > Date: Sun Jun  7 04:32:38 2020
> > > New Revision: 361884
> > > URL: https://svnweb.freebsd.org/changeset/base/361884
> > >
> > > Log:
> > >   sed: attempt to learn about hex escapes (e.g. \x27)
> > >
> > >   Somewhat predictably, software often wants to use \x27/\x24 among others so
> > >   that they can decline worrying about ugly escaping, if said escaping is even
> > >   possible. Right now, this software is using these and getting the wrong
> > >   results, as we'll interpret those as x27 and x24 respectively. Some examples
> > >   of this, when an exp-run was run, were science/octopus and misc/vifm.
> > >
> > >   Go ahead and process these at all times.  We allow either one or two digits,
> > >   and the tests account for both.  If extra digits are specified, e.g. \x2727,
> > >   then the third and fourth digits are interpreted literally as one might
> > >   expect.
> >
> > Does it work to do \\x27, i.e. I want it to NOT do \x27 so I can sed
> > on files that contain sequences of escapes.
>
> I'm so glad you asked this. :-) For your immediate answer: yes, the
> semantics there work as you expect.
>
> For the long answer, that's actually what you should have been doing
> all along; raising awareness of that fact is what PR 229925 aims to
> do, by switching our interpretation of the UB for escaping ordinary
> characters to make them an error if it's not specially interpreted.
>
> Prior to this change, if you had:
>
> printf "\\\\x27\n" | sed -e 's/\x27//'
>
> What you end up with is actually *not* an empty string with a newline,
> but just a single backslash! \x27 in the search pattern gets
> passed through to the underlying regex(3) implementation, which then
> happily interprets \x => x and replaces the literal 'x27', leaving \
> -- which is perhaps not what you might have expected if \x27 didn't
> have special meaning and it almost certainly isn't what you wanted.
> With the new sed, you can change 'x27' to 'b27' in both strings above
> to see what I mean.
>
> In the New World Order, all regex(3) users will be forced to be
> precise here so that we don't get it wrong. This is especially
> important when I add GNU extensions to libregex, because some of those
> escaped-ordinaries will now be granted special meaning, so \s will no
> longer match a literal s but instead [[:space:]]; using the
> unadulterated libc regex(3) interface instead will give you an error
> and allow you to detect whether you're accidentally using libc
> regex(3) rather than the GNU-extended libregex.
>
> This is going to be a large and potentially world-breaking change for
> many, but I think we'll all be better for it in the end. The symbol
> version of regcomp will get bumped, so that older binaries will
> continue to operate with the old escaping behavior in case that was
> actually pertinent to their functionality.
>
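The \\x27 case raised above can be checked with a short script. This is only a sketch: the helper name strip_literal_x27 is hypothetical and not part of sed or the commit. The assumption is that \\ in the pattern is an escaped backslash, so the x27 after it is ordinary text and never a hex escape, whichever way a given sed treats \x27.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): delete the first literal
# backslash-x27 sequence on each input line.
# In the sed pattern, \\ matches one literal backslash, so the x27 that
# follows is ordinary text rather than the start of a hex escape.
strip_literal_x27() {
    sed -e 's/\\x27//'
}

# printf '%s\n' prints its argument verbatim, so the input line is \x27.
printf '%s\n' '\x27' | strip_literal_x27
```

The script should print an empty line: the whole four-character sequence is removed, rather than only 'x27' as in the unescaped example quoted above.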

Thanks for taking this on. We are actually stuck between two POLAs here:
existing behavior and what users of other systems expect on FreeBSD. Given
how edge-case the breakage will be, I'm glad you've decided to try full
new semantics. I've downloaded *LOTS* of code where I had to hack sed to be
gsed for exactly this reason. I think it is one area where we've failed to
keep up. It's an area where the anti-Linux bias of the project's early days
is hurting us now. Thanks for seeing how feasible this is and retiring this
technical debt.

Warner



> Thanks,
>
> Kyle Evans
>


