Date:      Tue, 10 Dec 2019 13:23:58 -0600
From:      Kyle Evans <kevans@freebsd.org>
Cc:        src-committers <src-committers@freebsd.org>, svn-src-all <svn-src-all@freebsd.org>,  svn-src-head <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r355590 - in head/usr.bin/sed: . tests tests/regress.multitest.out
Message-ID:  <CACNAnaFjYEYecc-OUyMy_or6c2itsJa6gzr_YWo9Lxy=%2BvjNow@mail.gmail.com>
In-Reply-To: <201912101916.xBAJG0Lf080839@repo.freebsd.org>
References:  <201912101916.xBAJG0Lf080839@repo.freebsd.org>

On Tue, Dec 10, 2019 at 1:16 PM Kyle Evans <kevans@freebsd.org> wrote:
>
> Author: kevans
> Date: Tue Dec 10 19:16:00 2019
> New Revision: 355590
> URL: https://svnweb.freebsd.org/changeset/base/355590
>
> Log:
>   sed: process \r, \n, and \t
>
>   This is both reasonable and a common GNUism that a lot of ported software
>   expects.
>
>   Universally process \r, \n, and \t into carriage return, newline, and tab
>   respectively. Newline still doesn't function in contexts where it can't
>   (e.g. BRE), but we process it anyways rather than passing
>   UB \n (escaped ordinary) through to the underlying regex engine.
>

This part of the log message is wrong -- without this processing, sed
would pass just an ordinary 'n' rather than an escaped ordinary, which
can lead to false positives: you think you're matching on an embedded
newline, but you actually match on 'n'. Further, my reading of POSIX's
statement on this is that we have to treat it as a newline rather than
embedding it as 'n' or escaped 'n', neither of which regex(3) will
interpret as a newline.
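
To make the distinction concrete, here is a rough sketch in Python of the
kind of translation being described. It is not the code from r355590 (sed
is written in C); the names ESCAPES and translate_escapes are made up for
illustration, and the only assumption is that \r, \n, and \t are replaced
before the pattern reaches the regex engine, while every other escape is
passed through untouched.

```python
# Hypothetical illustration only; not sed's actual implementation.
# Maps the character after a backslash to the byte it should become.
ESCAPES = {"r": "\r", "n": "\n", "t": "\t"}


def translate_escapes(pattern: str) -> str:
    """Replace \\r, \\n and \\t with CR, LF and TAB; keep other escapes."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in ESCAPES:
                # One of the three sequences: emit the real character.
                out.append(ESCAPES[nxt])
            else:
                # Any other escape, including an escaped backslash, is
                # left intact for the regex engine to interpret.
                out.append(ch + nxt)
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash.
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # An ordinary 'n' matches inside "line"; a real newline does not.
    print("n" in "line", translate_escapes(r"\n") in "line")
```

The last line shows the false positive in miniature: a pattern that decays
to an ordinary 'n' matches text with no newline in it, whereas the
translated pattern does not.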

>   Adding a --posix flag to disable these was considered, but sed.1 already
>   declares this version of sed a super-set of POSIX specification and this
>   behavior is the most likely expected when one attempts to use one of these
>   escape sequences in pattern space.
>
>   This differs from pre-r197362 behavior in that we now honor the three
>   arguably most common escape sequences used with sed(1) and we do so outside
>   of character classes, too.
>
>   Other escape sequences, like \s and \S, will come later when GNU extensions
>   are added to libregex; sed will likely link against libregex by default,
>   since the GNU extensions tend to be fairly un-intrusive.
>
>   PR:           229925
>   Reviewed by:  bapt, emaste, pfg
>   Differential Revision:        https://reviews.freebsd.org/D22750
>


