Date:      Wed, 25 Sep 2024 20:44:11 +0000
From:      bugzilla-noreply@freebsd.org
To:        standards@FreeBSD.org
Subject:   [Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Message-ID:  <bug-281710-99-YmrxvlrdWG@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-281710-99@https.bugs.freebsd.org/bugzilla/>
References:  <bug-281710-99@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710

--- Comment #8 from commit-hook@FreeBSD.org ---
A commit in branch stable/14 references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=4f4860c9b07cc10cb6acbe6fbd71db45e344d2e6

commit 4f4860c9b07cc10cb6acbe6fbd71db45e344d2e6
Author:     Bill Sommerfeld <sommerfeld@hamachi.org>
AuthorDate: 2023-12-21 03:46:14 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-09-25 20:42:25 +0000

    regex: mixed sets are misidentified as singletons

    Fix "singleton" function used by regcomp() to turn character set matches
    into exact character matches if a character set has exactly one
    element.

    The underlying cset representation is complex; most critically, it
    records "small" characters (code points less than either 128 or 256,
    depending on the locale) in a bit vector, and "wide" characters in
    a secondary array.

    Unfortunately, the "singleton" function used to identify singleton sets
    treated a cset as a singleton if either the "small" or the "wide" set
    had exactly one element (it would then ignore the other set).
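The difference between checking each half on its own and checking the set as a whole can be sketched with a toy model. This is a minimal sketch, not the libc code: the struct toy_cset type, the toy_* functions, the TOY_NOT_SINGLETON marker, and the fixed 128-entry bit vector (standing in for the UTF-8 case mentioned above) are all hypothetical. The real cset also tracks ranges, character classes, and case folding, which are omitted here, and toy_singleton_fixed only illustrates the corrected rule (one member in total across both parts); it is not claimed to be the committed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL toy model, not libc's cset: code points below TOY_NC go in a
 * bit vector, larger ones in a small array. 128 mirrors a UTF-8 locale. */
#define TOY_NC 128u
#define TOY_MAXWIDE 16u
#define TOY_NOT_SINGLETON UINT32_MAX /* hypothetical "not a singleton" marker */

struct toy_cset {
    uint8_t bmp[TOY_NC / 8];     /* one bit per "small" code point */
    uint32_t wides[TOY_MAXWIDE]; /* "wide" code points; callers avoid duplicates */
    size_t nwides;
};

/* Add code point c to whichever part of the set it belongs in. */
static void toy_add(struct toy_cset *cs, uint32_t c)
{
    if (c < TOY_NC) {
        cs->bmp[c / 8] |= (uint8_t)(1u << (c % 8));
    } else {
        assert(cs->nwides < TOY_MAXWIDE);
        cs->wides[cs->nwides++] = c;
    }
}

/* Count the small members; *last receives the highest one seen. */
static size_t toy_count_small(const struct toy_cset *cs, uint32_t *last)
{
    size_t n = 0;
    for (uint32_t i = 0; i < TOY_NC; i++) {
        if (cs->bmp[i / 8] & (1u << (i % 8))) {
            n++;
            *last = i;
        }
    }
    return n;
}

/* Buggy shape: each part is checked on its own, ignoring the other. */
static uint32_t toy_singleton_buggy(const struct toy_cset *cs)
{
    uint32_t s = 0;
    if (toy_count_small(cs, &s) == 1)
        return s;            /* wide members silently ignored */
    if (cs->nwides == 1)
        return cs->wides[0]; /* small members silently ignored */
    return TOY_NOT_SINGLETON;
}

/* Corrected shape: a singleton only if both parts together hold one member. */
static uint32_t toy_singleton_fixed(const struct toy_cset *cs)
{
    uint32_t s = 0;
    size_t n = toy_count_small(cs, &s);
    if (n == 1 && cs->nwides == 0)
        return s;
    if (n == 0 && cs->nwides == 1)
        return cs->wides[0];
    return TOY_NOT_SINGLETON;
}
```

For the set [abà] in a UTF-8 locale, 'a' and 'b' land in the bit vector and 'à' (U+00E0, above 127) in the wide array, so the buggy check returns U+00E0 while the corrected check reports that the set is not a singleton. The mirror case, such as a set holding 'a' plus two wide characters, makes the buggy check return 'a' instead.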

    The easiest way to demonstrate this bug:

            $ export LANG=C.UTF-8
            $ echo 'a' | grep '[abà]'

    It should match (and print "a"), but instead it does not match, because
    the set is misinterpreted as a singleton holding only its one accented
    ("wide") character.
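The same check can be made without grep by calling the POSIX regcomp()/regexec() interface directly, which presumably exercises the same libc regcomp() code path on FreeBSD. This is a sketch under stated assumptions: the helper names re_matches and use_c_utf8 are hypothetical, and the pattern "[ab\xc3\xa0]" is simply [abà] spelled as UTF-8 bytes. On a correct implementation the subject "a" matches whether or not the C.UTF-8 locale is installed (in the plain "C" locale the accented character is just two extra bytes in the set); per the commit message, an affected regcomp() in a UTF-8 locale would report no match instead.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if subject matches the POSIX extended regex pattern, 0 if it
 * does not, and -1 if compilation or matching fails. */
static int re_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}

/* Rough equivalent of `export LANG=C.UTF-8`: returns 1 if the locale was
 * selected, 0 if it is unavailable (the current locale is then unchanged). */
static int use_c_utf8(void)
{
    return setlocale(LC_ALL, "C.UTF-8") != NULL;
}
```

Calling use_c_utf8() and then re_matches("[ab\xc3\xa0]", "a") should yield 1 on a fixed or unaffected libc.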

    PR:             281710
    Reviewed by:    kevans, yuripv
    Obtained from:  illumos

    (cherry picked from commit 8f7ed58a15556bf567ff876e1999e4fe4d684e1d)

 lib/libc/regex/regcomp.c          | 25 ++++++++++++++++++-----
 lib/libc/tests/regex/multibyte.sh | 43 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 6 deletions(-)
