Date: Wed, 25 Sep 2024 20:44:11 +0000
From: bugzilla-noreply@freebsd.org
To: standards@FreeBSD.org
Subject: [Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Message-ID: <bug-281710-99-YmrxvlrdWG@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-281710-99@https.bugs.freebsd.org/bugzilla/>
References: <bug-281710-99@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710

--- Comment #8 from commit-hook@FreeBSD.org ---
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4f4860c9b07cc10cb6acbe6fbd71db45e344d2e6

commit 4f4860c9b07cc10cb6acbe6fbd71db45e344d2e6
Author:     Bill Sommerfeld <sommerfeld@hamachi.org>
AuthorDate: 2023-12-21 03:46:14 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-09-25 20:42:25 +0000

    regex: mixed sets are misidentified as singletons

    Fix the "singleton" function used by regcomp() to turn character set
    matches into exact character matches if a character set has exactly
    one element.

    The underlying cset representation is complex; most critically it
    records "small" characters (codepoint less than either 128 or 256
    depending on locale) in a bit vector, and "wide" characters in a
    secondary array.

    Unfortunately, the "singleton" function used to identify singleton
    sets treated a cset as a singleton if either the "small" or the
    "wide" set had exactly one element (it would then ignore the other
    set).

    The easiest way to demonstrate this bug:

            $ export LANG=C.UTF-8
            $ echo 'a' | grep '[abà]'

    It should match (and print "a") but instead it doesn't match because
    the single accented character in the set is misinterpreted as a
    singleton.

    PR:             281710
    Reviewed by:    kevans, yuripv
    Obtained from:  illumos

    (cherry picked from commit 8f7ed58a15556bf567ff876e1999e4fe4d684e1d)

 lib/libc/regex/regcomp.c          | 25 ++++++++++++++++++-----
 lib/libc/tests/regex/multibyte.sh | 43 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 6 deletions(-)