From: bugzilla-noreply@freebsd.org
To: standards@FreeBSD.org
Subject: [Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Date: Wed, 25 Sep 2024 20:44:11 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: standards
X-Bugzilla-Version: 14.1-RELEASE
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: commit-hook@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: kevans@freebsd.org
List-Id: Standards compliance
List-Archive: https://lists.freebsd.org/archives/freebsd-standards

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710

--- Comment #8 from commit-hook@FreeBSD.org ---
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4f4860c9b07cc10cb6acbe6fbd71db45e344d2e6

commit 4f4860c9b07cc10cb6acbe6fbd71db45e344d2e6
Author:     Bill Sommerfeld
AuthorDate: 2023-12-21 03:46:14 +0000
Commit:     Kyle Evans
CommitDate: 2024-09-25 20:42:25 +0000

    regex: mixed sets are misidentified as singletons

    Fix "singleton" function used by regcomp() to turn character set matches
    into exact character matches if a character set has exactly one
    element.

    The underlying cset representation is complex; most critically it
    records "small" characters (codepoint less than either 128 or 256
    depending on locale) in a bit vector, and "wide" characters in a
    secondary array.

    Unfortunately the "singleton" function used to identify singleton sets
    treated a cset as a singleton if either the "small" or the "wide" sets
    had exactly one element (it would then ignore the other set).

    The easiest way to demonstrate this bug:

    $ export LANG=C.UTF-8
    $ echo 'a' | grep '[abà]'

    It should match (and print "a") but instead it doesn't match because the
    single accented character in the set is misinterpreted as a singleton.

    PR:             281710
    Reviewed by:    kevans, yuripv
    Obtained from:  illumos

    (cherry picked from commit 8f7ed58a15556bf567ff876e1999e4fe4d684e1d)

 lib/libc/regex/regcomp.c          | 25 ++++++++++++++++++-----
 lib/libc/tests/regex/multibyte.sh | 43 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 6 deletions(-)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
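
The failure mode described in the commit message can be modeled with a small sketch. This is not the code from lib/libc/regex/regcomp.c: struct toy_cset, TOY_NSMALL, TOY_NONE and the toy_* functions are hypothetical names invented for illustration, and the "small" limit is assumed to be 128. It only contrasts a check that accepts a set when either half has one member with a check that requires the two halves together to have exactly one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NSMALL 128   /* assumption: "small" means codepoint < 128 */
#define TOY_NONE   (-1L) /* hypothetical "not a singleton" marker */

/* Hypothetical, simplified stand-in for the cset used by regcomp(). */
struct toy_cset {
	bool small[TOY_NSMALL];  /* bit-vector part, one flag per codepoint */
	const uint32_t *wides;   /* secondary array of wide codepoints */
	size_t nwides;           /* number of entries in wides */
};

/* Count the small members and remember the last one seen. */
static size_t
toy_count_small(const struct toy_cset *cs, long *last)
{
	size_t n = 0;

	for (long c = 0; c < TOY_NSMALL; c++) {
		if (cs->small[c]) {
			n++;
			*last = c;
		}
	}
	return n;
}

/* Flawed check: either half holding one member is treated as enough. */
static long
toy_singleton_buggy(const struct toy_cset *cs)
{
	long s = TOY_NONE;

	if (toy_count_small(cs, &s) == 1)
		return s;
	if (cs->nwides == 1)
		return (long)cs->wides[0];
	return TOY_NONE;
}

/* Corrected check: both halves together must hold exactly one member. */
static long
toy_singleton_fixed(const struct toy_cset *cs)
{
	long s = TOY_NONE;
	size_t nsmall = toy_count_small(cs, &s);

	if (nsmall + cs->nwides != 1)
		return TOY_NONE;
	return nsmall == 1 ? s : (long)cs->wides[0];
}
```

For the set from the grep example, {a, b, à}, the small half holds two members and the wide half holds one. The flawed check reports à (U+00E0) as the sole member, so the pattern would be compiled as an exact match on à; the corrected check reports no singleton.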