From owner-freebsd-bugs@FreeBSD.ORG Tue Dec 28 18:00:33 2010
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4E0181065670
	for ; Tue, 28 Dec 2010 18:00:33 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 116358FC29
	for ; Tue, 28 Dec 2010 18:00:33 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oBSI0W8r080493
	for ; Tue, 28 Dec 2010 18:00:32 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id oBSI0WhM080468;
	Tue, 28 Dec 2010 18:00:32 GMT
	(envelope-from gnats)
Resent-Date: Tue, 28 Dec 2010 18:00:32 GMT
Resent-Message-Id: <201012281800.oBSI0WhM080468@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Mathieu
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 23C3D106566C
	for ; Tue, 28 Dec 2010 17:57:53 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 13AB08FC08
	for ; Tue, 28 Dec 2010 17:57:53 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id oBSHvqx2022003
	for ; Tue, 28 Dec 2010 17:57:52 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id oBSHvqcr022002;
	Tue, 28 Dec 2010 17:57:52 GMT
	(envelope-from nobody)
Message-Id: <201012281757.oBSHvqcr022002@red.freebsd.org>
Date: Tue, 28 Dec 2010 17:57:52 GMT
From: Mathieu
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: bin/153502: regex(3) bug with UTF-8 locale
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Tue, 28 Dec 2010 18:00:33 -0000

>Number:         153502
>Category:       bin
>Synopsis:       regex(3) bug with UTF-8 locale
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Dec 28 18:00:32 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Mathieu
>Release:        8.1-STABLE, 7.3-RELEASE-p3
>Organization:
>Environment:
8.1-STABLE/amd64 r212312M
7.3-RELEASE-p3/i386 r215233M
>Description:
I'm seeing odd behavior from programs using regex(3) like less(1), vi(1)
and sed(1) when using LANG=en_US.UTF-8 and UTF-8 inputs.

Sometimes it seems to work right:

$ echo 'é' | sed -ne '/^.$/p'
é
$ echo 'éé' | sed -ne '/^..$/p'
éé
$ echo 'aéa' | sed -ne '/a.a/p'
aéa
$ echo 'aéa' | sed -ne '/a.*a/p'
aéa
$ echo 'aaéaa' | sed -ne '/aa.aa/p'
aaéaa
$ echo 'aéaéa' | sed -ne '/a.a.a/p'
aéaéa

But not always:

$ echo 'éa' | sed -ne '/.a/p'
$ echo 'aéaa' | sed -ne '/a.aa/p'
$ echo 'éaé' | sed -ne '/.a./p'

Seems like using ".*", ".+", ".{0,}" or ".{1,}" works right, but ".{0,1}",
".{1,1}" or a lone "." doesn't always.
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
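
The commands in the description could be collected into one script so that every pattern/input pair is checked in a single run. This is only a sketch, not part of the original report: the helper name `try_match` is hypothetical, and the script sets `LC_ALL` rather than `LANG` on the assumption that `en_US.UTF-8` is installed and that overriding every locale category is acceptable. It uses the sed `=` command (print the line number) instead of `p`, so that a match is detected without depending on the matched text.

```shell
#!/bin/sh
# Sketch of a reproduction script for the cases in the report.
# try_match is a hypothetical helper, not an existing tool.

# Assumption: en_US.UTF-8 is installed. The report set LANG; LC_ALL is
# used here so that no other locale variable can override it.
LC_ALL=en_US.UTF-8
export LC_ALL

# try_match RE TEXT
# Prints "match" if sed selects the line TEXT with /RE/, else "no match".
# RE must not contain a "/" because it is pasted into a sed address.
try_match() {
    re=$1
    text=$2
    # "=" prints the line number of each selected line, so the output
    # is non-empty exactly when the address /RE/ matched.
    if [ -n "$(printf '%s\n' "$text" | sed -ne "/$re/=")" ]; then
        echo "match"
    else
        echo "no match"
    fi
}

# Pattern/input pairs copied from the report, one pair per line.
while read -r pattern input; do
    printf '%s\t%s\t%s\n' "$pattern" "$input" "$(try_match "$pattern" "$input")"
done <<'EOF'
^.$ é
^..$ éé
a.a aéa
a.*a aéa
aa.aa aaéaa
a.a.a aéaéa
.a éa
a.aa aéaa
.a. éaé
EOF
```

Every pair above is expected to match under a correct UTF-8 regex implementation, so any line reporting "no match" would correspond to the misbehavior described. If the locale is not available, sed may fall back to the C locale, where "." matches a single byte and the results are not comparable.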