From owner-freebsd-ports Thu Apr 18 23:20: 8 2002
Delivered-To: freebsd-ports@hub.freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
    by hub.freebsd.org (Postfix) with ESMTP id EA5C437B41B
    for ; Thu, 18 Apr 2002 23:20:01 -0700 (PDT)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.11.6/8.11.6) id g3J6K1p30357;
    Thu, 18 Apr 2002 23:20:01 -0700 (PDT)
    (envelope-from gnats)
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
    by hub.freebsd.org (Postfix) with ESMTP id 94BD537B416
    for ; Thu, 18 Apr 2002 23:18:46 -0700 (PDT)
Received: (from nobody@localhost)
    by freefall.freebsd.org (8.11.6/8.11.6) id g3J6Ik430208;
    Thu, 18 Apr 2002 23:18:46 -0700 (PDT)
    (envelope-from nobody)
Message-Id: <200204190618.g3J6Ik430208@freefall.freebsd.org>
Date: Thu, 18 Apr 2002 23:18:46 -0700 (PDT)
From: Wesley Irish
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-1.0
Subject: ports/37241: character ranges in regular expressions in nawk match one beyond the given range
Sender: owner-freebsd-ports@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

>Number: 37241
>Category: ports
>Synopsis: character ranges in regular expressions in nawk match one beyond the given range
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-ports
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Apr 18 23:20:01 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator: Wesley Irish
>Release: 4.4-RELEASE
>Organization:
Coyote Hill Consulting LLC
>Environment:
FreeBSD wilee.coyotehillconsulting.com 4.4-RELEASE FreeBSD 4.4-RELEASE #0: Tue Sep 18 11:57:08 PDT 2001 murray@builder.FreeBSD.org:/usr/src/sys/compile/GENERIC i386
>Description:
Character ranges in regular expression patterns are incorrect. They extend one beyond the intended end of the range.
I have tested this for [0-9], [a-z], and [A-Z] (just to mention a few). I have also tested multiple ranges, such as [a-zA-Z], and the problem applies to each range within the same pattern: [a-zA-Z] matches letters and also the characters "[" and "{".

I am running nawk: PORTVERSION= 20001115
>How-To-Repeat:
Run the command:

    nawk '{printf("\"%s\" ~ \"%s\" == %s\n", $1, $2, $1 ~ $2)}'

This one-liner simply checks whether the first field of each input line is matched by the pattern in the second field. Enter the following data to test for the problem:

    0 [0-9]
    9 [0-9]
    : [0-9]
    ; [0-9]
    / [0-9]

You will see that "0" and "9" match (as they should). But you will also see that ":" matches, which it shouldn't. The ";" does not match, as the problem extends only "one beyond" the range. The "/" does not match, indicating that there does not appear to be a problem at the beginning of the range.
>Fix:
The present workaround is VERY awkward: every character range in the source must be modified to end one character earlier than intended, such as [A-Y] in place of [A-Z], or be explicitly enumerated.

I apologize in advance if the problem is due to my not running the very latest version of FreeBSD and ports, or if this is an already known and fixed bug. I tried to check a bug report log, but I must not know how or where to look. If this is the case, a pointer to such a log would be greatly appreciated.

Thank you.
>Release-Note:
>Audit-Trail:
>Unformatted:
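
The How-To-Repeat steps can also be run non-interactively. The following is only a sketch, not part of the original report: the function name check_ranges, the AWK variable, and the third "expected" column (1 = should match, 0 = should not) are all assumptions added for illustration. It feeds the same five test lines to an awk binary and flags any result that differs from the expected one.

```shell
#!/bin/sh
# Hypothetical helper: automates the test data from How-To-Repeat.
# AWK defaults to "awk"; run with AWK=nawk to exercise the nawk port.
AWK="${AWK:-awk}"

check_ranges() {
    # Each input line: character, pattern, expected result (assumed column).
    printf '%s\n' \
        '0 [0-9] 1' \
        '9 [0-9] 1' \
        ': [0-9] 0' \
        '; [0-9] 0' \
        '/ [0-9] 0' |
    "$AWK" '{
        got = ($1 ~ $2)    # 1 if field 1 matches the pattern in field 2
        verdict = (got == $3) ? "ok" : "UNEXPECTED"
        printf("\"%s\" ~ \"%s\" == %d (expected %d) %s\n", $1, $2, got, $3, verdict)
    }'
}

check_ranges
```

With an awk that handles ranges correctly, every line should end in "ok". Going by the report above, the affected nawk would presumably flag the ":" line as UNEXPECTED while leaving the ";" and "/" lines as "ok".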