From: Warren Block
To: RW
Cc: freebsd-questions@freebsd.org
Date: Thu, 2 Aug 2012 08:04:47 -0600 (MDT)
Subject: Re: buggy awk regex handling?

On Thu, 2 Aug 2012, RW wrote:

> On Thu, 02 Aug 2012 13:20:52 +0200
> kaltheat wrote:
>
>> I tried to replace three letters with three letters by awk using the
>> sub-routine.
>> I assumed that my regular expression does mean the
>> following:
>>
>> match if three letters of any letter of alphabet occurs anywhere in
>> input
>>
>> $ echo AbC | awk '{sub(/[[:alpha:]]{3}/,"cBa"); print;}'
>> AbC
>>
>> As you can see the result was unexpected.
>> When I try doing it for at least one letter, it works:
>>
>> $ echo AbC | awk '{sub(/[[:alpha:]]+/,"cBa"); print;}'
>> cBa
>> ...
>> What am I doing wrong?
>> Or is awk buggy?
>
> Traditional awk implementations don't support {n}, but I think POSIX
> implementations should.

Using gawk instead of awk agrees with that. Printing the result of the
sub (the number of substitutions performed) makes it a little more
clear:

% echo AbC | awk '{print sub(/[[:alpha:]]{3}/,"cBa"); print;}'
0
AbC

% echo AbC | gawk '{print sub(/[[:alpha:]]{3}/,"cBa"); print;}'
1
cBa

sed can handle it:

% echo AbC | sed -E 's/[[:alpha:]]{3}/cBa/'
cBa
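
For an awk that rejects the {3} interval, one portable workaround is to write the bracket expression out three times. This is a minimal sketch, assuming the awk in use understands [[:alpha:]] (the + example in the thread suggests it does). replace_three is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Workaround sketch: repeat the bracket expression three times instead of
# relying on the {3} interval, which some awk implementations do not accept.
# replace_three is a hypothetical helper, not part of any tool.
replace_three() {
    # $1: the input string to process
    printf '%s\n' "$1" | awk '{
        n = sub(/[[:alpha:]][[:alpha:]][[:alpha:]]/, "cBa")
        print n   # number of substitutions performed (0 or 1)
        print     # the line, modified if a match was found
    }'
}

replace_three AbC
```

With such an awk this should print 1 followed by cBa, matching the gawk result for the {3} form. Spelling out the repetition gets unwieldy for larger counts, where gawk or sed -E is probably the better choice.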