From owner-freebsd-bugs@FreeBSD.ORG Sat Jan 30 11:30:01 2010
Resent-Date: Sat, 30 Jan 2010 11:30:00 GMT
Resent-Message-Id: <201001301130.o0UBU0nk089951@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Mikolaj Golub
Message-Id: <201001301120.o0UBKR9J017749@www.freebsd.org>
Date: Sat, 30 Jan 2010 11:20:27 GMT
From: Mikolaj Golub
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: bin/143369: awk(1) doesn't handle RS as a regexp but as a single character

>Number:         143369
>Category:       bin
>Synopsis:       awk(1) doesn't handle RS as a regexp but as a single character
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 30 11:30:00 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Mikolaj Golub
>Release:        8.0-STABLE, 7.2-STABLE
>Organization:
>Environment:
FreeBSD zhuzha.ua1 8.0-STABLE FreeBSD 8.0-STABLE #6: Sun Jan 24 21:36:17 EET 2010 root@zhuzha.ua1:/usr/obj/usr/src/sys/GENERIC i386
>Description:
This problem with awk(1) was reported to NetBSD by John Darrow and it was
fixed there.

  awk allows a complete string to be put into the RS variable, but does
  not treat that string as a regular expression for record splitting
  purposes - instead, it splits only on the first character of the string.

http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=30294

FreeBSD has the same problem and it would be nice to fix this.
>How-To-Repeat:
zhuzha:~% echo 'a b c d' | awk 'BEGIN {RS=" ";} {print $0}'
a
b
c
d

zhuzha:~% echo 'a b c d' | awk 'BEGIN {RS="[[:space:]]";} {print $0}'
a b c d

zhuzha:~% echo 'a[b[c[d' | awk 'BEGIN {RS="[[:space:]]";} {print $0}'
a
b
c
d

>Fix:
See the attached patch adopted from NetBSD (PR/30294: John Darrow: nawk
doesn't handle RS as a RE but as a single character).
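
For illustration only, and not part of the patch: a minimal C sketch of the
behaviour requested here, splitting input on a POSIX extended regular
expression with regcomp(3)/regexec(3) instead of on a single byte. The
helper name split_records and the RECLEN limit are hypothetical and exist
only for this example; the actual fix works inside awk's readrec() with
awk's own makedfa() and nematch().

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define RECLEN 64	/* hypothetical per-record limit for this sketch */

/*
 * Hypothetical helper, not part of one-true-awk: split `input` into
 * records, treating `rs` as a POSIX extended regular expression.
 * Stores at most `max` records (each truncated to RECLEN-1 bytes) in
 * `out` and returns the number stored, or -1 if `rs` does not compile.
 */
int
split_records(const char *input, const char *rs, char out[][RECLEN], int max)
{
	regex_t re;
	regmatch_t m;
	const char *p = input;
	int n = 0;

	if (regcomp(&re, rs, REG_EXTENDED) != 0)
		return -1;
	while (n < max && *p != '\0') {
		size_t len;
		int matched = regexec(&re, p, 1, &m,
		    p == input ? 0 : REG_NOTBOL) == 0;

		/* Simplification: an empty match counts as no match,
		 * so the loop always makes progress. */
		if (matched && m.rm_eo == m.rm_so)
			matched = 0;
		/* The record is everything before the separator match. */
		len = matched ? (size_t)m.rm_so : strlen(p);
		if (len >= RECLEN)
			len = RECLEN - 1;
		memcpy(out[n], p, len);
		out[n][len] = '\0';
		n++;
		if (!matched)
			break;
		p += m.rm_eo;	/* resume after the separator */
	}
	regfree(&re);
	return n;
}
```

With rs set to "[[:space:]]", the input "a b c d\n" yields the four records
a, b, c and d, while "a[b[c[d" stays a single record, which is the result
the How-To-Repeat cases should give once RS is honoured as a regexp.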

Patch attached with submission follows:

diff -ru contrib/one-true-awk.orig/lib.c contrib/one-true-awk/lib.c
--- contrib/one-true-awk.orig/lib.c	2007-10-25 15:38:02.000000000 +0300
+++ contrib/one-true-awk/lib.c	2010-01-30 13:04:13.000000000 +0200
@@ -194,22 +194,62 @@
 			;
 		if (c != EOF)
 			ungetc(c, inf);
-	}
-	for (rr = buf; ; ) {
-		for (; (c=getc(inf)) != sep && c != EOF; ) {
-			if (rr-buf+1 > bufsize)
-				if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 1"))
-					FATAL("input record `%.30s...' too long", buf);
+	} else if ((*RS)[1]) {
+		fa *pfa = makedfa(*RS, 1);
+		int tempstat = pfa->initstat;
+		char *brr = buf;
+		char *rrr = NULL;
+		int x;
+		for (rr = buf; ; ) {
+			while ((c = getc(inf)) != EOF) {
+				if (rr-buf+3 > bufsize)
+					if (!adjbuf(&buf, &bufsize, 3+rr-buf,
+					    recsize, &rr, "readrec 2"))
+						FATAL("input record `%.30s...'"
+						    " too long", buf);
+				*rr++ = c;
+				*rr = '\0';
+				if (!(x = nematch(pfa, brr))) {
+					pfa->initstat = tempstat;
+					if (rrr) {
+						rr = rrr;
+						ungetc(c, inf);
+						break;
+					}
+				} else {
+					pfa->initstat = 2;
+					brr = rrr = rr = patbeg;
+				}
+			}
+			if (rrr || c == EOF)
+				break;
+			if ((c = getc(inf)) == '\n' || c == EOF)
+				/* 2 in a row */
+				break;
+			*rr++ = '\n';
+			*rr++ = c;
+		}
+	} else {
+		for (rr = buf; ; ) {
+			for (; (c=getc(inf)) != sep && c != EOF; ) {
+				if (rr-buf+1 > bufsize)
+					if (!adjbuf(&buf, &bufsize, 1+rr-buf,
+					    recsize, &rr, "readrec 1"))
+						FATAL("input record `%.30s...'"
+						    " too long", buf);
+				*rr++ = c;
+			}
+			if (**RS == sep || c == EOF)
+				break;
+			if ((c = getc(inf)) == '\n' || c == EOF)
+				/* 2 in a row */
+				break;
+			if (!adjbuf(&buf, &bufsize, 2+rr-buf, recsize, &rr,
+			    "readrec 2"))
+				FATAL("input record `%.30s...' too long", buf);
+			*rr++ = '\n';
 			*rr++ = c;
 		}
-		if (**RS == sep || c == EOF)
-			break;
-		if ((c = getc(inf)) == '\n' || c == EOF) /* 2 in a row */
-			break;
-		if (!adjbuf(&buf, &bufsize, 2+rr-buf, recsize, &rr, "readrec 2"))
-			FATAL("input record `%.30s...' too long", buf);
-		*rr++ = '\n';
-		*rr++ = c;
 	}
 	if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 3"))
 		FATAL("input record `%.30s...' too long", buf);

>Release-Note:
>Audit-Trail:
>Unformatted: