Date: Sat, 30 Jan 2010 11:20:27 GMT
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: bin/143369: awk(1) doesn't handle RS as a regexp but as a single character
Message-ID: <201001301120.o0UBKR9J017749@www.freebsd.org>
Resent-Message-ID: <201001301130.o0UBU0nk089951@freefall.freebsd.org>
>Number:         143369
>Category:       bin
>Synopsis:       awk(1) doesn't handle RS as a regexp but as a single character
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 30 11:30:00 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Mikolaj Golub
>Release:        8.0-STABLE, 7.2-STABLE
>Organization:
>Environment:
FreeBSD zhuzha.ua1 8.0-STABLE FreeBSD 8.0-STABLE #6: Sun Jan 24 21:36:17 EET 2010 root@zhuzha.ua1:/usr/obj/usr/src/sys/GENERIC i386
>Description:
This problem with awk(1) was reported to NetBSD by John Darrow, and it was fixed there.

awk allows a complete string to be put into the RS variable, but it does not treat that string as a regular expression for record-splitting purposes; instead, it splits only on the first character of the string.

http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=30294

FreeBSD has the same problem, and it would be nice to fix it.
>How-To-Repeat:
zhuzha:~% echo 'a b c d' | awk 'BEGIN {RS=" ";} {print $0}'
a
b
c
d
zhuzha:~% echo 'a b c d' | awk 'BEGIN {RS="[[:space:]]";} {print $0}'
a b c d
zhuzha:~% echo 'a[b[c[d' | awk 'BEGIN {RS="[[:space:]]";} {print $0}'
a
b
c
d

In the second and third cases RS is effectively the single character '[': the space-separated input is not split at all, while the '['-separated input is.
>Fix:
See the attached patch, adapted from NetBSD (PR/30294: John Darrow: nawk doesn't handle RS as a RE but as a single character).

Patch attached with submission follows:

diff -ru contrib/one-true-awk.orig/lib.c contrib/one-true-awk/lib.c
--- contrib/one-true-awk.orig/lib.c	2007-10-25 15:38:02.000000000 +0300
+++ contrib/one-true-awk/lib.c	2010-01-30 13:04:13.000000000 +0200
@@ -194,22 +194,62 @@
 			;
 		if (c != EOF)
 			ungetc(c, inf);
-	}
-	for (rr = buf; ; ) {
-		for (; (c=getc(inf)) != sep && c != EOF; ) {
-			if (rr-buf+1 > bufsize)
-				if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 1"))
-					FATAL("input record `%.30s...' too long", buf);
+	} else if ((*RS)[1]) {
+		fa *pfa = makedfa(*RS, 1);
+		int tempstat = pfa->initstat;
+		char *brr = buf;
+		char *rrr = NULL;
+		int x;
+		for (rr = buf; ; ) {
+			while ((c = getc(inf)) != EOF) {
+				if (rr-buf+3 > bufsize)
+					if (!adjbuf(&buf, &bufsize, 3+rr-buf,
+					    recsize, &rr, "readrec 2"))
+						FATAL("input record `%.30s...'"
+						    " too long", buf);
+				*rr++ = c;
+				*rr = '\0';
+				if (!(x = nematch(pfa, brr))) {
+					pfa->initstat = tempstat;
+					if (rrr) {
+						rr = rrr;
+						ungetc(c, inf);
+						break;
+					}
+				} else {
+					pfa->initstat = 2;
+					brr = rrr = rr = patbeg;
+				}
+			}
+			if (rrr || c == EOF)
+				break;
+			if ((c = getc(inf)) == '\n' || c == EOF)
+				/* 2 in a row */
+				break;
+			*rr++ = '\n';
+			*rr++ = c;
+		}
+	} else {
+		for (rr = buf; ; ) {
+			for (; (c=getc(inf)) != sep && c != EOF; ) {
+				if (rr-buf+1 > bufsize)
+					if (!adjbuf(&buf, &bufsize, 1+rr-buf,
+					    recsize, &rr, "readrec 1"))
+						FATAL("input record `%.30s...'"
+						    " too long", buf);
+				*rr++ = c;
+			}
+			if (**RS == sep || c == EOF)
+				break;
+			if ((c = getc(inf)) == '\n' || c == EOF)
+				/* 2 in a row */
+				break;
+			if (!adjbuf(&buf, &bufsize, 2+rr-buf, recsize, &rr,
+			    "readrec 2"))
+				FATAL("input record `%.30s...' too long", buf);
+			*rr++ = '\n';
 			*rr++ = c;
 		}
-		if (**RS == sep || c == EOF)
-			break;
-		if ((c = getc(inf)) == '\n' || c == EOF)	/* 2 in a row */
-			break;
-		if (!adjbuf(&buf, &bufsize, 2+rr-buf, recsize, &rr, "readrec 2"))
-			FATAL("input record `%.30s...' too long", buf);
-		*rr++ = '\n';
-		*rr++ = c;
 	}
 	if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 3"))
 		FATAL("input record `%.30s...' too long", buf);
>Release-Note:
>Audit-Trail:
>Unformatted:
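The difference between the reported behaviour and the intended one can be illustrated with a small C sketch. This is not the patch's code: it assumes POSIX <regex.h> (regcomp/regexec with REG_EXTENDED) as a stand-in for awk's internal makedfa()/nematch() matcher, and the helper split_records() together with the MAX_RECS/REC_LEN limits are hypothetical names for illustration only. It also works on a complete in-memory string, whereas readrec() reads the input one character at a time with getc(), which is presumably why the patch re-runs nematch() on the growing buffer and pushes a character back with ungetc() once the longest separator match is found. Empty RS (paragraph mode) is not covered.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_RECS 16	/* hypothetical limits for this sketch */
#define REC_LEN  64

/*
 * Hypothetical helper (not part of one-true-awk): split the NUL-terminated
 * string `in` into records separated by `rs`, copying up to `max` records
 * into out[].  Returns the record count, or -1 if `rs` fails to compile.
 *
 *   use_regex == 0: mimic the reported bug; only rs[0] is used, as a
 *                   literal separator character.
 *   use_regex == 1: treat the whole of `rs` as a POSIX extended regular
 *                   expression (standing in for awk's own regex engine).
 *
 * Empty RS (awk's paragraph mode) is not handled by this sketch.
 */
int split_records(const char *in, const char *rs, int use_regex,
    char out[][REC_LEN], int max)
{
	const char *start = in;
	regex_t re;
	regmatch_t m;
	int n = 0;

	assert(rs != NULL && rs[0] != '\0');
	if (use_regex && regcomp(&re, rs, REG_EXTENDED) != 0)
		return -1;

	while (*in != '\0' && n < max) {
		size_t reclen, skip;

		if (use_regex) {
			/* '^' must not match at the start of later records. */
			int eflags = (in == start) ? 0 : REG_NOTBOL;

			/* An empty match is treated as "no separator" so the
			 * loop always advances; a simplification. */
			if (regexec(&re, in, 1, &m, eflags) == 0 &&
			    m.rm_eo > m.rm_so) {
				reclen = (size_t)m.rm_so;
				skip = (size_t)m.rm_eo;
			} else {
				reclen = skip = strlen(in);
			}
		} else {
			const char *p = strchr(in, rs[0]);

			reclen = p ? (size_t)(p - in) : strlen(in);
			skip = p ? reclen + 1 : reclen;
		}

		/* Copy the record, truncating it if it does not fit. */
		size_t copy = reclen < REC_LEN ? reclen : REC_LEN - 1;
		memcpy(out[n], in, copy);
		out[n][copy] = '\0';
		n++;
		in += skip;
	}

	if (use_regex)
		regfree(&re);
	return n;
}
```

For example, with rs set to "[[:space:]]", split_records("a b c d", rs, 0, out, MAX_RECS) yields the single record "a b c d", matching the second How-To-Repeat case, while passing 1 for use_regex yields the four records "a", "b", "c" and "d". A genuinely multi-character separator such as "-+" splits "one--two---three" into three records in regex mode, but into six (including empty ones) when only its first character is used.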