Date: Tue, 6 Oct 2015 01:22:10 +0200
From: Polytropon
To: Quartz
Cc: freebsd-questions@freebsd.org
Subject: Re: awk question
Message-Id: <20151006012210.3c937716.freebsd@edvax.de>
In-Reply-To: <56130072.9070406@sneakertech.com>

On Mon, 05 Oct 2015 18:57:54 -0400, Quartz wrote:
> >> It's not very much like sh or C syntax (or
> >> any other syntax) and new users tend to get really confused.
> >
> > Hmmm... I don't know, could you provide an example where you
> > would say, like, "this is not intuitive" or even "this does
> > something totally strange"?
>
> Things I've noticed new users bump into all the time:
>
>
> Statements must be wrapped in curly braces, ie;
>
>     awk '{print $1}{print $2}'
> I think awk is one of the few languages to do this.

This is equivalent to

    awk '{ print $1; print $2 }'

Just like in C, C++, even Java and Javascript, { ... } is
used to "group statements".

> Because of the above, having to type:
>
>     awk '{print $1}'
> instead of just:
>
>     awk 'print $1'
> .. in other words both the quotes and the curly braces are required. For
> most other shell utilities one is enough.

If you consider my previous post about what { ... } means,
enlightenment will quickly follow: The braces define a block,
and what's in front of this block states _when_ the block
should be executed. This "prefix" is important. That's why it's
necessary to understand the basic "dataflow" within awk:

    BEGIN       { ... }    # before any data
    /pattern/   { ... }    # when pattern is found
    (condition) { ... }    # when condition is true
                { ... }    # always!
    END         { ... }    # after all data

There can be multiple pattern-matching and conditional blocks,
of course. The common form is

    awk '{ ... }'

to process all input lines. For example, if you only want those
lines which are not empty and do not start with a comment sign,

    awk '/^[^#]/ { ... }'

would be used; or only lines with text over 10 characters:

    awk '(length > 10) { ... }'

or just the 5th line:

    awk '(NR == 5) { ... }'

> People assume that awk prints string literals like (ba)sh:
>
>     echo "$1$2$3"
> and
>
>     awk '{print $1$2$3}'
> both yield fields with nothing between them. So far so good, right?
> but:
>
>     echo "$1,$2,$3"
> yields results with commas between them, but:
>
>     awk '{print $1,$2,$3}'
> yields results with spaces.

Yes, this is a difference: in awk, the comma in print separates
arguments, and they are joined with the output field separator
OFS, which is a single space by default. Once you know that, and
especially if you want more precise control over the output,
you'll quickly resort to awk's printf() function, which works
like in C.

> OK, so it's not like sh. Maybe it's like
> Javascript then?
>
>     awk '{print $1+","+$2+","+$3}'
> ... nope, now all they get is a huge list of mostly zeros, because awk
> doesn't overload operators.

Of course not, because that would be stupid. :-)
As I said, when you want concatenation with a custom separator,
use printf():

    awk '{ printf("%s+%s+%s\n", $1, $2, $3); }'

This provides good flexibility; like in C, you can even add
formatting options for the arguments (see "man 3 printf" for
comparison), such as string length manipulation, numeric output
format, or even the use of control characters.

> > Yes, this is true, but keep in mind what awk is: a "pattern-directed
> > scanning and processing language". If you want higher precision
> > math, use system(" | dc") and incorporate the result;
> > awk isn't really for math, but integer math is usually fine. :-)
>
> Right, but it's just something that makes people shy away from awk, for
> better or worse.

Reading "man awk" gives you quite a good introduction to what
awk is and what it can do, and of course what it cannot do
(or at least what it's bad at). Choosing the right tool for
the job is _key_ to writing good code. As awk is not a "one size
fits all" kind of tool, it's a bad choice if you need to process
numbers with high precision. And when a few simple calls to
grep, cut, sed, tr etc. will do the job similarly well, those
could be considered instead.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
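
The points discussed above can be tried out with a minimal sketch
like the following. It assumes a POSIX sh and any standard awk; the
sample input and the variable names are made up for illustration
and do not come from the thread.

```shell
# Hypothetical sample input: a comment, a data line, an empty line, a data line.
input='# comment
alpha beta gamma

one two three'

# /^[^#]/ selects lines whose first character is not "#". Empty lines
# have no first character, so they are skipped as well.
# "print $1,$2,$3" joins the arguments with OFS (a space by default).
with_print=$(printf '%s\n' "$input" | awk '/^[^#]/ { print $1,$2,$3 }')

# printf spells the separator out in the format string instead.
with_printf=$(printf '%s\n' "$input" | awk '/^[^#]/ { printf("%s,%s,%s\n", $1, $2, $3) }')

# BEGIN runs before any input, END after all of it; NR counts input lines.
counted=$(printf '%s\n' "$input" | awk 'BEGIN { n = 0 } /^[^#]/ { n++ } END { print n " of " NR }')

# "+" is always numeric addition; non-numeric strings count as 0.
plus=$(echo 'a b c' | awk '{ print $1+","+$2+","+$3 }')

echo "$with_print"    # alpha beta gamma / one two three
echo "$with_printf"   # alpha,beta,gamma / one,two,three
echo "$counted"       # 2 of 4
echo "$plus"          # 0
```

Keeping each awk program inside single quotes stops the shell from
expanding $1, $2 and $3 itself, which is why both the quotes and
the braces are needed.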