Date: Sun, 06 Jun 2010 17:06:07 -0700
From: Bakul Shah <bakul@bitblocks.com>
To: Dag-Erling Smørgrav <des@des.no>
Cc: freebsd-hackers@freebsd.org, Doug Barton <dougb@FreeBSD.org>, Rob Warnock <rpw3@rpw3.org>
Subject: Re: head behaviour
Message-ID: <20100607000607.97C6F5B5A@mail.bitblocks.com>
In-Reply-To: Your message of "Mon, 07 Jun 2010 00:13:28 +0200." <86d3w3yflj.fsf@ds4.des.no>
References: <20100605201242.C79345B52@mail.bitblocks.com> <4C0AB448.2040104@FreeBSD.org> <86r5kk6xju.fsf@ds4.des.no> <4C0C1A0B.4090409@FreeBSD.org> <86d3w3yflj.fsf@ds4.des.no>

On Mon, 07 Jun 2010 00:13:28 +0200 Dag-Erling Smørgrav <des@des.no> wrote:
>
> The reason why head(1) doesn't work as expected is that it uses buffered
> I/O with a fairly large buffer, so it consumes more than it needs.  The
> only way to make it behave as the OP expected is to use unbuffered I/O
> and never read more bytes than the number of lines left, since the worst
> case is input consisting entirely of empty lines.  We could add an
> option to do just that, but the same effect can be achieved more
> portably with read(1) loops:

Except read doesn't do it quite right:

$ ps | (read a; echo $a ; grep zsh)
PID TT STAT TIME COMMAND
 1196  p0  Is     0:02.23 -zsh (zsh)
 1209  p1  Is     0:00.35 -zsh (zsh)

Alignment of column titles is messed up.

Using egrep we can get the right alignment but egrep also
shows up.

$ ps | egrep 'TIME|zsh'
  PID  TT  STAT      TIME COMMAND
 1196  p0  Is     0:02.23 -zsh (zsh)
 1209  p1  Is     0:00.35 -zsh (zsh)
71945  p2  DL+    0:00.01 egrep TIME|zsh

A small point but it is not trivial to get it exactly right.
head -n directly expresses what one wants.

But there is a deeper point.  Several people pointed out
alternatives for the examples given but in general you can't
use a single command to replace a sequence of commands where
each operates on part of the shared input in a different way.

The reason we can't do this is buffering for efficiency.
Usually there is no further use for the buffered but
unconsumed input & it can be safely thrown away.  So this is
almost always the right thing to do but not when there *is*
further use for the unconsumed input.

Some programs already do the right thing (dd, for instance,
as you pointed out).  Some other commands do give you this
option in a limited way.  "man grep" & you will find:

       -m NUM, --max-count=NUM
              Stop reading a file after NUM matching lines.  If the input is
              standard input from a regular file, and NUM matching lines are
>>>>          output, grep ensures that the standard input is positioned to
>>>>          just after the last matching line before exiting, regardless of
              the presence of trailing context lines.  This enables a calling
              process to resume a search.

So for instance

$ < /usr/share/dict/words (grep -m 1 ''; grep -m 1 '')
A
a

But pipe the file in and see what you get:

$ cat /usr/share/dict/words | (grep -m 1 ''; grep -m 1 '')
A
nterectasia

Grep does the right thing for files but not pipes!  Now I do
understand *why* this happens but still, it is annoying.

So I believe there is value in providing an option to read
*as much as needed* but not more.  It will be slower but will
handle the cases we are discussing.  This will enhance
*composability* -- supposedly part of the unix philosophy.
The slow-but-read-just-as-much-as-needed option is to be used
when you need a certain kind of composability and there is no
other way.

And yes, I do now think this is useful not just for head
but also for any other program that quits before reading to
the end!

[cc'ed Rob in case he wishes to chime in]
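
For reference, a minimal sketch of the read-just-as-much-as-needed idea in plain sh.  It is only an illustration under stated assumptions, not a proposed patch to head: the function name head_exact is made up here, and the sketch relies on the shell's read built-in not consuming input past the newline when stdin is a pipe.  Clearing IFS, passing -r, and quoting the variable keep the leading and internal whitespace intact, so the column titles stay aligned.

```shell
#!/bin/sh
# head_exact N (hypothetical helper, not an existing utility):
# copy the first N lines of stdin to stdout and leave the rest of
# stdin unread for whatever command runs next.
head_exact() {
    n=$1
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
    # A final line with no trailing newline is not printed by this sketch.
    while [ "$n" -gt 0 ] && IFS= read -r line; do
        printf '%s\n' "$line"
        n=$((n - 1))
    done
}

# Canned input stands in for ps output so the demo is reproducible.
printf '  PID  TT  COMMAND\n 1196  p0  zsh\n 1209  p1  csh\n' |
    { head_exact 1; grep zsh; }
# prints:
#   PID  TT  COMMAND
#  1196  p0  zsh
```

This is slow for large N, since the shell typically reads a pipe one byte at a time, which is exactly the trade-off discussed above.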