From owner-svn-src-head@freebsd.org  Wed Aug  9 08:44:23 2017
Return-Path: <owner-svn-src-head@freebsd.org>
Delivered-To: svn-src-head@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24966DC6D7D;
 Wed,  9 Aug 2017 08:44:23 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id C7FB36858C;
 Wed,  9 Aug 2017 08:44:22 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 6A3EB3CCF6C;
 Wed,  9 Aug 2017 18:44:19 +1000 (AEST)
Date: Wed, 9 Aug 2017 18:44:19 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Emmanuel Vadot <manu@bidouilliste.com>
cc: Bruce Evans <brde@optusnet.com.au>, Emmanuel Vadot <manu@freebsd.org>, 
 src-committers@freebsd.org, svn-src-all@freebsd.org, 
 svn-src-head@freebsd.org
Subject: Re: svn commit: r322252 - head/usr.bin/vmstat
In-Reply-To: <20170808161507.2d185d394dc921eac6729d89@bidouilliste.com>
Message-ID: <20170809171526.B1567@besplex.bde.org>
References: <201708081218.v78CIBvL068413@repo.freebsd.org>
 <20170808225104.I3528@besplex.bde.org>
 <20170808161507.2d185d394dc921eac6729d89@bidouilliste.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=LI0WeNe9 c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=Q9sN3FMeAAAA:8
 a=3W0Di7-B9xzAe2TjPFYA:9 a=4JXGKAPf0pKDMlew:21 a=AfNcCDu4C7j3jd7Y:21
 a=CjuIK1q_8ugA:10 a=mnDzqiP7QBHErLppGYsC:22
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
 <svn-src-head.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head/>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 09 Aug 2017 08:44:23 -0000

On Tue, 8 Aug 2017, Emmanuel Vadot wrote:

> On Tue, 8 Aug 2017 23:55:52 +1000 (EST)
> Bruce Evans <brde@optusnet.com.au> wrote:
>
>> On Tue, 8 Aug 2017, Emmanuel Vadot wrote:
>>
>>> Log:
>>>  vmstat: Always emit a space after the free-memory column
>>>
>>>  When displaying in non-human form, if the free-memory number
>>>  is large (more than 7 digits), there is no space between it and
>>>  the page fault column.
>>>
>>>  PR:		221290
>>>  Submitted by:	Josuah Demangeon <mail@josuah.net> (Original version)
>>>
>>> Modified:
>>>  head/usr.bin/vmstat/vmstat.c
>>>
>>> Modified: head/usr.bin/vmstat/vmstat.c
>>> ==============================================================================
>>> --- head/usr.bin/vmstat/vmstat.c	Tue Aug  8 11:49:36 2017	(r322251)
>>> +++ head/usr.bin/vmstat/vmstat.c	Tue Aug  8 12:18:11 2017	(r322252)
>>> @@ -832,6 +832,7 @@ dovmstat(unsigned int interval, int reps)
>>> 			xo_emit(" ");
>>> 			xo_emit("{:free-memory/%7d}",
>>> 			        vmstat_pgtok(total.t_free));
>>> +			xo_emit(" ");
>>> 		}
>>> 		xo_emit("{:total-page-faults/%5lu} ",
>>> 		        (unsigned long)rate(sum.v_vm_faults -
>>
>> This seems to break the formatting.  There was a negative amount of space
>> available for expansion, and since the header was not expanded to match
>> its alignment with the fields is more random than before.  With -h, the
>> width was 80 columns, giving ugly line wrap on 80-column terminals with
>> auto-wrap.  Now it is 81 columns, giving uglier line wrap on all 80-
>> column terminals.
>
> This break nothing, This was the case before too (with or without
> -h), just tested in tmux with force-width 80.

Testing shows that it breaks the -H case as expected.  (I had misread the
change as affecting the -h case too.)

Before:

X procs     memory        page                    disks     faults         cpu
X r b w     avm     fre  flt  re  pi  po    fr   sr ad0 ad1   in    sy    cs us sy id
X 0 0 0 12719376  977460 2167   3  28   0  2545  451   0   0  268 27162  7175  9  1 91

Misformatting in the data line starts with the avm field taking 8 of the 7
columns reserved for it.  This and all subsequent fields are misaligned with
the header.

The header is pre-misformatted to width 85, with lots of inconsistencies in
the first line.  Old versions were carefully formatted to width 79, with no
inconsistencies in the first line.  The extra width for avm makes the extra
width for the header not even work to line up.

X procs     memory        page                    disks     faults         cpu
X r b w     avm     fre  flt  re  pi  po    fr   sr ad0 ad1   in    sy    cs us sy id
X 0 0 0 12719380  977460  2167   3  28   0  2545  451   0   0  268 27162  7175  9  1 91

The new bugs are that the extra space before the "flt" field increases the
off-by-1 error to off-by-2 for this field and all subsequent fields.  All
fields starting with "flt" now have an error of at least off-by-1 even
if none are too wide, since the header was not expanded to match.  Expansion
would increase its breakage from 6 to wide to 7 too wide.

>> The bugs were mostly in the first line of the header:
>> - the second line of the header was correct for vmstat -h
>> - for vmstat without -h, the second line of the header was apparently broken
>>    by a change like the one here, that added a space after the "r b w" fields
>>    without adding one in the "r b w" header
>> - most of the fields in the first line of the header are misaligned with the
>>    second lone.  Many have drifted 3 to the left of where the were in a sort
>>    of center-justified place.  Some of these might have actually been
>>    intended to be left justified, but had an off by +1 error.  Now these
>>    have an error of off by -2 relative to left justifications.
>>
>> Only the "memory" header in the first line is better than in old versions.
>> It is now left justified.  Left justifying all headers in the first line
>> is probably best.  I couldn't find a good way to delimit the right hand
>> side of the extents of the headers in the first line.  The second line of
>> the headers already uses right justification consistently and this works
>> well.
>
> I think that all this might be true but you might talk about the whole
> libxo conversion that was done, not my commit right ?

I try not to look at libxo, since its misformattings are enormous starting
in the source code.

My columnize.awk script works almost perfectly to fix the above
misformattings:

X BEGIN {
X 	columns = 79			# XXX
X }
X 
X {
X 	# Determine fields and field widths for current line.
X 	# awk's field splitting feature turned out to be inadequate,
X 	# and this would be even easier in C.
X 	n = split($0, a, "")
X 	f = 1
X 	j = 1
X 	while (j <= n) {
X 		cpw[f] = 0
X 		cfw[f] = 0
X 		while (j <= n && a[j] == " ") {
X 			cpw[f]++	# current (left) padding width
X 			j++
X 		}
X 		if (j > n)
X 			break
X 		while (j <= n && a[j] != " ") {
X 			cfw[f]++	# current field width without padding
X 			j++
X 		}
X 		fld[f] = substr($0, j - cfw[f], cfw[f])
X 		if (f != 1) {
X 			cpw[f]--
X 			cfw[f]++	# current field width with min padding
X 		}
X 		f++
X 	}
X 	nf = f - 1			# no need to use or trust NF
X 
X 	if (NR == 1 || NR < 3 && nf != anf) {
X 		# Make current field widths (with full padding) the maximum
X 		# ones.  When NR > 1, this discards the previous maximums.
X 		# Good enough for vmstat, where the widths from NR == 1 are
X 		# garbage.
X 		#
X 		# Make current field widths (with minimal padding) the minimum
X 		# ones.
X 		anf = nf
X 		for (f = 1; f <= anf; f++) {
X 			Mfw[f] = cpw[f] + cfw[f]
X 			mfw[f] = cfw[f]
X 		}
X 	} else if (nf < anf) {
X 		# Some non-tabular line after warming up.  Probably an ornate
X 		# line in the next header.  Too hard to handle properly.
X 		printf("%.*s oops\n", columns - 5, $0)
X 		fflush(stdout)
X 		next
X 	} else if (nf > anf) {
X 		# Some non-tabular line after warming up.  Hopefully just
X 		# a single trailing field with inadequate field separators
X 		# like the COMMAND field in ps and top.  Append the extras
X 		# to the last field with minimal separators.
X 		for (f = anf + 1; f <= nf; f++)
X 			fld[anf] = fld[anf] " " fld[f]
X 		nf = anf
X 		cfw[nf] = 1 + length(fld[nf])
X 	}
X 
X 	# Update the maximum and minimum field widths if this line needs
X 	# wider fields.
X 	len = 0
X 	for (f = 1; f <= nf; f++) {
X 		if (Mfw[f] < cfw[f])
X 			Mfw[f] = cfw[f]
X 		len += Mfw[f]
X 		if (mfw[f] < cfw[f])
X 			mfw[f] = cfw[f]
X 	}
X 
X 	# If the line would be too wide, reduce the maximum field widths
X 	# minimally towards minimum ones.
X 	while (len > columns) {
X 		for (f = 1; f <= nf; f++) {
X 			if (Mfw[f] > mfw[f]) {
X 				Mfw[f]--
X 				len--
X 				break
X 			}
X 		}
X 		if (f > nf)
X 			break
X 	}
X 
X 	s = ""
X 	for (f = 1; f <= nf - 1; f++)
X 		s = s sprintf("%*s", Mfw[f], fld[f])
X 	s = s sprintf(" %-*s", Mfw[nf] - 1, fld[nf])
X 	printf("%.*s\n", columns, s)
X 	fflush(stdout)
X }

When applied to the above saved output (old then new, quoted by X's), it
produces:

X procs     memory        page                    disks     faults cpu 
X r b w avm   fre  flt  re  pi  po    fr   sr ad0 ad1   in    sy    cs us sy id
X 0 0 0 12719376 977460 2167  3 28  0 2545 451   0   0 268 27162  7175  9  1 91
X procs     memory        page                    disks     faults         oops
X r b w      avm    fre  flt re pi po   fr  sr ad0 ad1  in    sy    cs us sy id
X 0 0 0 12719380 977460 2167  3 28  0 2545 451   0   0 268 27162  7175  9  1 91

It removed enough spaces to fit in 79 columns even with the X's.  It can't
handle the first line of the header very well since this line has an ornate
format whose columns don't match the rest of the output in any simple way
(programs should produce more parseable formats, like ps does).  All that
that it did for the first line was move "cpu" to a less worse place in the
old output and change it to "oops" in the new output.

Even programs like ps and df that attempt to do the necessary dynamic
formatting often produce worse results than the above script.  df on
freefall now produces a mess lots of spaces giving long lines even with
-h, I think because the mountpoint names are very long and df has hard-
coded formats for everything else.  columnize.awk only manages to almost
clean this up for the -h case.  It has a problem that it is stream-based
and doesn't backtrack to adjust early lines.  This is necessary for use
in filters.

vmstat 1 | awk -f columize.awk is a good example of using columize.awk
in a filter.  This use also requires it to understand headers not in
the first few lines of output.  This is not easy, and in practice it
gets confused by the first line of the vmstat header unless it is the
first line of the output.  This is why it printed "oops" in the
abov concatenated output.  The first header line just has an unusual
number of fields so cannot be handled right.  When it is the first
line in the output, it is printed verbatim except possibly for removing
spaces, since it gives the only available information on the expected
columnar output.

Bruce