From owner-freebsd-hackers@FreeBSD.ORG  Tue Jun  3 16:04:42 2014
Content-Type: text/plain; charset=iso-8859-1
Mime-Version: 1.0 (Apple Message framework v1283)
Subject: Re: [GSoC] Machine readable output from userland utilities
From: Zaro Korchev <zkorchev@mail.bg>
Date: Tue, 3 Jun 2014 19:04:33 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <4C29B30D-6833-4F0E-B071-C8EA215C0A17@mail.bg>
References: <0FCB749A-67F7-4C2F-AAC1-32D0BD67B502@theravensnest.org>
To: Eitan Adler <lists@eitanadler.com>, Alfred Perlstein <bright@mu.org>,
 David Chisnall <theraven@theravensnest.org>,
 Jonathan Anderson <jonathan.robert.anderson@gmail.com>
Cc: freebsd-hackers@freebsd.org

Hi everybody

I see there are several different ideas about how the output format
should be specified.

I first started with an option named -O, on the understanding that it
can be changed once the best variant is decided.


There is the environment variable idea that we discussed with Eitan:

On 29 May 2014 at 18:31, Eitan Adler wrote:
> On 29 May 2014 05:12, Zaro Korchev <zkorchev@mail.bg> wrote:
>> I thought about whether it is better to use an option or an
>> environment variable. I did it with an option because it is easier to
>> switch an option on/off. It appears that the flag -O is free in almost
>> all tools. I have no problem making the tools use an environment
>> variable.
>
> My concern is that future standards may require this option (or at
> least, would be precluded from using it).  In addition, it may
> conflict with non-base utilities, such as coreutils ones.
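
A minimal sketch of the environment variable variant, for illustration
only. The variable name OUTPUT_FORMAT, the enum and both function names
are assumptions made here; nothing in the thread fixes them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of formats; only "json" and "xml" appear in the thread. */
enum out_format { FMT_TEXT, FMT_JSON, FMT_XML };

/* Map a format name to an enum value; unknown or missing names mean text. */
static enum out_format
parse_format(const char *name)
{
	if (name == NULL)
		return (FMT_TEXT);
	if (strcmp(name, "json") == 0)
		return (FMT_JSON);
	if (strcmp(name, "xml") == 0)
		return (FMT_XML);
	return (FMT_TEXT);
}

/* Look up the (assumed) OUTPUT_FORMAT variable in the environment. */
static enum out_format
format_from_env(void)
{
	return (parse_format(getenv("OUTPUT_FORMAT")));
}
```

With this approach no flag letter is consumed, but every tool started
from the same shell sees the same setting.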

----

There is the pipelining idea of Jonathan:

On 23 May 2014 at 16:27, Jonathan Anderson wrote:
> Imagine:
>
> $ ifconfig | filterBy "ether" " 3c:07:.*" | sortBy "ether" | output my_ifconfig.format   # or "json" or "xml" or ...
>
> A pipeline of little tools, each doing one thing well: how much more
> unix can you get? Currently, every command-line tool has to do two or
> three things:
> 1. its primary job,
> 2. output some arbitrary text format (that you're never allowed to
>    change because other tools scrape it) and
> 3. (optionally) parse arbitrary text formats generated by users or
>    some other tool.
>
> Task 2 is annoying: in order to usefully query command-line tools, I
> have to write a parser. The tool has binary data, I want binary data,
> but we have to go through a dump/parse dance in order for me to get the
> data. This is the approach (again, from Plan 9) that brings you Linux
> sysfs. Perhaps David would now like to comment on his cross-platform
> "how much battery do I have" experience. :)
>
> Task 3 isn't just annoying, however, it's risky. If every tool
> implements its own string protocol parsing, we greatly increase the
> risk of unnoticed bugs. Better to centralize as much string parsing as
> possible into a single library, which can be rigorously analyzed (and
> optimized!).
>
> Imagine if geom didn't have to speak XML natively, but rather used a
> supported-everywhere-in-base data structure that users could convert
> into XML if they need it. Desktop applications are going to start
> requiring structured data passing via kdbus-like interfaces (currently
> based on GLib's GVariant), so we might as well have a structured
> representation that we like and are able to provide ABI support for
> (and, in the kdbus case, can possibly be converted to/from GVariant as
> required).
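
To make the filterBy step a bit more concrete, here is a rough sketch
of the matching it could do once records arrive as structured data
rather than text. The struct field layout and the name record_matches
are assumptions for illustration, not part of any existing library;
only the POSIX regex calls are real.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat record: an array of key/value string pairs. */
struct field {
	const char *key;
	const char *value;
};

/*
 * Return 1 if the record has a field named `key` whose value matches the
 * POSIX extended regular expression `pattern`, 0 otherwise. An invalid
 * pattern is treated as "no match".
 */
static int
record_matches(const struct field *rec, size_t nfields, const char *key,
    const char *pattern)
{
	regex_t re;
	size_t i;
	int found = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (0);
	for (i = 0; i < nfields; i++) {
		if (strcmp(rec[i].key, key) == 0 &&
		    regexec(&re, rec[i].value, 0, NULL, 0) == 0) {
			found = 1;
			break;
		}
	}
	regfree(&re);
	return (found);
}
```

The point of the sketch is that the filter never parses ifconfig's text
layout; it only looks at named fields.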

----

There is the long option idea of David:

> From: David Chisnall <theraven@theravensnest.org>
> Subject: Re: [Machine readable output from userland utilities] report
> Date: 2 June 2014 16:31:11
> To: Zaro Korchev <zkorchev@mail.bg>
> Cc: soc-status@freebsd.org
>
> On 2 Jun 2014, at 12:43, Zaro Korchev <zkorchev@mail.bg> wrote:
>
>> At the moment both ls and vmstat are told to output JSON by
>> specifying the -O option. However as I discussed with my mentor, this
>> will be changed. The idea is to use an environment variable instead of
>> the -O flag.
>
> I don't like the idea of using an environment variable, because this
> is something that you might want to control on a per-command basis
> within a pipeline.  Especially with respect to incremental adoption, if
> you have some commands that will emit their default format, which is
> sent to sed / awk whatever, and some that will emit json natively, you
> don't want to suddenly have the output format from the legacy tools
> change once they gain machine-readable output support.
>
> One *very* important thing to do is standardise the command-line flag
> that is used to specify the output format.  This may involve also
> converting some of the tools to use getopt_long if they don't already
> (lots of tools already use most single-digit options, so there's no
> possibility to define a single-letter flag that will be useable on all
> tools).
>
>> I understand your concerns about multi-threading. The idea is to have
>> functions that serialize the object in an allocated buffer as it is
>> constructed. Here is a more detailed example of what I mean:
>
> It would be better to have some stream output API as the default.  If
> one back end only supports writing to buffers, then you can add an
> extra alloc / write / free sequence to hide it, but it would be good if
> the interface understands writing directly to file descriptors.  If the
> back end natively supports streaming, then you don't need to buffer the
> output.
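
As an illustration of the stream output idea, here is a minimal sketch
of an emitter that writes straight to a file descriptor. The names
out_stream, stream_write and stream_member are made up for this example
and are not the libsol interface. It also assumes keys and values need
no JSON escaping, which a real back end could not assume.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stream handle: it only wraps a file descriptor. */
struct out_stream {
	int fd;
};

/* Write all `len` bytes, retrying on short writes and EINTR. */
static int
stream_write(struct out_stream *s, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(s->fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		buf += n;
		len -= (size_t)n;
	}
	return (0);
}

/*
 * Emit one JSON member of the form "key": "value" as soon as it is
 * known, without building the whole document in memory first.
 * Assumption: neither string contains characters that need escaping.
 */
static int
stream_member(struct out_stream *s, const char *key, const char *value)
{
	if (stream_write(s, "\"", 1) != 0 ||
	    stream_write(s, key, strlen(key)) != 0 ||
	    stream_write(s, "\": \"", 4) != 0 ||
	    stream_write(s, value, strlen(value)) != 0 ||
	    stream_write(s, "\"", 1) != 0)
		return (-1);
	return (0);
}
```

A back end that can only fill buffers could sit behind the same two
functions by allocating, formatting and then calling stream_write once.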


As you have more experience, I believe you can decide which is best.

I like the pipelining and the long option ideas the most. At the moment
I'm working on porting more tools to use libsol, so this decision is not
urgent. I can easily change how the format is specified.
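
For reference, a minimal sketch of what the long option variant could
look like with getopt_long. The option name --output-format and the
function name requested_format are placeholders; no name has been
agreed on in this thread.

```c
#include <getopt.h>
#include <stddef.h>

/* Placeholder long option; it has no single-letter equivalent. */
static const struct option longopts[] = {
	{ "output-format", required_argument, NULL, 'F' },
	{ NULL, 0, NULL, 0 }
};

/*
 * Return the format name given with --output-format, "text" when the
 * option is absent, or NULL on a usage error.
 */
static const char *
requested_format(int argc, char *argv[])
{
	const char *fmt = "text";
	int ch;

	/* Restart scanning; BSD libc may additionally need optreset = 1. */
	optind = 1;
	while ((ch = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
		switch (ch) {
		case 'F':
			fmt = optarg;
			break;
		default:
			return (NULL);
		}
	}
	return (fmt);
}
```

Because the option string passed to getopt_long is empty here, no
single-letter flag is taken away from the tool.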


Zaro