Date: Tue, 3 Jun 2014 19:04:33 +0300
From: Zaro Korchev <zkorchev@mail.bg>
To: Eitan Adler <lists@eitanadler.com>, Alfred Perlstein <bright@mu.org>, David Chisnall <theraven@theravensnest.org>, Jonathan Anderson <jonathan.robert.anderson@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: [GSoC] Machine readable output from userland utilities
Message-ID: <4C29B30D-6833-4F0E-B071-C8EA215C0A17@mail.bg>
References: <0FCB749A-67F7-4C2F-AAC1-32D0BD67B502@theravensnest.org>

Hi everybody,

I see there are several different ideas about how the output format should be specified.

I first started using an option named -O, with the idea that this can be changed once the best variant is decided.

There is the environment variable idea that we discussed with Eitan:

On 29 May 2014 at 18:31, Eitan Adler wrote :
> On 29 May 2014 05:12, Zaro Korchev <zkorchev@mail.bg> wrote:
>> I thought about whether it is better to use an option or an environment variable. I did it with an option because it is easier to switch an option on/off. It appears that the flag -O is free in almost all tools. I have no problem making the tools use an environment variable.
>
> My concern is that future standards may require this option (or at
> least, would be precluded from using it). In addition, it may
> conflict with non-base utilities, such as coreutils ones.

----

There is the pipelining idea of Jonathan:

On 23 May 2014 at 16:27, Jonathan Anderson wrote :
> Imagine:
>
> $ ifconfig | filterBy "ether" " 3c:07:.*" | sortBy "ether" | output my_ifconfig.format # or "json" or "xml" or ...
>
> A pipeline of little tools, each doing one thing well: how much more unix can you get? Currently, every command-line tool has to do two or three things:
> 1. its primary job,
> 2. output some arbitrary text format (that you're never allowed to change because other tools scrape it) and
> 3. (optionally) parse arbitrary text formats generated by users or some other tool.
>
> Task 2 is annoying: in order to usefully query command-line tools, I have to write a parser. The tool has binary data, I want binary data, but we have to go through a dump/parse dance in order for me to get the data. This is the approach (again, from Plan 9) that brings you Linux sysfs. Perhaps David would now like to comment on his cross-platform "how much battery do I have" experience. :)
>
> Task 3 isn't just annoying, however, it's risky.
> If every tool implements its own string protocol parsing, we greatly increase the risk of unnoticed bugs. Better to centralize as much string parsing as possible into a single library, which can be rigorously analyzed (and optimized!).
>
> Imagine if geom didn't have to speak XML natively, but rather used a supported-everywhere-in-base data structure that users could convert into XML if they need it. Desktop applications are going to start requiring structured data passing via kdbus-like interfaces (currently based on GLib's GVariant), so we might as well have a structured representation that we like and are able to provide ABI support for (and, in the kdbus case, can possibly be converted to/from GVariant as required).

----

There is the long option idea of David:

> From : David Chisnall <theraven@theravensnest.org>
> Subject : Rép : [Machine readable output from userland utilities] report
> Date : 2 June 2014 16:31:11
> To : Zaro Korchev <zkorchev@mail.bg>
> Cc : soc-status@freebsd.org
>
> On 2 Jun 2014, at 12:43, Zaro Korchev <zkorchev@mail.bg> wrote:
>
>> At the moment both ls and vmstat are told to output JSON by specifying the -O option. However as I discussed with my mentor, this will be changed. The idea is to use an environment variable instead of the -O flag.
>
> I don't like the idea of using an environment variable, because this is something that you might want to control on a per-command basis within a pipeline. Especially with respect to incremental adoption, if you have some commands that will emit their default format, which is sent to sed / awk whatever, and some that will emit json natively, you don't want to suddenly have the output format from the legacy tools change once they gain machine-readable output support.
>
> One *very* important thing to do is standardise the command-line flag that is used to specify the output format.
> This may also involve converting some of the tools to use getopt_long if they don't already (lots of tools already use most single-digit options, so there's no possibility to define a single-letter flag that will be useable on all tools).
>
>> I understand your concerns about multi-threading. The idea is to have functions that serialize the object in an allocated buffer as it is constructed. Here is a more detailed example of what I mean:
>
> It would be better to have some stream output API as the default. If one back end only supports writing to buffers, then you can add an extra alloc / write / free sequence to hide it, but it would be good if the interface understands writing directly to file descriptors. If the back end natively supports streaming, then you don't need to buffer the output.

As you have more experience, I believe you can decide which is the best.

I like the pipelining and the long option ideas the most. At the moment I'm working on porting more tools to use libsol, so this decision is not urgent. I can easily change how the format is specified.

Zaro
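
To make the long option idea more concrete, here is a rough sketch of how a tool might accept a common output-format flag through getopt_long without consuming a single-letter option. Everything named here is an assumption for illustration only: the option name --format, the format names "text", "json" and "xml", and the helpers format_from_name() and parse_output_format() are hypothetical and are not part of libsol or of any existing tool.

```c
#include <getopt.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical set of output formats; not taken from libsol. */
enum out_format { FMT_TEXT, FMT_JSON, FMT_XML, FMT_INVALID };

/* Value returned by getopt_long for --format; chosen above the char
 * range so that it cannot collide with any single-letter option. */
enum { OPT_FORMAT = 256 };

/* Map a format name to its enum value, FMT_INVALID if unknown. */
static enum out_format
format_from_name(const char *name)
{
    if (strcmp(name, "text") == 0)
        return FMT_TEXT;
    if (strcmp(name, "json") == 0)
        return FMT_JSON;
    if (strcmp(name, "xml") == 0)
        return FMT_XML;
    return FMT_INVALID;
}

/*
 * Scan argv for the hypothetical long option --format=NAME.
 * Returns FMT_TEXT when the option is absent, so tools keep their
 * legacy output unless the caller asks for something else.
 */
static enum out_format
parse_output_format(int argc, char *argv[])
{
    static const struct option longopts[] = {
        { "format", required_argument, NULL, OPT_FORMAT },
        { NULL, 0, NULL, 0 }
    };
    enum out_format fmt = FMT_TEXT;
    int ch;

    /* Empty short-option string: only long options are accepted here. */
    while ((ch = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        if (ch == OPT_FORMAT)
            fmt = format_from_name(optarg);
        else
            return FMT_INVALID;    /* unknown option */
    }
    return fmt;
}
```

A tool would call parse_output_format(argc, argv) from main() and choose its serializer from the result. Because the choice is made per invocation, one command in a pipeline can emit JSON while its neighbours keep their default text output, which is the per-command control David asks for.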