Date:      Wed, 28 Feb 2001 13:48:49 -0800
From:      "Jonathan Graehl" <jonathan@graehl.org>
To:        "freebsd-Arch" <freebsd-arch@FreeBSD.ORG>
Subject:   Unicode, command line options, and configuration files, oh my!
Message-ID:  <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org>

How much change would be needed to have a Unicode-capable FreeBSD system?
Supposing a variable-length encoding such as UTF-8 is used, all existing text
output, filenames, and string-based kernel interfaces should already be
compliant (although not capable of interpreting multibyte characters in
input/output).  Would command line options then be passed as byte strings by a
Unicode-capable shell?
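
To make the byte-string point concrete, here is a minimal sketch, assuming the
variable-length encoding in question is UTF-8.  An argument stays an ordinary
NUL-terminated string, so strlen() still works but counts bytes; counting
characters means skipping continuation bytes.  The function name
utf8_codepoints is hypothetical, and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx).
 * Assumption: the input is well-formed UTF-8; no validation is done.
 */
size_t
utf8_codepoints(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return (n);
}
```

For a pure ASCII argument the two counts agree, which is why existing
byte-oriented interfaces keep working unchanged.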

There doesn't seem to be any impetus to systematically adopt Unicode (especially
the fixed-two-bytes-per-char variant, which in most cases would simply double
the storage/bandwidth requirement), although there are user applications which
operate on multibyte text.  I am sure that by now admins and programmers in
country XYZ are used to working with ASCII and pseudo-English (no matter how
inconvenient it might be to generate from their keyboards).

Presently, every program that has a configuration file has its own ad-hoc syntax
(and code for twiddling it).  People have often suggested using XML documents
for configuration files (and some have even gone on to do so).  Presumably the
DTD would be inline or located according to some standard, so that a generic
structured-editing tool could be used for viewing, modification, and syntax
checking without having to roll your own utilities.

XML documents are allowed to use a number of character encodings (including
variable-length and two-byte Unicode).  XML parsers are widely available,
although the truly compliant ones are several megabytes of code.  The parser can
either build you a tree structure of the document, which you traverse, or, more
sensibly, can traverse the tree implicitly for you as it parses, using callbacks
you supply.  You can also get text as ISO-whatever single-byte or multibyte.  A
subset of the XML standard would definitely suffice to fulfill existing needs,
and it would be nice to have every program that has an XML configuration file
using a shared library, which would ease any eventual addition of Unicode
capability.

However, there is a one-time cost (which I have not yet paid) in learning how to
use XML instead of whatever you're used to (lex/yacc, or ad-hoc code), and that
cost is responsible for the reign of the status quo.  I'm sure we all agree that
there is no need to use XML, or even a subset of it, except to capitalize on
existing parsers and editors.  What is really needed is an easy-to-specify
metadata format which can allow for structured editing of documents, and
generation of parsers and syntax descriptions.
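
The difference between building a tree and being called back during the parse
can be shown without XML at all.  The following is only a sketch of the callback
style, using a made-up "key = value" line format rather than XML; cfg_parse,
cfg_entry_cb, and remember_port are hypothetical names, not part of any existing
library.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: called once per "key = value" entry. */
typedef void (*cfg_entry_cb)(const char *key, const char *value, void *ctx);

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *
trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return (s);
}

/*
 * Parse `text` (modified in place) line by line, invoking `cb` for each
 * entry as soon as it is recognized, without building a tree first.
 * Blank lines and lines starting with '#' are skipped.
 * Returns the number of entries reported, or -1 on a malformed line.
 */
int
cfg_parse(char *text, cfg_entry_cb cb, void *ctx)
{
	int count = 0;
	char *line, *next, *eq;

	for (line = text; line != NULL; line = next) {
		next = strchr(line, '\n');
		if (next != NULL)
			*next++ = '\0';
		line = trim(line);
		if (*line == '\0' || *line == '#')
			continue;
		eq = strchr(line, '=');
		if (eq == NULL)
			return (-1);
		*eq = '\0';
		cb(trim(line), trim(eq + 1), ctx);
		count++;
	}
	return (count);
}

/* Example callback: remember the value of the "port" key, if present. */
void
remember_port(const char *key, const char *value, void *ctx)
{
	if (strcmp(key, "port") == 0)
		*(int *)ctx = atoi(value);
}
```

The program only supplies the callback; the traversal itself lives in the
shared parsing code, which is the part a common library could own.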

Parsing of command line options (and positional parameters) is also largely
ad-hoc.  Looking through /usr/src, I see that for the most part it consists of
a getopt loop with hand-coded cases, a hand-written usage string, and a
hand-written man page synopsis.  Much as with the XML DTD, it would make sense
to specify the syntax and semantics generically (to the extent possible, with
user-defined code for the rest), and from that generate variable definitions,
parsing/checking code, usage(), the man page synopsis ...  While it would be
possible to have an expressive grammar for command line options, typically
the -opts are order-independent, and there are only a few positional parameters
(or else you put the mess into a configuration file).  There are a variety of
packages out there, which I am seeking opinions on, not having tried any of
them:

autoopts (part of autogen, sort of m4+scheme?, everything and the kitchen sink,
environment variables/simple-config-files as well as command-line-options) at
http://autogen.sourceforge.net/autoopts/

gengetopt (GNU, uses getopt_long) at
http://www.gnu.org/software/gengetopt/gengetopt.html and ports/devel/gengetopt
We don't have getopt_long in FreeBSD; is that because getopt_long isn't LGPL?
(Not that I think getopt_long is that brilliant, nor had I even noticed the
lack of --recursive in my cp command. ;-)  Is the generated code, including
getopt_long, unrestricted by any license?

genparse
at http://genparse.sourceforge.net/

mkcmd (does manpage/usage/options)
at ports/devel/mkcmd

any others?

ifconfig seemed to have one of the more enlightened-looking option parsers (an
array of parameter information processed in a loop, rather than a bunch of
hard-coded cases) out of several FreeBSD programs I examined ... are there any
other good examples?
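
For readers who have not looked at that style, the following sketch shows the
general shape of an array of parameter information processed in a loop.  It is
not ifconfig's actual code; the keywords, struct settings, and kw_apply are all
invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical program state filled in by the handlers. */
struct settings {
	int	up;
	int	mtu;
};

typedef void (*kw_handler)(struct settings *, const char *arg, int param);

/* One table row per keyword; takes_arg says whether a value follows. */
struct keyword {
	const char	*name;
	int		 param;		/* constant passed to the handler */
	int		 takes_arg;
	kw_handler	 func;
};

static void
set_up(struct settings *s, const char *arg, int param)
{
	(void)arg;
	s->up = param;
}

static void
set_mtu(struct settings *s, const char *arg, int param)
{
	(void)param;
	s->mtu = atoi(arg);
}

static const struct keyword keywords[] = {
	{ "up",   1, 0, set_up },
	{ "down", 0, 0, set_up },
	{ "mtu",  0, 1, set_mtu },
};

/* Walk argv, dispatching each word through the table; -1 on error. */
int
kw_apply(struct settings *s, int argc, char *const argv[])
{
	size_t k, nkw = sizeof(keywords) / sizeof(keywords[0]);
	int i;

	for (i = 0; i < argc; i++) {
		for (k = 0; k < nkw; k++)
			if (strcmp(argv[i], keywords[k].name) == 0)
				break;
		if (k == nkw)
			return (-1);		/* unknown keyword */
		if (keywords[k].takes_arg && ++i >= argc)
			return (-1);		/* missing value */
		keywords[k].func(s, keywords[k].takes_arg ? argv[i] : NULL,
		    keywords[k].param);
	}
	return (0);
}
```

Adding a keyword means adding a row and perhaps a handler, rather than another
hard-coded case in the parsing loop.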

It's also amusing to see how many different ways various servers in the tree can
open a configuration file (path read from command line), write a pid file (path
read from command line), daemonize, read an IP address/hostname and port (read
from command line) and listen there, mask nonfatal signals, and relinquish
privileges, although I appreciate that different servers want to do things
slightly differently.  Naturally, each of us is easily able to reuse our own
code (preferably by libraries/macros/#include rather than copy/paste), but I
think that there is a lot of common configuration/command-line code that could
be coalesced behind a good-enough, extensible interface so that we could reuse
code on a larger scale.
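
As a small example of the kind of code that might be shared, here is a sketch of
two of the steps listed above: splitting a "host:port" argument and writing a
pid file.  The function names and the "host:port" syntax are assumptions made
for illustration; no existing server in the tree is being quoted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Split "host:port" at the last colon.  Copies the host part into `host`
 * and stores the port in `*port`.  Returns 0 on success, -1 on error.
 */
int
parse_hostport(const char *spec, char *host, size_t hostlen, int *port)
{
	const char *colon = strrchr(spec, ':');
	char *end;
	long p;

	if (colon == NULL || colon == spec)
		return (-1);		/* no colon, or empty host */
	if ((size_t)(colon - spec) >= hostlen)
		return (-1);		/* host does not fit in buffer */
	p = strtol(colon + 1, &end, 10);
	if (end == colon + 1 || *end != '\0' || p < 1 || p > 65535)
		return (-1);		/* not a valid port number */
	memcpy(host, spec, (size_t)(colon - spec));
	host[colon - spec] = '\0';
	*port = (int)p;
	return (0);
}

/* Write the current process ID to `path`; returns 0 on success. */
int
write_pidfile(const char *path)
{
	FILE *fp = fopen(path, "w");

	if (fp == NULL)
		return (-1);
	fprintf(fp, "%ld\n", (long)getpid());
	return (fclose(fp) == 0 ? 0 : -1);
}
```

A server that wants slightly different behaviour could still call these and
layer its own checks on top, which is the sort of extensibility meant above.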

--
Jonathan Graehl
  email: jonathan@graehl.org
  web: http://jonathan.graehl.org/
  phone: 858-642-7562

