Date: Wed, 28 Feb 2001 13:48:49 -0800
From: "Jonathan Graehl" <jonathan@graehl.org>
To: "freebsd-Arch" <freebsd-arch@FreeBSD.ORG>
Subject: Unicode, command line options, and configuration files, oh my!
Message-ID: <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org>

How much change would be needed to have a Unicode-capable FreeBSD system? Supposing the variable-length encoding (UTF-8) is used, all existing text output, filenames, and string-based kernel interfaces should already be compliant (although not capable of understanding multibyte-character input/output). Would command line options be passed as byte strings by a Unicode-capable shell?

There doesn't seem to be any impetus to systematically adopt Unicode (especially the fixed two-bytes-per-character variant, which in most cases would simply double the storage/bandwidth requirement), although there are user applications which operate on multibyte text. I am sure that by now admins and programmers in country XYZ are used to working with ASCII and pseudo-English (no matter how inconvenient it might be to generate from their keyboards).

Presently, every program that has a configuration file has its own ad-hoc syntax (and code for twiddling it). People have often suggested using XML documents for configuration files (and some have even gone on to do so). Presumably the DTD would be inline or located according to some standard, so that a generic structured-editing tool could be used for viewing, modification, and syntax checking without having to roll your own utilities.

XML documents are allowed to use a number of character encodings (including variable-length and two-byte Unicode). XML parsers are widely available, although the truly compliant ones are several megabytes of code. The parser can either build you a tree structure of the document, which you traverse, or, more sensibly, can traverse the tree implicitly for you as it parses, using callbacks you supply. You can also get text as ISO-whatever single-byte or multibyte. A subset of the XML standard would definitely suffice to fulfill existing needs, and it would be nice to have every program that has an XML configuration file use a shared library, which would ease any eventual addition of Unicode capability.
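To make the callback style of parsing concrete, here is a minimal sketch using Python's standard xml.parsers.expat binding. The configuration document, its element and attribute names, and the parse_config helper are all invented for illustration; nothing here reflects an existing FreeBSD configuration format.

```python
import xml.parsers.expat

# Hypothetical configuration document; element and attribute names are
# invented for illustration only.
CONFIG_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<server>
  <listen host="127.0.0.1" port="8080"/>
  <pidfile>/var/run/exampled.pid</pidfile>
</server>
"""


def parse_config(data):
    """Collect settings via callbacks while the parser walks the document."""
    settings = {}
    path = []  # stack of currently open element names
    text = []  # character data seen since the last start/end tag

    def start(name, attrs):
        path.append(name)
        text.clear()
        # Attributes become "element.path.attribute" keys.
        for key, value in attrs.items():
            settings[".".join(path) + "." + key] = value

    def chars(chunk):
        text.append(chunk)

    def end(name):
        # Non-blank element text becomes an "element.path" key.
        value = "".join(text).strip()
        if value:
            settings[".".join(path)] = value
        text.clear()
        path.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(data, True)  # True: this is the final chunk of input
    return settings


if __name__ == "__main__":
    for key, value in sorted(parse_config(CONFIG_XML).items()):
        print(key, "=", value)
```

No tree is ever built: the program only sees start, text, and end events, and all values arrive as strings, so any type checking (for example, that port is a number) would still be up to the caller or a validating layer.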
However, there is a one-time cost (which I have not yet paid) in learning how to use XML instead of whatever you're used to (lex/yacc, or ad-hoc code), which is responsible for the reign of the status quo. I'm sure we all agree that there is no need to use XML, or even a subset of it, except to capitalize on existing parsers and editors. What is really needed is an easy-to-specify metadata format which can allow for structured editing of documents, and generation of parsers and syntax descriptions.

Parsing of command line options (and positional parameters) is also largely ad-hoc. Looking through /usr/src, I see that for the most part it consists of a getopt loop with hand-coded cases, a hand-written usage string, and a hand-written man page synopsis. Much like the XML DTD, it would make sense to generically specify (to the extent possible, and with user-defined code to the extent not) the syntax and semantics, and generate variable definitions, parsing/checking code, usage(), man page synopsis ... While it would be possible to have an expressive grammar for command line options, typically the -opts are order-independent, and there are only a few positional parameters (or else you put the mess into a configuration file).

There are a variety of packages out there, which I am seeking opinions on, not having tried any of them:

autoopts (part of autogen, sort of m4+scheme?, everything and the kitchen sink, environment variables/simple-config-files as well as command-line-options) at http://autogen.sourceforge.net/autoopts/

gengetopt (GNU; uses getopt_long, which we don't have in FreeBSD because getopt_long isn't LGPL? Not that I think getopt_long is that brilliant, nor had I even noticed the lack of --recursive in my cp command ;) at http://www.gnu.org/software/gengetopt/gengetopt.html and ports/devel/gengetopt (generated code, including getopt_long, isn't restricted by any license?)

genparse at http://genparse.sourceforge.net/

mkcmd (does manpage/usage/options) at ports/devel/mkcmd

any others?

ifconfig seemed to have one of the more enlightened-looking option parsers (an array of parameter information processed in a loop, rather than a bunch of hard-coded cases) out of several FreeBSD programs I examined ... are there any other good examples?

It's also amusing to see how many different ways various servers in the tree can open a configuration file (path read from command line), write a pid file (path read from command line), daemonize, read an IP address/hostname and port (read from command line) and listen there, mask nonfatal signals, and relinquish privileges, although I appreciate that different servers want to do things slightly differently.

Naturally, each of us is easily able to reuse our own code (preferably by libraries/macros/#include rather than copy/paste), but I think that there is a lot of common configuration/command-line code that could be coalesced behind a good-enough, extensible interface so that we could reuse code on a larger scale.

--
Jonathan Graehl
email: jonathan@graehl.org
web: http://jonathan.graehl.org/
phone: 858-642-7562

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
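
As an illustration of the table-driven option parsing discussed above (an array of parameter information processed in a loop, with the usage string generated from the same data), here is a minimal sketch built on Python's standard getopt module. The option table, the daemon name "exampled", and the helper names optstring, usage, and parse are hypothetical; this is not how ifconfig or any of the listed packages actually work.

```python
import getopt

# Hypothetical option table for an imaginary daemon "exampled":
# (flag, argument name or None, destination key, default, help text).
OPTIONS = [
    ("c", "file", "config", "/etc/exampled.conf", "configuration file"),
    ("p", "port", "port", "8080", "port to listen on"),
    ("d", None, "debug", False, "stay in the foreground"),
]


def optstring(table):
    """Build the getopt-style option string from the table."""
    return "".join(flag + (":" if arg else "") for flag, arg, *_ in table)


def usage(prog, table):
    """Generate the usage synopsis from the same table."""
    parts = ["[-%s%s]" % (flag, " " + arg if arg else "")
             for flag, arg, *_ in table]
    return "usage: %s %s [operand ...]" % (prog, " ".join(parts))


def parse(argv, table):
    """Return (settings, operands); raises getopt.GetoptError on bad input."""
    settings = {dest: default for _, _, dest, default, _ in table}
    by_flag = {"-" + flag: (arg, dest) for flag, arg, dest, _, _ in table}
    opts, operands = getopt.getopt(argv, optstring(table))
    for flag, value in opts:
        arg, dest = by_flag[flag]
        # Options without an argument act as boolean switches.
        settings[dest] = value if arg else True
    return settings, operands


if __name__ == "__main__":
    print(usage("exampled", OPTIONS))
    print(parse(["-d", "-p", "9000", "extra"], OPTIONS))
```

Adding an option means adding one row to the table; the option string, the defaults, and the usage line all follow from it, which is the property the hand-coded getopt loops lack. A man page synopsis could presumably be generated from the same rows.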