Date:      Wed, 28 Jan 2009 04:06:48 +0100
From:      Dag-Erling Smørgrav <des@des.no>
To:        Chuck Robey <chuckr@telenix.org>
Cc:        Frank Shute <frank@shute.org.uk>, Murray Stokely <murray@stokely.org>, freebsd-chat@freebsd.org
Subject:   Re: text formatting tools.
Message-ID:  <86ocxs9liv.fsf@ds4.des.no>
In-Reply-To: <497E216D.1060903@telenix.org> (Chuck Robey's message of "Mon, 26 Jan 2009 15:47:41 -0500")
References:  <497B77C7.90001@telenix.org> <2a7894eb0901241353l56be13b4s9860b9e949bc9ec2@mail.gmail.com> <20090124224237.GA96097@melon.esperance-linux.co.uk> <2a7894eb0901241449y49391f6aj6414875e8781ea4@mail.gmail.com> <497CE231.5000202@telenix.org> <2a7894eb0901251821i6e25bfd3i4c235f946d2e581b@mail.gmail.com> <497E216D.1060903@telenix.org>

Chuck Robey <chuckr@telenix.org> writes:
> My comments were all directed towards the fact that xml is (and
> enforces) a hierarchical approach to writing [...]

No, it doesn't.  Nor is it "gigantic", or in any way comparable to
Cobol.

There are so many factual errors and misconceptions, and so much plain
ignorance, in this thread (mostly from you) that I don't even know where
to begin.

XML is nothing more or less than a very lightweight, easy-to-parse
hierarchical markup language.  The complete specification is about 50
pages long, and you can skip the last 30 or so and still get a good
understanding of the language.

There are a number of related technologies (DOM, XSL, XPath, XLink,
XInclude, XML Schema, RELAX NG etc.), but you don't need to know any of
them unless you plan to develop your own XML-based document preparation
system - except for XInclude, which is useful for splitting up a
document into multiple files; the spec is 30 pages long, but the only
thing you need to know is that <xi:include href="chapter01.xml"/> will
cause chapter01.xml to be included at that point in the document.
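
As a rough illustration of the XInclude behaviour described above, here
is a minimal sketch using Python's standard xml.etree.ElementInclude
module.  The names FILES, BOOK, memory_loader and expand are made up for
the example, and the "files" live in a dictionary so that the sketch
runs without touching the disk; a real toolchain would read them from
the filesystem.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; a real document would keep these on disk.
FILES = {
    "chapter01.xml": (
        "<chapter><title>Introduction</title>"
        "<para>Hello.</para></chapter>"
    ),
}

# The xi prefix must be bound to the XInclude namespace.
BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Example</title>
  <xi:include href="chapter01.xml"/>
</book>"""


def memory_loader(href, parse, encoding=None):
    """Resolve an include from FILES instead of the filesystem."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def expand(source):
    """Parse source and replace each xi:include with the included tree."""
    root = ET.fromstring(source)
    ElementInclude.include(root, loader=memory_loader)
    return root


if __name__ == "__main__":
    print(ET.tostring(expand(BOOK), encoding="unicode"))
```

After expansion, the <chapter> element sits where the <xi:include>
element used to be, as a direct child of <book>.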

XML is a streamlined successor to SGML.  They look very much alike, and
they share a common subset, but there are constructs in each that don't
exist in the other; for instance, SGML has several shortcut notations
(tag minimization) which are not allowed in XML, and XML has one
(empty element) that isn't in SGML.  Removing tag minimization makes XML
more verbose, but it increases the chance of detecting errors or
corruption early.  It also simplifies canonicalization, a concept which
SGML lacks entirely.
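
To make the two points above concrete, here is a small sketch with
Python's standard xml.etree.ElementTree (canonicalize needs Python 3.8
or later).  The sample strings and the helper names is_well_formed and
canonical are assumptions made for the example.  The first half shows an
XML parser rejecting an SGML-style omitted end tag; the second shows two
differently spelled but equivalent documents reducing to the same
canonical form.

```python
import xml.etree.ElementTree as ET

# SGML-style minimization: the </item> end tags are omitted.
MINIMIZED = "<list><item>one<item>two</list>"
# The same content with every end tag written out, as XML requires.
EXPLICIT = "<list><item>one</item><item>two</item></list>"


def is_well_formed(text):
    """Return True if an XML parser accepts text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def canonical(text):
    """Return the canonical (C14N 2.0) serialization of text."""
    return ET.canonicalize(xml_data=text)


if __name__ == "__main__":
    print(is_well_formed(MINIMIZED))  # the omitted end tags are an error
    print(is_well_formed(EXPLICIT))
    # Attribute order, quote style and empty-element syntax differ here,
    # but both spellings should canonicalize to the same string.
    print(canonical('<a c="2" b="1"/>') == canonical("<a b='1' c='2'></a>"))
```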

SGML, by the way, goes back a long time; the current specification was
adopted in 1986.  The first draft of the XML specification was published
ten years later, in 1996.

XML is not in itself a document format, but there are a number of
document formats based on XML.  The most widely known is XHTML, which is
plain old HTML (which is SGML-based) with minor modifications to make it
acceptable to an XML parser.

Two other XML-based document formats you may have heard of are ODF and
OOXML.  Look them up on Wikipedia if you're interested in the details.

DocBook was created by HaL Computer Systems and O'Reilly in 1991 as a
markup language for books, primarily technical manuals.  For many years,
O'Reilly would only accept manuscripts in DocBook format.  Because of
this, DocBook has a number of quirks, including markup for man pages,
for command line examples, and for various user interface elements,
including keystroke combinations (often rendered as line drawings of key
caps).

DocBook was originally an SGML application; an XML version was
introduced in 2001, and as of DocBook 5.0, the SGML version has been
abandoned.  However, most of the FreeBSD documentation (FAQ, handbook
and various articles) is still in DocBook SGML; converting the source
into DocBook XML should not be too hard, but converting the toolchain,
templates and stylesheets is another matter entirely.

XML is hierarchical, but that does not mean DocBook is, or has to be.
You can write highly structured documents in DocBook, but you can also
write a completely flat document, using only <para> and so-called
bridgeheads (<bridgehead>: free-standing headings similar to HTML's
<h1>, <h2> etc.) within a top-level element (<article>, <book> or
whatever).
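
As a sketch of the difference, the two hypothetical articles below carry
the same text, once flat (with <bridgehead> headings) and once nested
(with <section> elements).  The sample content and the depth helper are
made up for the example, and the code only checks well-formedness and
nesting depth; it does not validate against the DocBook schema.

```python
import xml.etree.ElementTree as ET

# Flat: headings and paragraphs are all siblings under <article>.
FLAT = """<article>
  <title>Notes</title>
  <bridgehead>Setup</bridgehead>
  <para>Install the tools.</para>
  <bridgehead>Usage</bridgehead>
  <para>Run them.</para>
</article>"""

# Nested: each heading owns the paragraphs that belong to it.
NESTED = """<article>
  <title>Notes</title>
  <section><title>Setup</title><para>Install the tools.</para></section>
  <section><title>Usage</title><para>Run them.</para></section>
</article>"""


def depth(elem):
    """Return the number of element levels at and below elem."""
    return 1 + max((depth(child) for child in elem), default=0)


if __name__ == "__main__":
    print(depth(ET.fromstring(FLAT)))
    print(depth(ET.fromstring(NESTED)))
```

Both documents parse; the flat one is simply one level shallower.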

However, if you ever write anything longer than about ten pages, you'll
find that a hierarchically structured document is much easier to
maintain than a flat one, for a number of reasons, including automatic
section numbering and cross-referencing, and the ability to easily move
sections around.
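
To illustrate why the hierarchy helps, here is a minimal sketch of
automatic section numbering, again using Python's standard
xml.etree.ElementTree.  The sample document and the numbered_titles
helper are assumptions for the example, not part of any DocBook
toolchain.  The numbers are derived from the nesting alone, so moving a
section renumbers it and everything below it without editing a single
heading.

```python
import xml.etree.ElementTree as ET

DOC = """<article>
  <title>Guide</title>
  <section><title>Install</title>
    <section><title>From ports</title></section>
    <section><title>From source</title></section>
  </section>
  <section><title>Configure</title></section>
</article>"""


def numbered_titles(elem, prefix=""):
    """Return (number, title) pairs for every nested <section> in order."""
    result = []
    for n, section in enumerate(elem.findall("section"), start=1):
        number = f"{prefix}{n}"
        result.append((number, section.findtext("title")))
        # Subsections inherit the parent's number as their prefix.
        result.extend(numbered_titles(section, number + "."))
    return result


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for number, title in numbered_titles(root):
        print(number, title)

    # Move the first top-level section to the end and renumber.
    first = root.findall("section")[0]
    root.remove(first)
    root.append(first)
    for number, title in numbered_titles(root):
        print(number, title)
```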

DocBook also supports many features that *roff doesn't, such as indexes,
glossaries and bibliographies.  Also, while *roff may require less
typing, you have to admit that DocBook is a lot easier to *read*.

The main drawback to DocBook is that most of the available tools aren't
very good.  There are a number of free and commercial editors of varying
quality (some very basic, some quite good).  There are also a number of
free and commercial processing tools, but the free ones (including the
official DocBook-XSL stylesheets) suck.  There is at least one very good
commercial toolchain (Prince XML) with a free-as-in-beer "personal
edition".  There also seems to be a new free (BSD-licensed, actually)
toolchain called xmlroff, but I haven't tried it, so I don't know how
good it is.

I have my own DocBook-to-XHTML stylesheets, but they only cover the
subset of DocBook that I use myself, and as they make heavy use of CSS,
they are at the mercy of browser bugs.  They are, however, very fast.

Note that there are parts of DocBook (for instance, the CALS table
model) that can't be implemented correctly using only stylesheets.

Finally, a word about LaTeX.  Yes, it takes a lot of space, and it has
its (numerous) quirks.  Ironically (in the context of this discussion),
it is often criticized for *not* being hierarchical.  However, no other
document preparation system can beat the quality of its output - not
even close.

DES
-- 
Dag-Erling Smørgrav - des@des.no


