Date: Wed, 28 Jan 2009 04:06:48 +0100
From: Dag-Erling Smørgrav <des@des.no>
To: Chuck Robey <chuckr@telenix.org>
Cc: Frank Shute <frank@shute.org.uk>, Murray Stokely <murray@stokely.org>, freebsd-chat@freebsd.org
Subject: Re: text formatting tools.
Message-ID: <86ocxs9liv.fsf@ds4.des.no>
In-Reply-To: <497E216D.1060903@telenix.org> (Chuck Robey's message of "Mon, 26 Jan 2009 15:47:41 -0500")
References: <497B77C7.90001@telenix.org> <2a7894eb0901241353l56be13b4s9860b9e949bc9ec2@mail.gmail.com> <20090124224237.GA96097@melon.esperance-linux.co.uk> <2a7894eb0901241449y49391f6aj6414875e8781ea4@mail.gmail.com> <497CE231.5000202@telenix.org> <2a7894eb0901251821i6e25bfd3i4c235f946d2e581b@mail.gmail.com> <497E216D.1060903@telenix.org>

Chuck Robey <chuckr@telenix.org> writes:
> My comments were all directed towards the fact that xml is (and
> enforces) a hierarchical approach to writing [...]

No, it doesn't.  Nor is it "gigantic", or in any way comparable to Cobol.  There are so many factual errors and misconceptions, and so much plain ignorance, in this thread (mostly from you) that I don't even know where to begin.

XML is nothing more or less than a very lightweight, easy-to-parse hierarchical markup language.  The complete specification is about 50 pages long, and you can skip the last 30 or so and still get a good understanding of the language.  There are a number of related technologies (DOM, XSL, XPath, XLink, XInclude, XML Schemas, RELAX NG etc.), but you don't need to know any of them unless you plan to develop your own XML-based document preparation system - except for XInclude, which is useful for splitting up a document into multiple files; the spec is 30 pages long, but the only thing you need to know is that

  <xi:include href="chapter01.xml"/>

will cause chapter01.xml to be included at that point in the document.

XML is a streamlined successor to SGML.  They look very much alike, and they share a common subset, but there are constructs in each that don't exist in the other; for instance, SGML has several shortcut notations (tag minimization) which are not allowed in XML, and XML has one (empty element) that isn't in SGML.  Removing tag minimization makes XML more verbose, but it increases the chance of detecting errors or corruption early.  It also simplifies canonicalization, a concept which SGML lacks entirely.

SGML, by the way, goes back a long time; the current specification was adopted in 1986.  The first draft of the XML specification was published ten years later, in 1996.

XML is not in itself a document format, but there are a number of document formats based on XML.
The most widely known is XHTML, which is plain old HTML (which is SGML-based) with minor modifications to make it acceptable to an XML parser.  Two other XML-based document formats you may have heard of are ODF and OOXML.  Look them up on Wikipedia if you're interested in the details.

DocBook was created by O'Reilly in 1991 as a markup language for books, primarily technical manuals.  For many years, O'Reilly would only accept manuscripts in DocBook format.  Because of this, DocBook has a number of quirks, including markup for man pages, for command line examples, and for various user interface elements, including keystroke combinations (often rendered as line drawings of key caps).

DocBook was originally an SGML application; an XML version was introduced in 2001, and as of DocBook 5.0, the SGML version has been abandoned.  However, most of the FreeBSD documentation (FAQ, handbook and various articles) is still in DocBook SGML; converting the source into DocBook XML should not be too hard, but converting the toolchain, templates and stylesheets is another matter entirely.

XML is hierarchical, but that does not mean DocBook is, or has to be.  You can write highly structured documents in DocBook, but you can also write a completely flat document, using only <para> and so-called bridge heads (free-standing headings similar to HTML's <h1>, <h2> etc.) within a top-level element (<article>, <book> or whatever).  However, if you ever write anything longer than about ten pages, you'll find that a hierarchically structured document is much easier to maintain than a flat one, for a number of reasons, including automatic section numbering and cross-referencing, and the ability to easily move sections around.

DocBook also supports many features that *roff doesn't, such as indexes, glossaries and bibliographies.  Also, while *roff may require less typing, you have to admit that DocBook is a lot easier to *read*.
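
To make the flat versus hierarchical point concrete, here is a rough Python sketch (not part of any DocBook toolchain) that parses the same small article written both ways and derives section numbers from the nested form.  The two sample documents, the `depth` and `number_sections` helpers, and the omission of a DTD or the DocBook 5 namespace are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Flat: only <para> and <bridgehead> directly under the top-level element.
# (Illustrative sample; DTD / DocBook 5 namespace omitted for brevity.)
FLAT = """<article>
  <title>Example</title>
  <bridgehead renderas="sect1">Introduction</bridgehead>
  <para>Some text.</para>
  <bridgehead renderas="sect1">Details</bridgehead>
  <para>More text.</para>
</article>"""

# Hierarchical: the same content using nested <section> elements.
NESTED = """<article>
  <title>Example</title>
  <section>
    <title>Introduction</title>
    <para>Some text.</para>
  </section>
  <section>
    <title>Details</title>
    <para>More text.</para>
    <section><title>Fine print</title><para>Even more text.</para></section>
  </section>
</article>"""


def depth(elem):
    """Return the nesting depth of an element tree (a leaf element is 1)."""
    return 1 + max((depth(child) for child in elem), default=0)


def number_sections(elem, prefix=()):
    """Yield (number, title) pairs for nested <section> elements."""
    index = 0
    for child in elem:
        if child.tag == "section":
            index += 1
            number = prefix + (index,)
            yield ".".join(map(str, number)), child.findtext("title")
            # Recurse so subsections get numbers such as "2.1".
            yield from number_sections(child, number)


flat = ET.fromstring(FLAT)
nested = ET.fromstring(NESTED)

print(depth(flat), depth(nested))
for number, title in number_sections(nested):
    print(number, title)
```

Both documents are valid XML, but only the nested one carries enough structure for a tool to number sections or move one around as a unit; in the flat one, the headings are just more siblings of the paragraphs.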
The main drawback to DocBook is that most of the available tools aren't very good.  There are a number of free and commercial editors of varying quality (some very basic, some quite good).  There are also a number of free and commercial processing tools, but the free ones (including the official DocBook-XSL stylesheets) suck.  There is at least one very good commercial toolchain (Prince XML) with a free-as-in-beer "personal edition".  There also seems to be a new free (BSD-licensed, actually) toolchain called xmlroff, but I haven't tried it, so I don't know how good it is.

I have my own DocBook-to-XHTML stylesheets, but they only cover the subset of DocBook that I use myself, and as they make heavy use of CSS, they are at the mercy of browser bugs.  They are, however, very fast.  Note that there are parts of DocBook (for instance, the CALS table model) that can't be implemented correctly using only stylesheets.

Finally, a word about LaTeX.  Yes, it takes a lot of space, and it has its (numerous) quirks.  Ironically (in the context of this discussion), it is often criticized for *not* being hierarchical.  However, no other document preparation system can beat the quality of its output - not even close.

DES
-- 
Dag-Erling Smørgrav - des@des.no