Date:      Fri, 17 Jan 1997 12:02:10 -0500 (EST)
From:      John Fieber <jfieber@indiana.edu>
To:        Charles Owens <owensc@enc.edu>
Cc:        doc@freebsd.org
Subject:   Re: Newbie looking for flexible doc system
Message-ID:  <Pine.BSI.3.95.970117083851.22331F-100000@fallout.campusview.indiana.edu>
In-Reply-To: <Pine.FBS.3.93.970116150847.26364D-100000@dingo.its.enc.edu>

On Thu, 16 Jan 1997, Charles Owens wrote:

> I'm trying to sort out the options available to me in the SGML world and
> was hoping someone here could shed some light.
> 
> What I need is a system that will allow, without too much pain, the user
> to produce output in multiple formats (print and HTML, mostly) from a
> single set of source files.
> 
> I'm _almost_ happy with the LyX -> linuxdoc approach except that from what
> I can tell the linuxdoc DTD doesn't support the inclusion of images, which
> is a must for my application.  I've begun to investigate the Docbook DTD,
> but the whole SGML thing is so huge... a bit overwhelming.
> 
> I'd appreciate any suggestions and pointers that might point in a valid
> direction.

SGML does take a lot of chewing before it is digestible.  It is often
misunderstood.  The best non-technical survey of what
SGML is all about is Liora Alschuler's "ABCD...SGML"  (ISBN
1-850-32197-3).  It contains the history and background and many
case studies of SGML applications.  For someone raised with a
computer science world view, many things in SGML can be puzzling
until you understand the motivations behind the design. 
Unfortunately, few SGML books describe the context of SGML very
well.  A more technical book than "ABCD...SGML" that does address
context is "Developing SGML DTDs" by Eve Maler and Jeanne El
Andaloussi.  Eve Maler is the architect of the current Docbook DTD. 

More concretely for your situation, if you conclude SGML is valuable,
Docbook is a much better route than Linuxdoc in the long run,
assuming you are dealing with computer related documents, or at least
technical documents.  If, on the other hand, having SGML in your
document chain is not that important, you may be better off with
LaTeX.  There is at least one pretty good LaTeX to HTML converter.

However, as a testament to the utility of using SGML, it only took me
an afternoon to hack together a decent Linuxdoc to Docbook
conversion.  A LaTeX to Docbook conversion would have been
considerably more difficult since I would have had to write a
parser.  A great strength
of SGML is that you only have to write a parser once, and all
applications can use it.
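
As a rough illustration of that point (not the actual conversion
mentioned above), here is a minimal Python sketch that consumes the
line-oriented output an sgmls-style parser emits: "(" starts an
element, ")" ends one, "-" carries character data, with "\n" and "\\"
as escapes.  The TAG_MAP table and the esis_to_sgml name are
hypothetical, and attribute ("A") and other records are simply
skipped, so treat it as a sketch under those assumptions.

```python
import re

# Hypothetical element mapping, for illustration only; a real
# Linuxdoc to Docbook conversion needs far more than renaming.
TAG_MAP = {"SECT": "SECT1", "HEADING": "TITLE", "P": "PARA"}


def unescape(data):
    """Decode the two most common escapes in parser data records."""
    return re.sub(r"\\(n|\\)",
                  lambda m: "\n" if m.group(1) == "n" else "\\",
                  data)


def esis_to_sgml(esis_lines, tag_map):
    """Re-emit parser output as SGML text with renamed elements."""
    out = []
    for line in esis_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "(":        # start of element
            out.append("<%s>" % tag_map.get(rest, rest))
        elif kind == ")":      # end of element
            out.append("</%s>" % tag_map.get(rest, rest))
        elif kind == "-":      # character data
            text = unescape(rest)
            text = text.replace("&", "&amp;").replace("<", "&lt;")
            out.append(text)
        # Every other record type is ignored in this sketch.
    return "".join(out)
```

The conversion never looks at angle brackets in the source document;
the parser has already done that work, which is why the sketch stays
so short.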

There is definitely a "Some Assembly Required"  qualification to
suggesting the use of Docbook, and the pickings for freely
available tools are pretty thin at the moment.  There are at least
two excellent SGML parsers available. One, sgmls, is used in
FreeBSD.  It is fairly compact and quick, but is no longer being
developed by the author.  The second, SP, takes the form of a C++
class library and comes with a couple of command line applications, one
of which duplicates the functionality of sgmls. SP has very
comprehensive support of the SGML standard.  If I recall correctly,
the only SGML feature it doesn't support is CONCUR, which has dubious
utility anyway.  It also supports 16 bit characters with EUC, JIS and
UTF-8 encoding for input and output. SP is rather huge though.  The
shared library tips the scales at 1.5 megabytes!

Of course, parsing is just the beginning.  You have to *do* something
with the parsed document and this is where the tools thin out
rapidly.  The up and coming tool for formatting is Jade, which uses
the SP parser.  Jade implements a bunch of the DSSSL standard, which
provides a powerful Scheme-derived (as in the programming language)
interface for manipulating the document to generate what is called a
"flow object tree", or in plain English, a series of objects
expressed in terms that page layout software can understand--boxes
and lines, and containers of text to be typeset.  A backend processor
turns the flow object tree into a specific layout language.  The best
backend so far generates RTF.  The HTML output is not particularly
useful (yet).  There is skeletal support for TeX--the backend just
outputs macro calls representing objects in the flow object tree. 
Some TeX wizard needs to actually write the macros.
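
To picture the split between the flow object tree and a backend, here
is a small Python sketch.  It is not Jade's data model or its real TeX
output: the FlowObject class, the "sequence" and "paragraph" kinds as
used here, and the \Begin/\End macro names are all made up for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowObject:
    """One node of a toy flow object tree (hypothetical structure)."""
    kind: str                 # e.g. "sequence" or "paragraph"
    text: str = ""            # text to be typeset, if any
    children: list = field(default_factory=list)


def to_macro_calls(node, depth=0):
    """Toy backend: walk the tree and emit one macro call per object.

    The macro names are invented; a real backend would emit whatever
    its macro package defines.
    """
    pad = "  " * depth
    lines = [pad + "\\Begin{" + node.kind + "}"]
    if node.text:
        lines.append(pad + "  " + node.text)
    for child in node.children:
        lines.extend(to_macro_calls(child, depth + 1))
    lines.append(pad + "\\End{" + node.kind + "}")
    return lines


tree = FlowObject("sequence",
                  children=[FlowObject("paragraph", text="Hello")])
print("\n".join(to_macro_calls(tree)))
```

The backend only names the objects; what each macro does on the page
is left to whoever writes the macros, which is the missing piece
described above.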

The tool I'm using for FreeBSD is called instant (for manipulating
SGML document INSTANces).  Instant was sort of developed by the OSF. 
I say "sort of" because there are a lot of limitations and bugs.  It
works well for a relatively simple DTD such as Linuxdoc, but as the DTD
complexity increases it becomes a headache of non-trivial proportions. 
My Docbook to HTML conversion is hitting the limitations pretty hard
at this point.  Fortunately, I got a pretty usable subset of the DTD
handled before the headache started getting bad!

Even in the presence of DSSSL, an instant-like tool is very useful. 
DSSSL is, for example, not appropriate for converting between two
DTDs, or doing other arbitrary document manipulations.  As such, I
have been pondering a re-write of instant, but given other
responsibilities, I don't see that happening any time soon. 

My Docbook to HTML conversion is in FreeBSD-current.  See
http://fallout.campusview.indiana.edu/~jfieber/docbook for examples
of the output and instructions on how to use it.

COST is a tcl based general purpose SGML manipulation tool worthy of
investigation.  Generally, most tools that exist can be found at
http://www.sil.org/sgml/sgml.html.

Finally, I should mention that the Linuxdoc DTD does support images,
but the support must be carried all the way through to the end
product to be of use. In FreeBSD, the Linuxdoc to LaTeX conversion
supports inclusion of encapsulated postscript, assuming dvips is used
to process TeX's output.  It would be trivial to support encapsulated
postscript in the Linuxdoc to groff conversion as well. I have
implemented similar support for another DTD I use [ISO12083].  Note
that the Linuxdoc to LaTeX conversion is currently broken in other
ways and I am more inclined to drop support for that conversion than
fix it, since the Linuxdoc to groff conversion now works quite well.

-john



