Date: Fri, 17 Jan 1997 12:02:10 -0500 (EST)
From: John Fieber
To: Charles Owens
cc: doc@freebsd.org
Subject: Re: Newbie looking for flexible doc system

On Thu, 16 Jan 1997, Charles Owens wrote:

> I'm trying to sort out the options available to me in the SGML world and
> was hoping someone here could shed some light.
>
> What I need is a system that will allow, without too much pain, the user
> to produce output in multiple formats (print and HTML, mostly) from a
> single set of source files.
>
> I'm _almost_ happy with the LyX -> linuxdoc approach except that from what
> I can tell the linuxdoc DTD doesn't support the inclusion of images, which
> is a must for my application. I've begun to investigate the Docbook DTD,
> but the whole SGML thing is so huge... a bit overwhelming.
>
> I'd appreciate any suggestions and pointers that might point in a valid
> direction.

SGML does take a lot of chewing before it is digestible. It is often
misunderstood. The best non-technical survey of what SGML is all about
is Liora Alschuler's "ABCD...SGML" (ISBN 1-850-32197-3). It covers the
historical background and many case studies of SGML applications.

For someone raised with a computer science world view, many things in
SGML can be puzzling until you understand the motivations behind the
design. Unfortunately, few SGML books describe the context of SGML very
well. A more technical book than "ABCD...SGML" that does address
context is "Developing SGML DTDs" by Eve Maler and Jeanne El Andaloussi.
Eve Maler is the architect of the current Docbook DTD.

More concretely for your situation, if you conclude SGML is valuable,
Docbook is a much better route than Linuxdoc in the long run, assuming
you are dealing with computer related documents, or at least technical
documents. If, on the other hand, having SGML in your document chain is
not that important, you may be better off with LaTeX. There is at least
one pretty good LaTeX to HTML converter.

However, as a testament to the utility of using SGML, it only took me
an afternoon to hack together a decent Linuxdoc to Docbook conversion.
A LaTeX to Docbook conversion would have been considerably more
difficult since I would have had to write a parser. A great strength of
SGML is that you only have to write a parser once, and all applications
can use it.

There is definitely a "Some Assembly Required" qualification to
suggesting the use of Docbook, and the pickings for freely available
tools are pretty thin at the moment.

There are at least two excellent SGML parsers available. One, sgmls, is
used in FreeBSD. It is fairly compact and quick, but is no longer being
developed by its author. The second, SP, takes the form of a C++ class
library and comes with a couple of command line applications, one of
which duplicates the functionality of sgmls. SP has very comprehensive
support of the SGML standard. If I recall correctly, the only SGML
feature it doesn't support is CONCUR, which has dubious utility anyway.
It also supports 16-bit characters with EUC, JIS and UTF-8 encoding for
input and output. SP is rather huge though. The shared library tips the
scales at 1.5 megabytes!

Of course, parsing is just the beginning. You have to *do* something
with the parsed document and this is where the tools thin out rapidly.

The up and coming tool for formatting is Jade, which uses the SP parser.
Jade implements a bunch of the DSSSL standard, which provides a powerful
Scheme-derived (as in the programming language) interface for
manipulating the document to generate what is called a "flow object
tree", or in plain English, a series of objects expressed in terms that
page layout software can understand--boxes and lines, and containers of
text to be typeset. A backend processor turns the flow object tree into
a specific layout language. The best backend so far generates RTF. The
HTML output is not particularly useful (yet). There is skeletal support
for TeX--the backend just outputs macro calls representing objects in
the flow object tree. Some TeX wizard needs to actually write the
macros.

The tool I'm using for FreeBSD is called instant (for manipulating SGML
document INSTANces). Instant was sort of developed by the OSF. I say
"sort of" because there are a lot of limitations and bugs. It works
well for relatively simple DTDs such as Linuxdoc, but as the DTD
complexity increases it becomes a headache of non-trivial proportions.
My Docbook to HTML conversion is hitting the limitations pretty hard at
this point. Fortunately I got a pretty usable subset of the DTD handled
before the headache started getting bad!

Even in the presence of DSSSL, an instant-like tool is very useful.
DSSSL is, for example, not appropriate for converting between two DTDs,
or doing other arbitrary document manipulations. As such, I have been
pondering a rewrite of instant, but given other responsibilities, I
don't see that happening any time soon.

My Docbook to HTML conversion is in FreeBSD-current. See
http://fallout.campusview.indiana.edu/~jfieber/docbook for examples of
the output and instructions on how to use it.

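The general idea behind an instant-like tool, emitting some text when an
element starts and some more when it ends, can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the HTML_SPEC table, the event tuples and the translate() function are
hypothetical, and none of this reflects instant's actual specification
file syntax or the conversion shipped in FreeBSD-current.

```python
from html import escape

# Hypothetical translation table:
# element name -> (text emitted at start, text emitted at end).
HTML_SPEC = {
    "SECT1": ("<div>\n", "</div>\n"),
    "TITLE": ("<h2>", "</h2>\n"),
    "PARA": ("<p>", "</p>\n"),
    "EMPHASIS": ("<em>", "</em>"),
}


def translate(events, spec):
    """Turn a stream of parse events into output text using a table."""
    out = []
    for event in events:
        kind = event[0]
        if kind == "start":
            # Elements missing from the table contribute no markup.
            out.append(spec.get(event[1], ("", ""))[0])
        elif kind == "end":
            out.append(spec.get(event[1], ("", ""))[1])
        elif kind == "data":
            # Character data must be escaped for the target format.
            out.append(escape(event[1], quote=False))
    return "".join(out)


# Hand-written events standing in for the output of an SGML parser.
events = [
    ("start", "SECT1"),
    ("start", "TITLE"), ("data", "Tools & parsers"), ("end", "TITLE"),
    ("start", "PARA"), ("data", "SP is "),
    ("start", "EMPHASIS"), ("data", "huge"), ("end", "EMPHASIS"),
    ("data", "."), ("end", "PARA"),
    ("end", "SECT1"),
]

html_text = translate(events, HTML_SPEC)
print(html_text)
```

A flat table like this knows nothing about context, for instance a
title inside a section versus a title inside a figure, which is one
plausible reason a simple DTD is easy to handle this way and a large
one is not.
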
COST is a Tcl based general purpose SGML manipulation tool worthy of
investigation. Generally, most tools that exist can be found at
http://www.sil.org/sgml/sgml.html.

Finally, I should mention that the Linuxdoc DTD does support images,
but the support must be carried all the way through to the end product
to be of use. In FreeBSD, the Linuxdoc to LaTeX conversion supports
inclusion of encapsulated PostScript, assuming dvips is used to process
TeX's output. It would be trivial to support encapsulated PostScript in
the Linuxdoc to groff conversion as well. I have implemented similar
support for another DTD I use [ISO12083]. Note that the Linuxdoc to
LaTeX conversion is currently broken in other ways and I am more
inclined to drop support for that conversion than fix it, since the
Linuxdoc to groff conversion now works quite well.

-john