Date:      Fri, 5 Nov 1999 08:18:42 +0000
From:      Nik Clayton <nik@freebsd.org>
To:        Wolfram Schneider <wosch@cs.tu-berlin.de>
Cc:        doc@freebsd.org, wosch@freebsd.org
Subject:   Re: HTML to XML converter.
Message-ID:  <19991105081842.A88120@kilt.nothing-going-on.org>
In-Reply-To: <19991104182818.A9400@freno.cs.tu-berlin.de>; from Wolfram Schneider on Thu, Nov 04, 1999 at 06:28:18PM +0100
References:  <19991104182818.A9400@freno.cs.tu-berlin.de>

On Thu, Nov 04, 1999 at 06:28:18PM +0100, Wolfram Schneider wrote:
> I'm seeking a HTML to XML converter.  Is this possible with the
> FreeBSD sgml tools (jade, tiny etc.)?

First off, that's not quite what you want.  You want a converter to translate
documents marked up in one DTD (HTML) to another DTD (which you haven't shown
us, but which will be described in XML).

There are three approaches you could use to do this.  First, if you stick
with the tools installed by the textproc/docproj port, Jade can translate
files between two DTDs.  That's how the DocBook to HTML conversion is done.

Of course, you need to describe the mapping between the two DTDs, and in
Jade you do that in DSSSL, which has a very Scheme-ish syntax.  See the
files in $PREFIX/share/sgml/docbook/dsssl/modular/html/ for a (quite
complicated) example.
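To make "describing the mapping" a little more concrete, here is a minimal
sketch in Python (not Jade's DSSSL).  It assumes the HTML input is already
well-formed XML (XHTML-style), and the target element names (article, title,
para, emphasis) are DocBook-like names picked purely for illustration, since
the target DTD wasn't given.  It only renames elements; a real DSSSL or XSLT
stylesheet would usually also restructure content.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from HTML element names to element names in the
# target DTD.  Tags not listed here are copied through unchanged.
ELEMENT_MAP = {
    "html": "article",
    "h1": "title",
    "p": "para",
    "em": "emphasis",
}


def convert(elem):
    """Recursively copy elem, renaming each tag via ELEMENT_MAP."""
    new = ET.Element(ELEMENT_MAP.get(elem.tag, elem.tag), dict(elem.attrib))
    new.text = elem.text  # character data before the first child
    new.tail = elem.tail  # character data after this element's end tag
    for child in elem:
        new.append(convert(child))
    return new


def html_to_xml(source):
    """Parse well-formed (XHTML-style) input and return the renamed tree."""
    root = ET.fromstring(source)
    return ET.tostring(convert(root), encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml("<html><h1>Hi</h1><p>Some <em>text</em>.</p></html>"))
```

Real-world HTML is often not well-formed, so in practice it would need to be
cleaned up (or parsed with an SGML-aware tool) before a mapping like this
could be applied.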

The second approach is to use a language designed specifically for this,
called XSLT (XSL Transformations, where XSL is the Extensible Stylesheet
Language).  You will still need to write the mapping between the two DTDs,
but this time you write it in XSLT, whose stylesheets are themselves XML
documents.  I haven't played with this much myself, and the textproc/docproj
port won't install an XSLT processor by default.  However, if you've got
space to burn then look at textproc/lotusxsl.  I don't have a ports tree to
hand, so I might have got that reference wrong; grep for 'lotus' in
/usr/ports/INDEX to be sure.  The only snag with this approach is that most
of the XSLT processors (including this one) are written in Java, so you're
going to need the JDK (all 17MB of it, or thereabouts) installed first.

The first two approaches have the advantage of being reasonably standard.
You could migrate your XSLT stylesheets between different XSLT processors
on different platforms, for example.

If that's not important to you then investigate instant, which is probably
part of textproc/sgmlformat.  We used to use it for our DocBook to HTML
conversions, and it has a very simple language designed to do this and not
much else.  The snag is that the syntax is specific to instant, but it'll
be by far the simplest approach.

N
-- 
A different "distribution" of Linux is really a different operating system.  
They just refuse to call it that because it's bad press.  But that's what 
the shoe fits.
    -- Tom Christiansen, <199910211639.KAA18701@jhereg.perl.com>

