Date:      Mon, 23 Feb 2004 14:26:25 -0500
From:      Chuck Swiger <cswiger@mac.com>
To:        =?ISO-8859-1?Q?Dag-Erling_Sm=F8rgrav?= <des@des.no>
Cc:        Alex Dupre <ale@FreeBSD.org>
Subject:   Re: Validating docbook articles...
Message-ID:  <403A53E1.2040305@mac.com>
In-Reply-To: <xzpd686huyw.fsf@dwp.des.no>
References:  <8D03FA54-4BA6-11D8-8D97-003065ABFD92@pkix.net> <20040216130659.GC617@submonkey.net> <4031364A.2070708@pkix.net> <20040222181114.GB32524@graf.pompo.net> <40390248.1060104@pkix.net> <4039D0FE.3010905@FreeBSD.org> <xzpd686huyw.fsf@dwp.des.no>

Dag-Erling Smørgrav wrote:
> Alex Dupre <ale@FreeBSD.org> writes:
>> [ ...talking about -preserve in tidy... ]
> This reminds me of the many good reasons to convert the doc tree to
> XML.  One of these is that xmllint can both validate input files and
> clean up output files, and it does a far better job of it than tidy.

An interesting idea.  I took a quick look at converting an existing SGML 
document into XML in order to gain some idea as to the work involved.

Given an SGML prologue of:

<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
%man;
<!ENTITY % freebsd PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
%freebsd;
<!ENTITY % trademarks PUBLIC "-//FreeBSD//ENTITIES DocBook Trademark Entities//EN">
%trademarks;
]>

...from doc/en_US.ISO8859-1/articles/filtering-bridges (written by ale@, of
course :-), it's easy to add an XML declaration (this could be done
automatically), and "make lint" works just fine with the XML declaration in
place.  So far, so good.

The harder part: how does one generate the SystemLiterals that XML requires, per
the XML spec:

|4.2.2 External Entities
|
|[Definition: If the entity is not internal, it is an external entity,
|declared as follows:]
|
|External Entity Declaration
|
|[75]   ExternalID   ::=   'SYSTEM' S SystemLiteral
|                        | 'PUBLIC' S PubidLiteral S SystemLiteral

69-sec% xmllint article.sgml
article.sgml:3: parser error : SystemLiteral " or ' expected
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
                                                                             ^
article.sgml:3: parser error : SYSTEM or PUBLIC, the URI is missing
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
                                                                             ^
article.sgml:4: parser error : Space required after the Public Identifier
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
                                                                              ^
article.sgml:4: parser error : SystemLiteral " or ' expected
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
                                                                              ^
article.sgml:4: parser error : SYSTEM or PUBLIC, the URI is missing
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
                                                                              ^
article.sgml:5: parser warning : PEReference: %man; not found
%man;
      ^
[ ... ]
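
All of these errors come down to the same thing: under production [75], a PUBLIC
identifier in XML must be followed by a SystemLiteral, which SGML does not
require.  As a rough illustration, the check below finds PUBLIC identifiers with
no system literal after them.  It is only a regex approximation of the grammar,
not a real parser, and the "man-refs.ent" file name is a placeholder I made up
for the example.

```python
import re

# Rough approximation of XML production [75]:
#   'PUBLIC' S PubidLiteral S SystemLiteral
# The system literal is made optional here so we can detect when it is absent.
PUBLIC_ID = re.compile(
    r"""PUBLIC\s+(?P<pubid>"[^"]*"|'[^']*')(?:\s+(?P<sysid>"[^"]*"|'[^']*'))?"""
)


def missing_system_literals(text):
    """Return public identifiers in text that have no system literal after them."""
    return [
        m.group("pubid")[1:-1]  # strip the surrounding quotes
        for m in PUBLIC_ID.finditer(text)
        if m.group("sysid") is None
    ]


# SGML-style declaration, as found in the doc tree today.
sgml_style = (
    '<!ENTITY % man PUBLIC '
    '"-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">'
)

# XML-style declaration; "man-refs.ent" is a HYPOTHETICAL system identifier.
xml_style = (
    '<!ENTITY % man PUBLIC '
    '"-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN" "man-refs.ent">'
)

print(missing_system_literals(sgml_style))
print(missing_system_literals(xml_style))
```

The first call reports the man page entity set as missing its system literal;
the second reports nothing, which is the form xmllint is asking for.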

Are these entities published via a URI, or does one need to refer to a local
path?  Is there a tool to update (normalize?) these ENTITY declarations
automatically?  Using "xmllint --catalogs --loaddtd" didn't seem to help.

Maybe this seems trivial, but there are several hundred SGML source files 
which would all need to be updated this way...
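
If no existing tool does this, a small script could probably handle the bulk
update.  The sketch below is only that, a sketch: it assumes a table mapping
each public identifier to a system identifier, and the table entry shown
("man-refs.ent") is a placeholder, since where these entities really live is
exactly what I don't know.  Identifiers not in the table are left alone.

```python
import re

# HYPOTHETICAL mapping from public identifier to system identifier.  The real
# values would have to come from the doc tree's catalog files.
SYSTEM_IDS = {
    "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN": "man-refs.ent",
}

# A double-quoted PUBLIC identifier that is NOT already followed by a quoted
# system literal.
DECL = re.compile(r'(PUBLIC\s+"([^"]*)")(?!\s*["\'])')


def add_system_literals(text, system_ids):
    """Append a system literal to each known PUBLIC identifier lacking one."""

    def fix(match):
        # Collapse whitespace so identifiers wrapped across lines still match.
        pubid = " ".join(match.group(2).split())
        sysid = system_ids.get(pubid)
        if sysid is None:
            return match.group(0)  # unknown identifier: leave untouched
        return f'{match.group(1)} "{sysid}"'

    return DECL.sub(fix, text)


source = (
    '<!ENTITY % man PUBLIC '
    '"-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">\n'
    "%man;\n"
)
print(add_system_literals(source, SYSTEM_IDS))
```

Running it twice over the same text changes nothing the second time, so it
should be safe to apply across the whole tree and re-run as the table grows.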

-- 
-Chuck


