From owner-freebsd-doc  Tue Jan 16 15:38:57 2001
Delivered-To: freebsd-doc@freebsd.org
Received: from cfcl.com (cpe-024-221-169-054.ca.sprintbbd.net [24.221.169.54])
	by hub.freebsd.org (Postfix) with ESMTP id 4E8F837B400
	for <freebsd-doc@FreeBSD.ORG>; Tue, 16 Jan 2001 15:38:39 -0800 (PST)
Received: from [192.168.168.205] (cerberus [192.168.168.205])
	by cfcl.com (8.9.3/8.9.3) with ESMTP id PAA88677
	for <freebsd-doc@FreeBSD.ORG>; Tue, 16 Jan 2001 15:41:09 -0800 (PST)
	(envelope-from rdm@cfcl.com)
Mime-Version: 1.0
Message-Id: <p05001941b68a80360470@[192.168.168.205]>
In-Reply-To: <20010116182434.A7327@canyon.nothing-going-on.org>
References: <Pine.BSF.4.30.0101161434550.25916-100000@k2.vol.cz>
 <20010116172751.A3414@canyon.nothing-going-on.org>
 <20010116095547.A13543@Odin.AC.HMC.Edu>
 <20010116182434.A7327@canyon.nothing-going-on.org>
Date: Tue, 16 Jan 2001 15:36:54 -0800
To: freebsd-doc@FreeBSD.ORG
From: Rich Morin <rdm@cfcl.com>
Subject: Re: man, TOC, xml...
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-freebsd-doc@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

The good news about going to a markup language such as SGML or XML
is that the process can turn "documents" into "data structures".
This can be used to facilitate indexing, checks for completeness or
errors, etc.

The bad news, of course, is that the effort involved in turning a
man page into high-quality SGML is substantial.  Semantic mark-up
is tricky; humans have a hard time with it and I don't know of any
programs that do it yet.  In short, it requires Real Work (TM).

If the user community can be inspired to help out (e.g., with a crit-
or Wiki-like system), many kinds of errors and omissions could be
detected.  For example, when a user finds that a man page is missing
a SEE ALSO reference, s/he should find it natural to report the bug.

I believe that the current PR system, while functional, is less than
optimal in this respect.  If I am using a man page, I shouldn't have
to jump through a bunch of hoops just to say that a particular
paragraph needs work.


In any event, we should expect to see quite a bit of OML (Ostensible
Markup Language :) being generated and used in any man page conversion
effort.  OML looks like XML at first glance, but its data structures
do not convey all of the semantic information they "should".  OML for
a man page, for example, might pick up the low-hanging fruit (e.g.,
"SEE ALSO" and "FILES"), but refrain from categorizing every keyword
in the document.
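
As a rough illustration of the "low-hanging fruit" idea, here is a
minimal Python sketch that pulls only the SEE ALSO and FILES entries
out of a page written with the mdoc macros (.Sh, .Xr, .Pa) and leaves
everything else alone.  The element names (manpage, see-also, ref,
files, file), the function name, and the sample page are all made up
for this example; they are not part of any existing DTD or tool.

```python
import xml.etree.ElementTree as ET

def mdoc_to_oml(name, mdoc_text):
    """Build a small XML tree holding only SEE ALSO and FILES entries.

    Everything else in the page is ignored, which is what makes the
    result "ostensible" rather than full semantic markup.
    """
    root = ET.Element("manpage", name=name)
    see_also = ET.SubElement(root, "see-also")
    files = ET.SubElement(root, "files")
    section = None
    for line in mdoc_text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == ".Sh":
            # Section header, e.g. ".Sh SEE ALSO"
            section = " ".join(words[1:])
        elif section == "SEE ALSO" and words[0] == ".Xr" and len(words) >= 3:
            # Cross-reference, e.g. ".Xr bar 1 ,"
            ET.SubElement(see_also, "ref", name=words[1], section=words[2])
        elif section == "FILES":
            # Paths show up as ".Pa /path" or ".It Pa /path"
            for i, word in enumerate(words[:-1]):
                if word in (".Pa", "Pa"):
                    ET.SubElement(files, "file", path=words[i + 1])
                    break
    return root

# Hypothetical page, for illustration only.
SAMPLE = """\
.Sh NAME
.Nm foo
.Sh FILES
.Bl -tag -width /etc/foo.conf
.It Pa /etc/foo.conf
.El
.Sh SEE ALSO
.Xr bar 1 ,
.Xr baz 8
"""

oml = mdoc_to_oml("foo", SAMPLE)
print(ET.tostring(oml, encoding="unicode"))
```

A real converter would have to cope with many more macro forms than
this; the point is only that a partial structure is cheap to obtain.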

That said, there is quite a bit that can (and should) be done to
automate the semantic mark-up process.  I have heard, for instance,
of an experiment in which a replacement set of troff macros was used
to convert man pages (in mandoc format) into XML.


Once a document is available in a structured form, it becomes easier
to do graph analysis and other checks.  I am currently experimenting
with some of this, as part of my prototyping efforts for the Meta
Project (http://www.cfcl.com/rdm/Meta).

An interesting question, which I have raised offline, is whether the
current PR system is well equipped to handle LARGE numbers of bug
reports.  An automated analysis of the man page cross-references, for
instance, could generate hundreds or even thousands of PRs (consider
the implications of a 10% "hit rate").
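
To make the cross-reference analysis concrete, here is a small Python
sketch of one possible check.  It assumes the SEE ALSO data has
already been extracted into a dictionary; the function name, the page
names, and the two categories of finding are my own choices for the
example.  Note that a one-way reference is only a candidate for
review, not necessarily an error.

```python
def check_xrefs(xrefs):
    """xrefs maps each page to the set of pages its SEE ALSO names.

    Returns two lists of (page, target) pairs:
      dangling - target is not a known page at all
      one_way  - page cites target, but target does not cite page
    """
    dangling = []
    one_way = []
    for page in sorted(xrefs):
        for target in sorted(xrefs[page]):
            if target not in xrefs:
                dangling.append((page, target))
            elif page not in xrefs[target]:
                one_way.append((page, target))
    return dangling, one_way

# Hypothetical data, for illustration only.
xrefs = {
    "cat(1)": {"head(1)", "tail(1)"},
    "head(1)": {"cat(1)", "tail(1)"},
    "tail(1)": {"head(1)", "sed(1)"},
}

dangling, one_way = check_xrefs(xrefs)
flagged = {page for page, _ in dangling + one_way}
hit_rate = len(flagged) / len(xrefs)
print(dangling, one_way, hit_rate)
```

Each pair in either list is a potential PR, which is why the volume
question matters once this is run over the whole tree.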

Generating the patch files for the affected man pages looks a lot
harder (to me, at least) than simply reporting an error or omission.
OTOH, if the man pages were actually stored as XML, the changes should
be easier for an automated system to generate than they are now...
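
For instance, if a page were stored as XML, adding a missing SEE ALSO
entry could be a short tree operation.  The sketch below assumes a
made-up layout (a manpage element containing a see-also element with
ref children); the function name and the sample page are hypothetical
as well.

```python
import xml.etree.ElementTree as ET

def add_see_also(page_xml, name, section):
    """Return page_xml with a <ref> for name(section) added if missing."""
    root = ET.fromstring(page_xml)
    see_also = root.find("see-also")
    if see_also is None:
        see_also = ET.SubElement(root, "see-also")
    for ref in see_also.findall("ref"):
        if ref.get("name") == name and ref.get("section") == section:
            return page_xml  # already present, nothing to change
    ET.SubElement(see_also, "ref", name=name, section=section)
    return ET.tostring(root, encoding="unicode")

# Hypothetical stored page, for illustration only.
PAGE = (
    '<manpage name="tail">'
    '<see-also><ref name="head" section="1" /></see-also>'
    '</manpage>'
)

updated = add_see_also(PAGE, "cat", "1")
print(updated)
```

The before and after forms could then be diffed to produce a patch for
review, rather than editing troff source by hand.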

-r
-- 
http://www.cfcl.com/rdm
email: rdm@cfcl.com
phone: +1 650-873-7841

