From: Rich Morin
To: freebsd-doc@FreeBSD.ORG
Date: Tue, 16 Jan 2001 15:36:54 -0800
Subject: Re: man, TOC, xml...

The good news about going to a markup language such as SGML or XML is that the process can turn "documents" into "data structures". These can then be used to facilitate indexing, checks for completeness or errors, etc.

The bad news, of course, is that the effort involved in turning a man page into high-quality SGML is substantial. Semantic mark-up is tricky; humans have a hard time with it, and I don't know of any programs that do it yet. In short, it requires Real Work (TM).

If the user community can be inspired to help out (e.g., with a crit- or Wiki-like system), many kinds of errors and omissions could be detected. For example, when a user finds that a man page is missing a SEE ALSO reference, s/he should find it natural to report the bug. I believe that the current PR system, while functional, is less than optimal in this respect. If I am using a man page, I shouldn't have to jump through a bunch of hoops just to say that a particular paragraph needs work.
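
To make the "missing SEE ALSO reference" case concrete, here is a rough sketch of the kind of completeness check that becomes possible once even one section of a man page is treated as data. It assumes mdoc source with one `.Xr name section` macro per line, and it assumes the pages are already loaded into memory; the function names `see_also_refs` and `missing_back_refs` are hypothetical, not part of any existing tool. The heuristic (if A cites B, B probably ought to cite A) is only a way to generate candidates for a human to look at, since many one-way references are perfectly legitimate.

```python
import re

# Hypothetical helper: matches an mdoc cross-reference line such as
# ".Xr chmod 1 ," and captures the page name and section ("1", "3p", ...).
XR_LINE = re.compile(r"^\.Xr\s+(\S+)\s+([0-9][0-9A-Za-z]*)")


def see_also_refs(mdoc_source):
    """Return the set of (name, section) pairs listed under '.Sh SEE ALSO'."""
    refs = set()
    in_see_also = False
    for line in mdoc_source.splitlines():
        if line.startswith(".Sh "):
            # Every section header either opens or closes SEE ALSO.
            in_see_also = line[4:].strip() == "SEE ALSO"
            continue
        if in_see_also:
            m = XR_LINE.match(line)
            if m:
                refs.add((m.group(1), m.group(2)))
    return refs


def missing_back_refs(pages):
    """pages maps (name, section) -> mdoc source text.

    Return (a, b) pairs where a's SEE ALSO cites b but b's does not cite a.
    These are candidates for review, not confirmed bugs.
    """
    graph = {page: see_also_refs(src) for page, src in pages.items()}
    suspects = []
    for page, targets in sorted(graph.items()):
        for target in sorted(targets):
            # Only judge targets whose source we actually have.
            if target in graph and page not in graph[target]:
                suspects.append((page, target))
    return suspects


if __name__ == "__main__":
    # Toy input; real pages would be read from the man page sources.
    pages = {
        ("ls", "1"): ".Sh NAME\n.Nm ls\n.Sh SEE ALSO\n.Xr chmod 1 ,\n.Xr stat 2\n",
        ("chmod", "1"): ".Sh NAME\n.Nm chmod\n.Sh SEE ALSO\n.Xr chown 8\n",
        ("stat", "2"): ".Sh SEE ALSO\n.Xr ls 1\n",
    }
    for page, target in missing_back_refs(pages):
        print(f"{page[0]}({page[1]}) cites {target[0]}({target[1]}), but not vice versa")
```

With the toy input, only ls(1) -> chmod(1) is flagged: stat(2) already points back to ls(1), and chown(8) is skipped because its source was not supplied.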

In any event, we should expect to see quite a bit of OML (Ostensible Markup Language :) being generated and used in any man page conversion effort. OML looks like XML at first glance, but its data structures do not convey all of the semantic information they "should". OML for a man page, for example, might pick up the low-hanging fruit (e.g., "SEE ALSO" and "FILES"), but refrain from categorizing every keyword in the document.

That said, there is quite a bit that can (and should) be done to automate the semantic mark-up process. I have heard, for instance, of an experiment in which a replacement set of troff macros was used to convert man pages (in mandoc format) into XML.

Once a document is available in a structured form, it becomes easier to do graph analysis and other checks. I am currently experimenting with some of this, as part of my prototyping efforts for the Meta Project (http://www.cfcl.com/rdm/Meta).

An interesting question, which I have raised offline, is whether the current PR system is well equipped to handle LARGE numbers of bug reports. An automated analysis of the man page cross-references, for instance, could generate hundreds or even thousands of PRs (consider the implications of a 10% "hit rate").

Generating the patch files for the affected man pages looks a lot harder (to me, at least) than simply reporting an error or omission. OTOH, if the man pages were actually stored as XML, the changes should be easier for an automated system to generate than they are now...

-r

-- 
http://www.cfcl.com/rdm
email: rdm@cfcl.com
phone: +1 650-873-7841