Date: Thu, 15 Jul 2004 03:04:34 +0000
From: Murray Stokely <murray@FreeBSD.org>
To: doc@freebsd.org
Subject: Spellchecking the DocBook FreeBSD Documentation
Message-ID: <20040715030434.GE34256@hub.freebsd.org>

This is a note to explain how I use the tools in CVS to spellcheck our
documentation. This is just one of many possible approaches.

  $ cd /usr/doc/en_US.ISO8859-1/books/handbook
  $ make clean; make SPELLCHECK=1 spellcheck | sort | uniq | less

The SPELLCHECK=1 variable tells the makefile to use the special
spellcheck.dsl stylesheet, which omits the contents of certain tags
(such as <filename>) from the HTML output. Ideally this variable would
not have to be passed by hand; the spellcheck target should set it
automatically.

The spellcheck target then runs ispell over the generated HTML files,
using the FreeBSD technical lexicon dictionary in
/usr/share/dict/freebsd. All misspelled words are printed to standard
output, so we run the output through sort and then uniq to remove
duplicates. Once you find a word that is not a false positive, a quick
grep through the source will tell you which file contains the
misspelling.

This approach converts the DocBook into HTML before spellchecking.
Another approach would be to spellcheck the DocBook source directly,
examining the tags and deciding what to ignore. This could be done with
an SGML-aware spellchecker (aspell doesn't seem nearly powerful enough
to me), or with a scripting language and SGML parsing libraries. I
think that would be more work than using DSSSL or XSL for the parsing,
though.

Chern Lee wrote and posted a Tcl script here several years ago, and his
script may give better output than the 'make spellcheck' approach I've
described above. If it does, we should add it to CVS.

	- Murray
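
For illustration only, here is a minimal sketch of the "quick grep
through the source" step. The function name find_misspelling is
hypothetical and not part of the doc tree; it assumes a grep that
supports -r, -n, -w and -F, and it searches every file under the given
directory rather than only the DocBook sources.

```shell
#!/bin/sh
# Hypothetical helper (not part of the FreeBSD doc tools): print every
# line, with file name and line number, that contains WORD as a whole word.
# Usage: find_misspelling WORD [DIR]
find_misspelling() {
    word=$1
    dir=${2:-.}
    # -r  recurse into DIR
    # -n  prefix each match with its line number
    # -w  match whole words only, so short words do not match inside longer ones
    # -F  treat WORD as a fixed string, not a regular expression
    grep -rnwF -- "$word" "$dir"
}
```

From the handbook directory, something like
find_misspelling mispelled . would list each file and line where that
word occurs.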