From: Warren Block
To: doc@freebsd.org
Date: Tue, 20 Dec 2011 16:10:27 -0700 (MST)
Subject: igor: Checking FreeBSD documents

The many rules for man pages and DocBook are a bit much to remember when months can pass between working on the various types. My mentors Glen Barber and Benedict Reuschling have been very good at catching this kind of thing, but repetitive tasks are what computers are supposed to do for us. After not finding much that does automated checking, I slapped together a Perl program called "igor" that does some of this.

At present it checks all types of files for repeated words ("is is"), common spelling mistakes collected from FreeBSD documents, obsolete FreeBSD features (just "cvsup" so far), bad phrases ("the to"), and bad whitespace (blank lines with whitespace or lines with trailing whitespace). Oh, and there's a separate style check that makes some subjective suggestions.

mdoc(7) documents are also tested for document date, sentences starting on a new line, and document structure (Dd, Dt, Os, Sh NAME, Nm, Nd, Sh SYNOPSIS, Sh DESCRIPTION occurring in the right order and with parameters).

DocBook SGML documents are also tested for correct indentation (a terrible hack even when compared to the surrounding code), indentation whitespace, title capitalization, matching open and close tags, straggling tags, and long lines.

The default is to run all tests, but specifying an individual test runs just that one. Output is in plain ASCII, or better yet, marked with ANSI color sequences that help to identify errors visually (usable with 'less -R'). igor handles compressed man pages, and shows filenames when run on multiple files. It also accepts input on stdin.

Typical usage:

  igor -h
  igor -R -D ifconfig.8.gz | less -RS
  igor -R -D /usr/share/man/man1/*.gz | less -RS
  igor -R chapter.sgml | less -RS

igor is really more of a proof-of-concept than a finished program. There are more tests that could be done, and existing tests could be done better. Still, it's useful as-is. Maybe presenting it will spur someone to point out a smarter, better, or faster way of doing these tests. Or rewrite it entirely.

The current version of igor is here:

  http://www.wonkity.com/~wblock/igor/igor

Perl 5.10 minimum, no dependencies, no port, no warranty. Please upgrade if you have an earlier version; many changes and improvements have happened in the last few days.

My thanks to Glen Barber and Benedict Reuschling for their tremendous patience and help.
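
For anyone curious what the simplest of these checks amount to, here is a minimal sketch in Python of the repeated-word and bad-whitespace tests described above. It is not igor's code (igor is Perl); the names check_line and check_text and the regular expressions are assumptions made purely for illustration.

```python
import re

# Hypothetical illustration only; igor itself is a Perl program and
# may implement these checks quite differently.

# A word followed by whitespace and the same word again ("is is").
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# One or more spaces or tabs at the end of the line.
TRAILING_SPACE = re.compile(r"[ \t]+$")


def check_line(line):
    """Return a list of problem descriptions for one line of text."""
    problems = []
    text = line.rstrip("\n")

    match = REPEATED_WORD.search(text)
    if match:
        problems.append("repeated word: " + match.group(1))

    if TRAILING_SPACE.search(text):
        if text.strip() == "":
            # The line holds nothing but spaces or tabs.
            problems.append("blank line with whitespace")
        else:
            problems.append("trailing whitespace")

    return problems


def check_text(text):
    """Yield (line number, problem) pairs for every problem found."""
    for number, line in enumerate(text.splitlines(), start=1):
        for problem in check_line(line):
            yield number, problem
```

Calling check_text() on the contents of a file gives one (line number, problem) pair per finding. Only the first repeated word on each line is reported, which keeps the sketch short.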