Message-ID: <43BE84A3.6050004@scls.lib.wi.us>
Date: Fri, 06 Jan 2006 08:54:27 -0600
From: Greg Barniskis
To: Gary Kline
Cc: FreeBSD Mailing List
In-Reply-To: <20060106051439.GA80045@thought.org>
Subject: Re: how to tell aspell -c to ignore "_", ">", "<", and other bytes

Gary Kline wrote:
> People,
>
> You may remember that I'm trying to scan > 400 pages from a text.
> Things work much better using the latest gocr and a greatly
> enlarged JPEG image, tweaked with xv. I'm almost to the point
> where I can use aspell -c to correct misinterpreted text. The
> gotcha is that the sample jpg files I have are filled with
> improper non-characters, including "_", "<", ">", along with
> punctuation, and random integers.
> Is there any way to tell
> aspell to look at (say) S_wiss and guess Swiss, an6yle and guess
> angle, n:otio:1 and guess motion, and di.5tnnce and guess distance?

You might get somewhere with the bad-spellers suggestion mode setting
(sug-mode=bad-spellers), which should make it more aggressive about
trying to find a match for mangled strings. However, I understand that
in this mode it's still looking for soundslike mistrakes, not "9 looks
like g" and the like. This mode also turns off checking for typos IIRC,
but those checks wouldn't help you anyway, since they're looking for
fumbled keystrokes, not lookalike chars. Tuning the edit distance may or
may not help for those really bad mangles.

Other than that, you should probably ask this question in an aspell
support forum for best results.

--
Greg Barniskis, Computer Systems Integrator
South Central Library System (SCLS)
Library Interchange Network (LINK)
(608) 266-6348
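
If aspell's own modes don't get you far enough, one option is to clean up the OCR output before spell-checking it. The following is only a rough sketch of that idea and is not an aspell feature: it strips stray non-alphanumeric bytes, swaps a few digits for the letters they resemble, and then picks the nearest entry from a word list using Python's standard difflib module. The LOOKALIKES table, the WORDS list, and the function names are all hypothetical and chosen for illustration; a real run would need a proper dictionary and a table tuned to what gocr actually produces.

```python
import difflib
import re

# Hypothetical look-alike table: OCR digit -> letter it may stand for.
LOOKALIKES = str.maketrans({"5": "s", "6": "g", "9": "g", "0": "o", "1": "l"})

# Anything that is not a letter or digit is treated as OCR noise.
JUNK = re.compile(r"[^A-Za-z0-9]")

# Tiny stand-in word list; an assumption for this sketch only.
WORDS = ["swiss", "angle", "motion", "distance"]


def normalize(token):
    """Drop noise bytes, map look-alike digits to letters, lowercase."""
    return JUNK.sub("", token).translate(LOOKALIKES).lower()


def guess(token, words=WORDS, cutoff=0.6):
    """Return the closest word to the normalized token, or None."""
    cleaned = normalize(token)
    matches = difflib.get_close_matches(cleaned, words, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["S_wiss", "an6yle", "n:otio:1", "di.5tnnce"]:
        print(raw, "->", guess(raw))
```

Note that the result depends heavily on the word list: with a full dictionary, a token like n:otio:1 could just as easily come back as some other near match, so the guesses would still need a human review pass.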