Date: Wed, 28 Jan 2009 11:22:11 -0800
From: Gary Kline <kline@thought.org>
To: Reko Turja <reko.turja@liukuma.net>
Cc: FreeBSD Mailing List <freebsd-questions@FreeBSD.ORG>
Subject: Re: OCR...
Message-ID: <20090128192211.GB22208@thought.org>
In-Reply-To: <319D789FD18042DBB7A19571DA26E5AE@rivendell>
References: <20090128040802.GA94236@thought.org> <319D789FD18042DBB7A19571DA26E5AE@rivendell>
On Wed, Jan 28, 2009 at 12:08:55PM +0200, Reko Turja wrote:
> >so what is the best commercial/shareware that can read a 10pt-font
> >file? (( also, when i have time to get back into actually hacking,
> >this [[turning imaged pdf into OCR'able ascii or 8859-1]] is going
> >to be a first target. any idea which team i should go with. gOCR
> >looks best so far to me.
>
> ABBYY Finereader - Omnipage haven't been able to catch it in several
> years either feature or qualitywise. No idea if Finereader runs under
> emulator though. If the file is already a PDF and 72 DPI with text as
> graphics most of the damage has already been done, and it will be
> extremely hard to OCR.
>

well, damage is probably done. how can i check the resolution?
i tried to increase it by creating huge ppm and tif files, but
then that's really absurd since there can only be just so much
data per image. i _could_ try xv and jpeg and smoothing image to
refine, but too much hassle.

(i used gocr -m 130 and "saw" the glyphs it (presumably) saw.
seemed pretty much okay to my eyes. but then i'm not a computer
program. [MAYBE :)]

gary

> -Reko
>

--
Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php
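
On the resolution question above, a rough sketch of the arithmetic may help. It assumes the scanned page is US Letter (8.5 in wide) and 612 pixels across; both numbers are hypothetical, as are the function names, and the 300 DPI figure is only a comparison value. The effective resolution is the pixel count divided by the physical size, and a 10pt font occupies 10/72 of an inch, so the pixel budget per glyph follows directly.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Pixels per inch along one axis of a page image."""
    return pixels / inches


def em_height_px(point_size: float, dpi: float) -> float:
    """Approximate em height in pixels (1 pt = 1/72 inch).

    Actual glyphs are smaller than the em box, so this is an upper bound.
    """
    return point_size * dpi / 72.0


# Hypothetical page image: 612 pixels wide, assumed to be 8.5 inches wide.
dpi = effective_dpi(612, 8.5)
print(dpi)                        # 72.0
print(em_height_px(10, dpi))      # 10.0

# For comparison only: the same 10pt text if it had been scanned at 300 DPI.
print(round(em_height_px(10, 300), 1))
```

Resampling the 72 DPI image into a larger ppm or tif raises the pixel count but not the information: each 10pt glyph was still captured in a box at most about 10 pixels high.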
