Date: Mon, 1 Dec 2008 20:23:09 -0500
From: Robert Huff <roberthuff@rcn.com>
To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject: Re: any way to turn a pdf file into something OCR-able?
Message-ID: <18740.36349.523718.591189@jerusalem.litteratus.org>
In-Reply-To: <20081202010730.GA15970@slackbox.xs4all.nl>
References: <20081201231440.GA30682@thought.org> <20081202010730.GA15970@slackbox.xs4all.nl>

Roland Smith writes:

> > pdftotext fail on the large [32MB] file I've got. Is there any
> > other way I can translate this huge textfile to ascii or html or
> > text?
>
> Please define "fail" in this context? I've used pdftotext on
> documents exceeding 40MB. However there are of course things that
> don't work;
>
> 1) Some PDFs are just wrappers around JPEG images. In this case
> there is no text for pdftotext to convert => epic fail.

In this case "convert" from the ImageMagick port will get you a
series of .jpg/.gif/.<whatever>. Read the manual carefully before
attempting; also note this can be a slow process.

Robert Huff

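For illustration only, here is a minimal Python sketch of the "convert" step described above. It is not taken from the message: the helper name build_convert_command, the output pattern page-%03d.jpg, and the 300 DPI density are all assumptions chosen for the example. It also assumes ImageMagick's "convert" is on the PATH and that Ghostscript is installed, since ImageMagick relies on it to read PDFs.

```python
import subprocess
import sys


def build_convert_command(pdf_path, out_pattern="page-%03d.jpg", density=300):
    """Return the argv list for rasterizing a PDF with ImageMagick's convert.

    The function name, default pattern, and default density are
    hypothetical choices for this sketch, not from the original message.
    """
    # -density sets the rendering resolution in DPI; it must come before
    # the input file so that it applies when the PDF is read.
    # A %03d in the output name makes convert write one numbered file per page.
    return ["convert", "-density", str(density), pdf_path, out_pattern]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: pdf2img.py FILE.pdf")
    # check=True raises CalledProcessError if convert exits non-zero.
    subprocess.run(build_convert_command(sys.argv[1]), check=True)
```

Building the command as a list and passing it to subprocess.run avoids shell quoting problems with odd file names. On a large file, expect this to take a while and to use a lot of disk space, as noted above; lowering the density trades image quality for speed.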
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?18740.36349.523718.591189>