Date:      Mon, 1 Dec 2008 20:23:09 -0500
From:      Robert Huff <roberthuff@rcn.com>
To:        FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject:   Re: any way to turn a pdf file into something OCR-able?
Message-ID:  <18740.36349.523718.591189@jerusalem.litteratus.org>
In-Reply-To: <20081202010730.GA15970@slackbox.xs4all.nl>
References:  <20081201231440.GA30682@thought.org> <20081202010730.GA15970@slackbox.xs4all.nl>

Roland Smith writes:

>  > 	pdftotext fails on the large [32MB] file I've got.  Is there any
>  > 	other way I can translate this huge file to ascii or html or
>  > 	text?
>  
>  Please define "fail" in this context. I've used pdftotext on
>  documents exceeding 40MB. However, there are of course things that
>  don't work:
>  
>  1) Some PDFs are just wrappers around JPEG images. In this case
>  there is no text for pdftotext to convert => epic fail.

	In this case "convert" from the ImageMagick port will turn the PDF
into a series of image files (.jpg, .gif, or whatever format you pick),
which can then be fed to an OCR program.  Read the manual carefully
before attempting; also note this can be a slow process.
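
	As a rough illustration only, here is a minimal Python sketch that
wraps such a "convert" call.  The function names, the 300 dpi density,
the PNG output format and the "page-%03d" naming pattern are all
assumptions chosen for the example, not settings from this thread; it
also assumes ImageMagick can read PDFs on your system (which normally
means Ghostscript is installed).

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, fmt="png", density=300):
    """Build the argument list for ImageMagick's convert.

    The %03d in the output name asks convert to write one numbered
    image per PDF page. density is the rasterization resolution in dpi
    (300 is an assumed value, not a recommendation from the thread).
    """
    out_pattern = Path(out_dir) / f"page-%03d.{fmt}"
    return ["convert", "-density", str(density), str(pdf_path), str(out_pattern)]


def pdf_to_images(pdf_path, out_dir, fmt="png", density=300):
    """Run convert and return the generated image paths in page order."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if convert exits non-zero.
    subprocess.run(build_convert_command(pdf_path, out_dir, fmt, density), check=True)
    return sorted(Path(out_dir).glob(f"page-*.{fmt}"))
```

	Calling pdf_to_images("scan.pdf", "pages") would then leave the
page images in "pages/", ready for whatever OCR tool you prefer.  On a
32MB scan, expect this to take a while and to use a fair amount of
memory and disk space.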


			Robert Huff




