Date:      Sun, 4 Nov 2007 05:55:38 +0100
From:      cpghost <cpghost@cordula.ws>
To:        Gary Kline <kline@tao.thought.org>
Cc:        Gary Kline <kline@tao.thought.org>, freebsd-questions@freebsd.org
Subject:   Re: pdf edit again.
Message-ID:  <20071104055538.5e2f46e9@epia-2.farid-hajji.net>
In-Reply-To: <20071104015453.GA64050@thought.org>
References:  <20071104003851.GA98655@thought.org> <20071104023914.3fabd2e7@epia-2.farid-hajji.net> <20071104015453.GA64050@thought.org>

On Sat, 3 Nov 2007 17:54:53 -0800
Gary Kline <kline@tao.thought.org> wrote:

> On Sun, Nov 04, 2007 at 02:39:14AM +0100, cpghost wrote:
> > On Sat, 3 Nov 2007 16:38:55 -0800
> > Gary Kline <kline@tao.thought.org> wrote:
> > 
> > > 	A couple weeks ago I skimmed thru the postings on editing
> > > PDF files.  Wasn't entirely clear what the answer was because I
> > > never thought I would need to edit a GUI file.  I just found a
> > > book from 1883 in pdf format.  I would like a
> > > text/ASCII/ISO_8859-1 version.  Tried pdftotext, but it doesn't
> > > work.   Nutshell: is there something I can use  to edit/look-at
> > > this book and get rid of whatever it is that's causing pdftotext
> > > to fail.  (sorry for the grammar.... )
> > 
> > Old books in PDF are normally scanned bitmaps. There are no
> > characters or whatever therein; just pixels (embedded raster
> > images). If you want to convert that to ASCII, you'd need to
> > extract the page images (use something like pdfimages from the
> > xpdf port, which writes PPM/PBM or JPEG files), turn them into
> > whatever bitmap format your OCR software accepts, and run the OCR
> > on that. It's a slow, unreliable, error-prone and painful process
> > though.
> > 
> > Good luck!
> 
> 
> 	"Arrrgh" (Charlie Brown).  If it's that tortured, I'll forget
> 	it; thanks for the clue.  Pretty sure this *was* just photo'd
> and scanned in.
> 
> 	(Must be how amazon.com has their zillions of books online.
> 	OCR'ing is serious work; I know that first hand.)

If you need help with imperfectly OCR'ed texts, esp. texts that
are no longer under copyright, there's always Distributed Proofreaders,
which prepares e-texts for the venerable Project Gutenberg:
http://www.pgdp.net/
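
In case it helps, the extract-then-OCR route from my earlier mail could
be sketched in Python roughly as below. Treat it as an outline under
assumptions, not a recipe: pdfimages (xpdf port) and the Tesseract OCR
engine are both installed and on the PATH, the Tesseract build can read
the PPM/PBM files that pdfimages writes, and the helper names
(extract_cmd, ocr_cmd, pdf_to_text) are made up for illustration.

```python
import subprocess
from pathlib import Path


def extract_cmd(pdf, image_root):
    """Command that dumps the embedded images as <image_root>-NNN.ppm/.pbm."""
    return ["pdfimages", str(pdf), str(image_root)]


def ocr_cmd(image, lang="eng"):
    """Command that OCRs one image; Tesseract writes <output base>.txt."""
    image = Path(image)
    output_base = image.with_suffix("")  # e.g. page-000.ppm -> page-000
    return ["tesseract", str(image), str(output_base), "-l", lang]


def image_number(path):
    """Numeric part of a pdfimages file name, used to keep page order."""
    return int(Path(path).stem.rsplit("-", 1)[1])


def pdf_to_text(pdf, workdir, lang="eng"):
    """Extract every scanned image from `pdf`, OCR each, and join the text."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # Step 1: pull the raw bitmaps out of the PDF.
    subprocess.run(extract_cmd(pdf, workdir / "page"), check=True)

    # Step 2: OCR each bitmap in page order and collect the results.
    pages = []
    for image in sorted(workdir.glob("page-*.p[bp]m"), key=image_number):
        subprocess.run(ocr_cmd(image, lang), check=True)
        pages.append(image.with_suffix(".txt").read_text(errors="replace"))
    return "\n".join(pages)
```

Something like pdf_to_text("book1883.pdf", "scans") (file and directory
names are placeholders) would then return the raw OCR text, which will
still need proofreading.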

Good luck!
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


