Date: Sun, 4 Nov 2007 02:39:14 +0100
From: cpghost <cpghost@cordula.ws>
To: freebsd-questions@freebsd.org
Cc: Gary Kline <kline@tao.thought.org>
Subject: Re: pdf edit again.
Message-ID: <20071104023914.3fabd2e7@epia-2.farid-hajji.net>
In-Reply-To: <20071104003851.GA98655@thought.org>
References: <20071104003851.GA98655@thought.org>
On Sat, 3 Nov 2007 16:38:55 -0800
Gary Kline <kline@tao.thought.org> wrote:

> A couple weeks ago I skimmed thru the postings on editing PDF
> files. Wasn't entirely clear what the answer is because I
> never thought I would need to edit a GUI file. I just found a book
> from 1883 in pdf format. I would like a text/ASCII/ISO_8859-1
> version. Tried pdftotext, but it doesn't work. Nutshell: is
> there something I can use to edit/look-at this book and get
> rid of whatever it is that's causing pdftotext to fail. (sorry for
> the grammar.... )

Old books in PDF are normally scanned bitmaps. There are no
characters in them, just pixels stored as embedded images, which is
why pdftotext has nothing to extract.

If you want to convert that to ASCII, you'd need to extract the page
images (use something like pdfimages from the xpdf port), convert
them to a bitmap format your OCR software accepts, and run some kind
of OCR software on that. It's a slow, unreliable, error-prone and
painful process though.

Good luck!

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
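
A rough sketch of the extract-then-OCR steps described above, in Python.
It assumes pdfimages (from xpdf) is in PATH and uses tesseract as the
OCR program; tesseract is only an example choice, not something named
in this thread, and the function and file names are made up for
illustration. Expect the output to need manual proofreading.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf, image_root):
    """Build the command that dumps embedded images as <image_root>-NNN.p?m."""
    return ["pdfimages", str(pdf), str(image_root)]


def tesseract_cmd(image, out_base, lang="eng"):
    """Build the command that OCRs one image into <out_base>.txt.

    tesseract is an assumed OCR tool; substitute whatever you have.
    """
    return ["tesseract", str(image), str(out_base), "-l", lang]


def image_number(path):
    """Sort key: the numeric suffix pdfimages appends (page-012 -> 12)."""
    return int(path.stem.rsplit("-", 1)[-1])


def ocr_pdf(pdf, workdir, lang="eng"):
    """Extract the page images, OCR each one, return the joined text."""
    for tool in ("pdfimages", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found in PATH")

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # Step 1: pull the scanned bitmaps out of the PDF.
    subprocess.run(pdfimages_cmd(pdf, workdir / "page"), check=True)

    # Step 2: OCR every extracted bitmap in page order.
    chunks = []
    for image in sorted(workdir.glob("page-*.p?m"), key=image_number):
        out_base = image.with_suffix("")
        subprocess.run(tesseract_cmd(image, out_base, lang), check=True)
        text_file = out_base.with_suffix(".txt")
        chunks.append(text_file.read_text(errors="replace"))
    return "\n".join(chunks)
```

Usage would look something like `text = ocr_pdf("book.pdf", "scan_tmp")`,
where both names are placeholders. Images that pdfimages writes in other
formats (for example JPEG when run with -j) are not picked up by the
glob above.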