Date: Sun, 4 Nov 2007 05:55:38 +0100
From: cpghost
To: Gary Kline
Cc: Gary Kline, freebsd-questions@freebsd.org
Subject: Re: pdf edit again.
Organization: Cordula's Web
Message-ID: <20071104055538.5e2f46e9@epia-2.farid-hajji.net>
In-Reply-To: <20071104015453.GA64050@thought.org>
References: <20071104003851.GA98655@thought.org> <20071104023914.3fabd2e7@epia-2.farid-hajji.net> <20071104015453.GA64050@thought.org>

On Sat, 3 Nov 2007 17:54:53 -0800
Gary Kline wrote:

> On Sun, Nov 04, 2007 at 02:39:14AM +0100, cpghost wrote:
> > On Sat, 3 Nov 2007 16:38:55 -0800
> > Gary Kline wrote:
> >
> > > A couple of weeks ago I skimmed through the postings on editing
> > > PDF files. It wasn't entirely clear what the answer is, because I
> > > never thought I would need to edit a GUI file. I just found a
> > > book from 1883 in PDF format. I would like a
> > > text/ASCII/ISO_8859-1 version. Tried pdftotext, but it doesn't
> > > work. Nutshell: is there something I can use to edit/look at
> > > this book and get rid of whatever it is that's causing pdftotext
> > > to fail? (Sorry for the grammar....)
> >
> > Old books in PDF are normally scanned bitmaps. There are no
> > characters or whatever therein; just pixels (embedded raster
> > images). If you want to convert that to ASCII, you'd need to
> > extract the page images (use something like pdfimages from the
> > xpdf port), turn them into some bitmap format, and run some kind
> > of OCR software on that. It's a slow, unreliable, error-prone and
> > painful process though.
> >
> > Good luck!
>
> "Arrrgh" (Charlie Brown). If it's that tortured, I'll forget
> it; thanks for the clue. Pretty sure this *was* just photo'd
> and scanned in.
>
> (Must be how amazon.com has their zillions of books online.
> OCR'ing is serious work; I know that first hand.)

If you need help with imperfectly OCR'ed texts, especially texts that
are no longer under copyright, there's always Distributed Proofreaders
from the venerable Project Gutenberg:

  http://www.pgdp.net/

Good luck!

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
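
P.S. For anyone who wants to try the extract-then-OCR route described
above anyway, here is a rough Python sketch. It is only an illustration
under some assumptions: pdfimages is the tool named in the thread, but
tesseract as the OCR program, the "page" file prefix, and the function
names are my own choices and not something the thread prescribes.
Expect the OCR output to need heavy proofreading.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf, image_root):
    """Command that dumps the embedded images as <image_root>-NNN.ppm/.pbm."""
    return ["pdfimages", str(pdf), str(image_root)]


def tesseract_cmd(image, out_base, lang="eng"):
    """Command that OCRs one image into <out_base>.txt (tesseract is an assumption)."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_scanned_pdf(pdf, workdir, lang="eng"):
    """Extract page images, OCR each one, and return the concatenated text."""
    for tool in ("pdfimages", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # Step 1: pull the scanned page bitmaps out of the PDF.
    subprocess.run(pdfimages_cmd(pdf, workdir / "page"), check=True)
    pages = []
    # Step 2: OCR each bitmap in page order (names are zero-padded).
    for image in sorted(workdir.glob("page-*.p[bgp]m")):
        out_base = image.with_suffix("")
        subprocess.run(tesseract_cmd(image, out_base, lang), check=True)
        pages.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling ocr_scanned_pdf("book.pdf", "ocr-work") would leave the page
images and per-page .txt files in ocr-work/ and return the combined
text; the two small *_cmd helpers are kept separate so the command
lines can be inspected without running anything.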