From: Polytropon
To: Gary Kline
Cc: FreeBSD Mailing List
Date: Sat, 13 Oct 2012 23:15:36 +0200
Subject: Re: editing pdf files

On Sat, 13 Oct 2012 13:47:01 -0700, Gary Kline wrote:
> On Sat, Oct 13, 2012 at 01:19:07PM +0200, Polytropon wrote:
> > On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> >
> > The disassembling can be done with
> >
> > 	% pdfimages source.pdf .
> >
> > Then the files can be edited with whatever tool you like, e. g. Gimp.
> > They often come out in PBM format.
> >
> >
> 	A qstn I should have asked last time.  this book is a history or
> 	bio of richland county, ohio:: in type, it's like 650 or more
> 	pages.  SO: Is pdfimages going to spit out 650 files?  as noted
> 	in last email, only a couple of these images are of any interest

Depends on what actually _is_ in the PDF file. If every page is
represented as a picture, 650 pictures will be created. If it
contains text _and_ images, it will _only_ output the images,
with no real relation to where they have been placed in the text.
As suggested by the name "pdfimages", it takes the images from
the PDF file. :-)

The easiest way to check for possible text is to install xpdf,
which brings the binary "pdftotext" (if I remember correctly
that this tool is in _that_ package). You can then use it
like this:

	% pdftotext source.pdf

It will create "source.txt" with all actual text (but of course
without _any_ formatting except line breaks and ^L page breaks),
including page numbers. But hey, it's pure ASCII text suitable
for further processing. :-)

Run "pdftotext" without parameters for a short summary of its
options; "man pdftotext" is also provided.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
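
P.S. A minimal sketch of how the "source.txt" check could be
scripted, in case it helps. It only looks at the text file that
pdftotext wrote, relying on the ^L page breaks mentioned above.
The function names count_pages and has_text are made up for this
example, and I am assuming one ^L per page, so treat the count as
approximate. It is plain sh syntax, so run it under sh, not csh.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for illustration) that inspect
# the plain-text file produced by "pdftotext source.pdf".

# count_pages FILE: print the number of ^L (form feed) characters,
# which should roughly match the number of pages in the PDF.
count_pages() {
    tr -cd '\f' < "$1" | wc -c | tr -d ' '
}

# has_text FILE: succeed if the file contains anything besides
# whitespace (form feeds count as whitespace here).
has_text() {
    grep -q '[^[:space:]]' "$1"
}

# Usage, assuming pdftotext and pdfimages are installed:
#   pdftotext source.pdf               # writes source.txt
#   count_pages source.txt
#   if has_text source.txt; then echo "real text"; else echo "probably scans"; fi
#   pdfimages -f 12 -l 12 source.pdf img   # images from page 12 only
```

If has_text fails, the pages are most likely stored as pictures,
and pdfimages will produce about one file per page. In that case
the -f and -l options (first and last page) let you pull out only
the pages you care about; "img" is just a file name prefix, and
page 12 is an arbitrary example.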