From: Polytropon
To: Gary Kline
Cc: FreeBSD Mailing List
Subject: Re: editing pdf files
Date: Sat, 13 Oct 2012 13:19:07 +0200
Message-Id: <20121013131907.c666bfc2.freebsd@edvax.de>
In-Reply-To: <20121012234628.GA11112@ethic.thought.org>

On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> ive got a question that fits in here. hopefully.
>
> last week I found a book from 1901 that google had scanned and listed
> as a pdf file. it was text plus photos of the rich/famous of the
> 1800s. somehow, google found the exact string that matched my great
> grandfather [from the civil war].
> I d'loaded the file (maybe 2mbytes)
> and searched using acroread. nada. I used the pdftotext utility.
> same: nothing but some 600 page numbers.
>
> my guess is that google just took photos of the book and used other
> tools to create a pdf file. I am not =that= serious about genealogy,
> but I would like to know if there are any tools to edit this kind of
> pdf file.

In case the PDF is nothing more than a compilation of images,
there's a way to deal with it for editing:

    step 1: disassemble
    step 2: edit images
    step 3: reassemble

The disassembling can be done with

    % pdfimages source.pdf .

Then the files can be edited with whatever tool you like, e. g. Gimp.
They often come out in PBM format. Finally the images can be
re-converted to PDF and combined into one PDF file:

    for IMG in .*.pbm; do
        convert "${IMG}" "${IMG}.pdf"
    done
    pdftk .*.pdf output target.pdf

Note the ".*" prefix for the file specification: The images
extracted by pdfimages match that pattern (at least in the
case I tested it for). If they get names other than
.0000001.pbm, change the approach accordingly.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
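
PS: The reassembly step (step 3) could be wrapped in a small sh function along the following lines. This is only a sketch, assuming convert (ImageMagick) and pdftk are installed and that the edited images still match ".*.pbm" in the current directory. The function name rebuild_pdf and the RUN variable are made up for illustration; setting RUN=echo only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: rebuild one PDF from the images extracted by pdfimages.
# RUN is a hypothetical dry-run switch: RUN=echo prints the commands only.
RUN=${RUN:-}

rebuild_pdf() {
    target=$1
    set --                          # collect the per-page PDFs in "$@"
    for img in .*.pbm; do
        [ -f "$img" ] || continue   # skip the literal pattern if nothing matched
        $RUN convert "$img" "$img.pdf" || return 1
        set -- "$@" "$img.pdf"
    done
    if [ $# -eq 0 ]; then
        echo "no .*.pbm images found" >&2
        return 1
    fi
    # Pages are combined in the order the shell expanded the pattern.
    $RUN pdftk "$@" output "$target"
}
```

After running pdfimages and editing the images, "rebuild_pdf target.pdf" in the same directory would do the conversion and the merge. Passing the collected names to pdftk, instead of a second ".*.pdf" pattern, keeps unrelated dot-files ending in .pdf out of the result.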