From owner-freebsd-questions@FreeBSD.ORG  Sat Oct 13 11:19:16 2012
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 1FE56564
 for <freebsd-questions@freebsd.org>; Sat, 13 Oct 2012 11:19:16 +0000 (UTC)
 (envelope-from freebsd@edvax.de)
Received: from mx01.qsc.de (mx01.qsc.de [213.148.129.14])
 by mx1.freebsd.org (Postfix) with ESMTP id D10CA8FC08
 for <freebsd-questions@freebsd.org>; Sat, 13 Oct 2012 11:19:15 +0000 (UTC)
Received: from r56.edvax.de (port-92-195-110-131.dynamic.qsc.de
 [92.195.110.131]) by mx01.qsc.de (Postfix) with ESMTP id D66B03CF67;
 Sat, 13 Oct 2012 13:19:07 +0200 (CEST)
Received: from r56.edvax.de (localhost [127.0.0.1])
 by r56.edvax.de (8.14.5/8.14.5) with SMTP id q9DBJ7we001927;
 Sat, 13 Oct 2012 13:19:07 +0200 (CEST)
 (envelope-from freebsd@edvax.de)
Date: Sat, 13 Oct 2012 13:19:07 +0200
From: Polytropon <freebsd@edvax.de>
To: Gary Kline <kline@thought.org>
Subject: Re: editing pdf files
Message-Id: <20121013131907.c666bfc2.freebsd@edvax.de>
In-Reply-To: <20121012234628.GA11112@ethic.thought.org>
References: <5074A6B9.8040209@dreamchaser.org> <5078641D.4050905@passap.ru>
 <20121012234628.GA11112@ethic.thought.org>
Organization: EDVAX
X-Mailer: Sylpheed 3.1.1 (GTK+ 2.24.5; i386-portbld-freebsd8.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Mailing List <freebsd-questions@freebsd.org>
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: Polytropon <freebsd@edvax.de>
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-questions>, 
 <mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
 <mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 Oct 2012 11:19:16 -0000

On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> 	ive got a question that fits in here.  hopefully.
> 
> 	last week  I found a book from 1901 that google had scanned and listed
> 	as a pdf file.  it was text plus photos of the rich/famous of the 
> 	1800s.  somehow, google found the exact string that matched my great
> 	grandfather [from the civil war].  I d'loaded the file (maybe 2mbytes)
> 	and searched using acroread.  nada.  I used the pdftotext utility.
> 	same: nothing but  some 600 page numbers.
> 
> 	my guess is that google just took photos of the book and used other
> 	tools to create a pdf file.  I am not =that= serious  about genealogy,
> 	but I would like to know if there are any tools to edit this kind of
> 	pdf file.

In case the PDF is nothing more than a compilation of images,
there's a way to deal with it for editing:

step 1: disassemble
step 2: edit images
step 3: reassemble

The disassembling can be done with pdfimages, where the second
argument ("." here) is the prefix for the output file names:

	% pdfimages source.pdf .

Then the files can be edited with whatever tool you like, e.g. Gimp.
They often come out in PBM format.

Finally the images can be re-converted to PDF and combined to one
PDF file:

	# Bourne shell (sh) syntax; quote names in case they contain spaces
	for IMG in .*.pbm; do
		convert "${IMG}" "${IMG}.pdf"
	done
	pdftk .*.pdf output target.pdf

Note the ".*" prefix for the file specification: The images extracted
by pdfimages match that pattern (at least in the case I tested).
If they get names other than .0000001.pbm, adjust the pattern
accordingly.
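
If you want the reassembly step to complain instead of silently doing
nothing when the pattern matches no files, something like the following
could work. It is only a rough sketch: the function name rebuild_pdf and
its two arguments (file name prefix, output file) are made up for this
example, and it assumes the images are PBM files and that convert
(ImageMagick) and pdftk are installed.

```shell
#!/bin/sh
# Sketch only: rebuild one PDF from page images extracted by pdfimages.
# rebuild_pdf is a hypothetical helper, not part of any of the tools.
#   $1 = file name prefix used with pdfimages (assumption: PBM output)
#   $2 = name of the combined PDF to write
rebuild_pdf() {
	prefix=$1
	target=$2
	count=0
	for img in "${prefix}"*.pbm; do
		# An unmatched glob stays literal, so skip non-files.
		[ -f "${img}" ] || continue
		# One single-page PDF per image, e.g. page-000.pbm.pdf
		convert "${img}" "${img}.pdf" || return 1
		count=$((count + 1))
	done
	if [ "${count}" -eq 0 ]; then
		echo "no images matching ${prefix}*.pbm" >&2
		return 1
	fi
	# The shell expands the glob in sorted order, which keeps page order
	# as long as the numbering is zero-padded.
	pdftk "${prefix}"*.pdf output "${target}"
}
```

For example, after "pdfimages source.pdf page" and editing the images,
"rebuild_pdf page target.pdf" would combine page-000.pbm, page-001.pbm
and so on into target.pdf.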


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...