From: "C. P. Ghost"
To: Gary Kline
Cc: freebsd-questions@freebsd.org
Date: Sat, 13 Oct 2012 04:40:23 +0200
Subject: Re: editing pdf files

On Sat, Oct 13, 2012 at 1:46 AM, Gary Kline wrote:
> On Fri, Oct 12, 2012 at 10:40:29PM +0400, Boris Samorodov wrote:
>> 10.10.2012 02:35, Gary Aitken wrote:
>>
>> > Can someone give me advice on editing pdf files?
>>
>> Take a look at graphics/inkscape.
>>
>> --
>> WBR, Boris Samorodov (bsam)
>> FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
>
>
> ive got a question that fits in here. hopefully.
>
> last week I found a book from 1901 that google had scanned and listed
> as a pdf file. it was text plus photos of the rich/famous of the
> 1800s. somehow, google found the exact string that matched my great
> grandfather [from the civil war]. I d'loaded the file (maybe 2mbytes)
> and searched using acroread. nada. I used the pdftotext utility.
> same: nothing but some 600 page numbers.
>
> my guess is that google just took photos of the book and used other
> tools to create a pdf file. I am not =that= serious about genealogy,
> but I would like to know if there are any tools to edit this kind of
> pdf file.

I suspect the following: they scanned the book and put the page images
into the PDF. The PDF itself is merely a container for those scanned
pages, so it has no text layer (save for the page numbers). That Google
could search this file is probably because they ran an OCR program on
the page images and then indexed the (approximate) text it produced.
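If you want to test this theory and OCR the file yourself, here is a
minimal Python sketch. It assumes poppler's pdftotext and pdftoppm and
the tesseract command-line tool are installed; the helper names, the
300 dpi default and the "<name>-pages" output directory are my own
hypothetical choices, not anything Google uses. It treats pdftotext
output consisting only of digits (page numbers) as a sign of an
image-only PDF, then rasterizes every page to PNG and runs tesseract on
each image, leaving one .txt file per page.

```python
import subprocess
import sys
from pathlib import Path


def looks_image_only(extracted_text: str) -> bool:
    """Heuristic: True if pdftotext found nothing but page numbers.

    str.split() treats the form feeds pdftotext emits between pages as
    whitespace, so only real tokens remain; an empty result counts too.
    """
    return all(token.isdigit() for token in extracted_text.split())


def rasterize_command(pdf: Path, workdir: Path, dpi: int = 300) -> list[str]:
    # pdftoppm (poppler) writes one PNG per page under <workdir>/page-N.png,
    # with N zero-padded to the width of the last page number
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(workdir / "page")]


def tesseract_command(image: Path) -> list[str]:
    # "tesseract <image> <outputbase>" writes the recognized text to <outputbase>.txt
    return ["tesseract", str(image), str(image.with_suffix(""))]


def main(pdf_name: str) -> None:
    pdf = Path(pdf_name)
    # "-" as the output file makes pdftotext print the text to stdout
    text = subprocess.run(["pdftotext", str(pdf), "-"],
                          capture_output=True, text=True, check=True).stdout
    if not looks_image_only(text):
        print("The PDF already has a text layer; no OCR needed.")
        return
    workdir = pdf.parent / (pdf.stem + "-pages")  # hypothetical output location
    workdir.mkdir(exist_ok=True)
    subprocess.run(rasterize_command(pdf, workdir), check=True)
    # zero-padded page numbers make lexical sort order match page order
    for image in sorted(workdir.glob("page-*.png")):
        subprocess.run(tesseract_command(image), check=True)
    print(f"OCR text written to {workdir}/page-*.txt")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like "python3 ocr_book.py book.pdf" (the
script name is arbitrary), then grep -il on the resulting .txt files.
OCR output is only approximate, so it is worth searching for fragments
of the name as well as the full spelling.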
Probably they used something like tesseract-ocr, which is in ports as
graphics/tesseract:

  http://code.google.com/p/tesseract-ocr/

> tia guys,
>
> gary
>
>
> --
> Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
> Twenty-six years of service to the Unix community.

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/