Date: Sat, 13 Oct 2012 04:40:23 +0200
From: "C. P. Ghost" <cpghost@cordula.ws>
To: Gary Kline <kline@thought.org>
Cc: freebsd-questions@freebsd.org
Subject: Re: editing pdf files
Message-ID: <CADGWnjU6xTEXFBS7v1hqLg15OSN=deop=fH5A28dbDmxLLYiXg@mail.gmail.com>
In-Reply-To: <20121012234628.GA11112@ethic.thought.org>
References: <5074A6B9.8040209@dreamchaser.org> <5078641D.4050905@passap.ru> <20121012234628.GA11112@ethic.thought.org>
On Sat, Oct 13, 2012 at 1:46 AM, Gary Kline <kline@thought.org> wrote:
> On Fri, Oct 12, 2012 at 10:40:29PM +0400, Boris Samorodov wrote:
>> 10.10.2012 02:35, Gary Aitken wrote:
>>
>> > Can someone give me advice on editing pdf files?
>>
>> Take a look at graphics/inkscape.
>>
>> --
>> WBR, Boris Samorodov (bsam)
>> FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
>
>
> ive got a question that fits in here. hopefully.
>
> last week I found a book from 1901 that google had scanned and listed
> as a pdf file. it was text plus photos of the rich/famous of the
> 1800s. somehow, google found the exact string that matched my great
> grandfather [from the civil war]. I d'loaded the file (maybe 2mbytes)
> and searched using acroread. nada. I used the pdftotext utility.
> same: nothing but some 600 page numbers.
>
> my guess is that google just took photos of the book and used other
> tools to create a pdf file. I am not =that= serious about genealogy,
> but I would like to know if there are any tools to edit this kind of
> pdf file.

I suspect the following: they scanned the book and put all the
images into the PDF. The PDF itself is merely a container for
scanned pages; it thus contains no text (save for the page
numbers).

That Google was able to search in this file is probably due to
them running some OCR program on the image files, and then
indexing the (approximate) text that the OCR program generated.

Probably they used something like tesseract-ocr from ports
graphics/tesseract:

http://code.google.com/p/tesseract-ocr/

> tia guys,
>
> gary
>
>
> --
> Gary Kline kline@thought.org http://www.thought.org Public Service Unix
> Twenty-six years of service to the Unix community.

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
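
The OCR idea described in the reply above (render the scanned pages to images, run an OCR program on them, then search the approximate text) might be sketched roughly as follows. This is only an illustration under assumptions, not what Google actually did: it assumes poppler's pdftoppm and the tesseract command-line tool are installed and that English language data is available. The function names (render_command, ocr_command, find_matches, ocr_pdf) and the "page" file prefix are made up for this example.

```python
import glob
import os
import subprocess


def render_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm command that renders every PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_command(image_path, lang="eng"):
    """Build the tesseract command; tesseract appends .txt to the output base."""
    out_base = os.path.splitext(image_path)[0]
    return ["tesseract", image_path, out_base, "-l", lang]


def page_number(image_path):
    """Extract the numeric page suffix from a name like page-007.png."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return int(stem.rsplit("-", 1)[1])


def find_matches(pages, needle):
    """Return 1-based indexes of pages whose text contains needle, ignoring case."""
    needle = needle.lower()
    return [i for i, text in enumerate(pages, start=1) if needle in text.lower()]


def ocr_pdf(pdf_path, workdir, lang="eng"):
    """Render and OCR every page; return the recognized text, one string per page.

    Assumption: workdir exists and is writable. OCR output is approximate,
    so an exact name may not match if the scan quality is poor.
    """
    prefix = os.path.join(workdir, "page")
    subprocess.run(render_command(pdf_path, prefix), check=True)
    texts = []
    for image in sorted(glob.glob(prefix + "-*.png"), key=page_number):
        subprocess.run(ocr_command(image, lang), check=True)
        with open(os.path.splitext(image)[0] + ".txt", encoding="utf-8") as fh:
            texts.append(fh.read())
    return texts


# Hypothetical usage (file name and search string are placeholders):
#   pages = ocr_pdf("book.pdf", "/tmp/ocr")
#   print(find_matches(pages, "kline"))
```

Since OCR text is only approximate, searching for a shorter or less error-prone substring of the name may find pages that an exact match would miss.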