Date: Mon, 26 Jan 2009 14:51:14 -0800
From: Gary Kline <kline@thought.org>
To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject: Re: can i split a pdf file?
Message-ID: <20090126225113.GA78416@thought.org>
In-Reply-To: <20090126213648.GL66858@comcast.net>
References: <20090126001822.GA38314@thought.org> <20090126005156.GJ66858@comcast.net> <497D0FF3.6090402@telenix.org> <20090126080618.GA51983@thought.org> <20090126091623.a0b50f64.freebsd@edvax.de> <20090126213648.GL66858@comcast.net>

On Mon, Jan 26, 2009 at 01:36:48PM -0800, Charlie Kester wrote:
> On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
> >On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline <kline@thought.org> wrote:
> >> Thanks, Gents,
> >>
> >> But according to one smallish pdf file that I send to a web based
> >> tool, it was not a real pdf. Or, more accurately, it (the pdf to
> >> speech program) couldn't decode it.
> >
> >This is a typical problem with "poorly engineered" PDFs where the
> >author puts in the text as images (you'll see this stupidity across
> >the Web, too).
>
> In most cases where I've seen this, it's because they had scanned an
> actual printed document. Many old, out-of-print books are being made
> newly available this way, so I'm not inclined to complain.
>
> Unfortunately, OCR software still isn't reliable enough (or, if
> reliable, cheap enough) to convert these scanned images to actual text.

You're probably right about the cost/performance idea. Still,
before I get back to the last few pages of my thesis, maybe I'll
try feeding parts of my most vanilla image-PDF file to an
open-source OCR program. I'm pretty sure there are a couple in
ports. IIRC, though, the images have to be JPEGs or TIFFs or the
like. If anybody knows, please give me a shout out!

gary

--
Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org  http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php