Date: Mon, 26 Jan 2009 14:51:14 -0800
From: Gary Kline
To: FreeBSD Mailing List
Subject: Re: can i split a pdf file?

On Mon, Jan 26, 2009 at 01:36:48PM -0800, Charlie Kester wrote:
> On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
> >On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline wrote:
> >> Thanks, Gents,
> >>
> >> But according to one smallish pdf file that I send to a web based
> >> tool, it was not a real pdf. Or, more accurately, it (the pdf to
> >> speech program) couldn't decode it.
> >
> >This is a typical problem with "poorly engineered" PDFs where the
> >author puts in the text as images (you'll see this stupidity across
> >the Web, too).
>
> In most cases where I've seen this, it's because they had scanned an
> actual printed document. Many old, out-of-print books are being made
> newly available this way, so I'm not inclined to complain.
>
> Unfortunately, OCR software still isn't reliable enough (or, if
> reliable, cheap enough) to convert these scanned images to actual text.

	You're probably right about the cost/performance idea. Still,
	before I get back to the last few pages of my thesis, maybe I'll
	try feeding parts of my most vanilla image-PDF file to an
	open-source OCR program. I'm pretty sure there are a couple in
	ports. IIRC, though, the images have to be JPEGs or TIFFs or the
	like.

	If anybody knows, please give me a shout!

	gary

-- 
 Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
    The 2.23a release of Jottings: http://jottings.thought.org/index.php