From owner-freebsd-questions@FreeBSD.ORG  Mon Jan 26 22:51:19 2009
Date: Mon, 26 Jan 2009 14:51:14 -0800
From: Gary Kline <kline@thought.org>
To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
Message-ID: <20090126225113.GA78416@thought.org>
References: <20090126001822.GA38314@thought.org>
	<20090126005156.GJ66858@comcast.net> <497D0FF3.6090402@telenix.org>
	<20090126080618.GA51983@thought.org>
	<20090126091623.a0b50f64.freebsd@edvax.de>
	<20090126213648.GL66858@comcast.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090126213648.GL66858@comcast.net>
User-Agent: Mutt/1.4.2.3i
X-Organization: Thought Unlimited. Public service Unix since 1986.
X-Of_Interest: With 22 years  of service to the Unix community.
Subject: Re: can i split a pdf file?

On Mon, Jan 26, 2009 at 01:36:48PM -0800, Charlie Kester wrote:
> On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
> >On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline <kline@thought.org> wrote:
> >>	Thanks, Gents,
> >>
> >>	But according to one smallish pdf file that I send to a web based
> >>	tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
> >>	speech program) couldn't decode it.
> >
> >This is a typical problem with "poorly engineered" PDFs where the
> >author puts in the text as images (you'll see this stupidity across
> >the Web, too).
> 
> In most cases where I've seen this, it's because they had scanned an
> actual printed document.  Many old, out-of-print books are being made
> newly available this way, so I'm not inclined to complain.
> 
> Unfortunately, OCR software still isn't reliable enough (or, if
> reliable, cheap enough) to convert these scanned images to actual text.


	You're probably right about the cost/performance trade-off.  Still,
	before I get back to the last few pages of my thesis, maybe I'll
	try feeding parts of my most vanilla image-PDF file to an
	open-source OCR program.  I'm pretty sure there are a couple in
	ports.  IIRC, though, the images have to be JPEGs or TIFFs or the
	like.  If anybody knows, please give me a shout!

	gary

-- 
 Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
    The 2.23a release of Jottings: http://jottings.thought.org/index.php