Date: Mon, 26 Jan 2009 14:06:23 -0800
From: Gary Kline
To: Polytropon
Cc: Chuck Robey, FreeBSD Mailing List
Subject: Re: can i split a pdf file?
Message-ID: <20090126220623.GA76673@thought.org>
In-Reply-To: <20090126091623.a0b50f64.freebsd@edvax.de>
On Mon, Jan 26, 2009 at 09:16:23AM +0100, Polytropon wrote:
> On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline wrote:
> > Thanks, Gents,
> >
> > But according to one smallish pdf file that I send to a web based
> > tool, it was not a real pdf. Or, more accurately, it (the pdf to
> > speech program) couldn't decode it.
>
> This is a typical problem with "poorly engineered" PDFs where the
> author puts in the text as images (you'll see this stupidity across
> the Web, too).

So what kind of moron is going to photograph pages (or maybe just
"get-screenshot-of-this-page") and upload it?

Or a Real question: I read an online pdf of "The Art of War" from the
1880's [?], and it was in an old-English or olden-Deutsch type font.
In PDF. I have other public-domain texts in pdf and am wondering if
there is some sort of scanner that can take a book-length script and
create a pdf file. Anybody know?

> A good tool to check if the PDF file can be (audibly) read is the
> use of the tool pdftotext from the port xpdf.
>
>     % pdftotext bla.pdf && less bla.txt
>
> Then, even the FF speech plugin should work correctly - as long as
> the PDF file contains decodable text. If it's just a bunch of images,
> well, what are we expecting, hm? FF-speech: "You see a pretty image of
> some text..." :-)

Yeah, that's about right! I got a bunch of ^L bytes and nothing else.
Now I'm looking at the file with od -c and, yup, it's an image. The
parts in between pages are in ASCII. Do you know what "MediaBox" is?

At least the web article was not an image! Google had it both in PDF
and HTML.

gary

> --
> Polytropon
> From Magdeburg, Germany
> Happy FreeBSD user since 4.0
> Andra moi ennepe, Mousa, ...
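
For anyone who wants to automate the pdftotext check above, here is a minimal sketch in Python. It assumes pdftotext is installed and on the PATH. The function names and the 20-character threshold are my own illustrative choices, not anything from xpdf. The idea is just what I saw by hand: an image-only PDF gives back form feeds (^L, one per page) and whitespace, and nothing else.

```python
import subprocess
import sys


def has_decodable_text(extracted: str, min_chars: int = 20) -> bool:
    """Guess whether pdftotext output contains real text.

    Form feeds (^L) and other whitespace are ignored; what remains is
    counted. min_chars is an arbitrary, hypothetical threshold.
    """
    visible = [ch for ch in extracted if not ch.isspace()]
    return len(visible) >= min_chars


def pdf_text(path: str) -> str:
    """Run pdftotext on path and return its output as a string.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_pdf.py file.pdf")
    if has_decodable_text(pdf_text(sys.argv[1])):
        print("looks like real text")
    else:
        print("probably just page images")
```

Saved under a name such as check_pdf.py (hypothetical), it would be run as "python check_pdf.py bla.pdf". A scanned book with only a few stray characters of text could still fool it, so treat the answer as a hint.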
-- 
 Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
        The 2.23a release of Jottings: http://jottings.thought.org/index.php