Date: Tue, 27 Jan 2009 00:39:34 +0100
From: Polytropon
To: Gary Kline
Cc: Chuck Robey, FreeBSD Mailing List
Subject: Re: can i split a pdf file?

On Mon, 26 Jan 2009 14:06:23 -0800, Gary Kline wrote:
> So what kind of moron is going to photograph pages --or maybe just
> get-screenshot-of-this-page" and upload it?
The PDF serves as a container for pictorial images in this context. Another idea would be to have separate image files, one file per page, that you could view with your favourite image viewer. The advantage of the PDF container is that you can easily print a bunch of pages (or a whole book).

> Or a Real question:
> I read an online pdf of "The Art of War" from the 1880's [?], and
> it was in an old-English or olden-Deutsch type font. In PDF. i
> have other p.d. texts in pdf and am wondering in there is some
> sort of scanner than can take a book-length script and create a
> pdf file. Anybody know?

It's very complicated to handle old fonts using OCR techniques. It's even quite complicated with today's standard fonts. Although there are (usually expensive) OCR programs with good algorithms, most documents need some work afterwards. It's not only about correcting misrecognized characters; you have to handle hyphenation and paragraph typesetting as well.

I know that there are scanners that can feed a stack of paper sheets through an automatic feeder, scan them, and finally have a PDF file ready for FTP download. But there's no OCR involved, of course.

> I got a bunch of ^L bytes and nothing
> else.

The Ctrl-L (^L) is the page break character (FF = form feed). The rest of the PDF file then consists of images that cannot be converted into characters.

> Now I'm looking at the file with od -c and, yup, it's and
> image. The parts inbetween pages are in ASCII. Do you know what
> "MediaBox" is?

An image container, maybe? Then every page would consist of a "MediaBox" container holding one image.

> At least the web article was not an image!

Never mind, I know "important" web pages where the text content actually IS an image, and of course there's no alt= or longdesc= attribute because they're for weenies. :-)

-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
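
To illustrate the ^L remark: if a text extraction tool separates pages with form feeds, as described above, then an image-only PDF yields nothing but those separators. The following is only a minimal Python sketch of that idea. The function names and the sample strings are made up for illustration and do not come from any particular tool.

```python
FORM_FEED = "\f"  # Ctrl-L (^L), ASCII 0x0C, the page break character


def split_pages(text):
    """Split extracted text into pages at form feed characters."""
    pages = text.split(FORM_FEED)
    # A trailing form feed leaves an empty final chunk; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def has_text(pages):
    """Return True if at least one page holds non-whitespace characters."""
    return any(page.strip() for page in pages)


# Hypothetical sample data: extraction output of a text PDF
# and of an image-only PDF (page breaks and nothing else).
text_pdf = "Page one\fPage two\f"
image_pdf = "\f\f\f"

for name, extracted in (("text_pdf", text_pdf), ("image_pdf", image_pdf)):
    pages = split_pages(extracted)
    print(name, len(pages), "pages, contains text:", has_text(pages))
```

A file for which has_text() is false but which still reports several pages is a good candidate for the "scanned images in a PDF container" case, where only OCR would recover the text.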